Navigating the Terrain: GPT's Journey into Malware Analysis
- We delve into the inherent strengths and common challenges that GPT (OpenAI’s GPT-4, henceforth “GPT”) encounters when engaged in malware analysis, providing tangible examples for clarity.
- Examining the root cause and structure of the 'ceiling' that limits GPT's effectiveness in various malware analysis tasks, we aim to uncover insights and potential solutions.
- We propose inventive hacks and mitigations to surmount the limitations posed by this 'ceiling' and enhance GPT's reasoning capabilities in malware analysis.
- Through a proof of concept, we show how GPT's proficiency in guiding analysts through triage of binary samples can be significantly improved.
Introduction
In the realm of technology, GPT has emerged as a transformative force, often regarded as a miraculous tool in the current tech cycle. Skeptics question its true intelligence, drawing parallels to trendy buzzwords like NFTs and blockchain. However, those who have explored GPT's capabilities understand its profound impact, turning week-long projects into a matter of hours. Amid the hype and skepticism lies a nuanced reality, one that we explored through practical tests by challenging GPT with the intricate task of malware analysis.
GPT's Natural Strengths
GPT's prowess lies in its verbal acuity: it is a verbal thinker that excels at deciding on the most fitting words and their placement. This linguistic strength grants GPT access to an extensive cheat sheet of human knowledge. If a pertinent answer exists in its training data, GPT can reproduce it with uncanny accuracy. For instance, when presented with a GandCrab report (GandCrab is ransomware that encrypts a victim's files and demands a ransom payment for the victim to regain access to their data; it targets consumers and businesses with PCs running Microsoft Windows), GPT effortlessly recalled relevant information and even demonstrated the ability to perform a Google Scholar search.
Sentence Completion
GPT is a purely verbal thinker. Its entire power is predicated on an outstanding ability to decide which word is most appropriate, and where to put it, in its response. This is one of the most important things to understand about GPT: a lot of the behavior that we will cover later is, in a sense, downstream from this one property. One of the immediate implications is that GPT has access to a huge latent cheat sheet. If someone, at any point in history, has answered the actual question being asked, and that answer made it into GPT’s training data, GPT exhibits an uncanny ability to reproduce it.
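The “most appropriate next word” idea can be illustrated with a deliberately tiny sketch. The toy below is our own assumption, not a description of GPT’s internals (GPT is a transformer that scores subword tokens using the whole preceding context), and `train_bigrams` and `complete` are hypothetical names. It counts which word follows which in a training text, then greedily appends the most frequent follower. Even at this scale it shows the “cheat sheet” effect: a prompt whose continuation appears in the training text is reproduced verbatim, while an unseen prompt gets no continuation at all.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Toy model (illustrative only): for each word, count the words that follow it."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def complete(follows: dict[str, Counter], prompt: str, max_words: int = 10) -> str:
    """Greedily append the most frequent follower of the last word.

    Stops when the last word was never seen with a follower, or after max_words steps.
    """
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        candidates = follows.get(words[-1])  # .get does not insert into the defaultdict
        if not candidates:
            break  # nothing in the "cheat sheet" to continue from
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


follows = train_bigrams(["GandCrab encrypts the victim's files and demands a ransom"])
print(complete(follows, "GandCrab"))   # gandcrab encrypts the victim's files and demands a ransom
print(complete(follows, "ApplePush"))  # applepush
```

Greedy selection is the simplest choice; real language models usually sample from the predicted distribution instead, but the dependence on what was seen in training is the same.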
Big Picture Summaries
Through its web of word associations, GPT has a keen grasp of grammar and of the difference between key and ancillary facts. As a result, one of the tasks GPT can be trusted to perform most reliably is producing “a summary of the big picture” when given input too large for a human to comfortably digest. For example, when given part of a very lengthy API call log produced by a piece of malware and asked to summarize it, GPT produced the following useful output:
The malware seems to be heavily interacting with Windows API and doing various operations such as file operations, memory management, privilege escalation, loading libraries, and notably cryptography-related operations.
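One way to sanity-check a “big picture” summary like this is to bucket the same log deterministically and compare. The sketch below is our own illustration, not part of the original experiment: it assumes a log with one call per line in a `Name(args)` form, and the category table (`CATEGORIES`) and the functions `categorize` and `summarize_api_log` are hypothetical, hand-picked examples built on real Windows API names. Calls missing from the table are counted as `other`.

```python
import re
from collections import Counter

# Hypothetical, hand-picked mapping from Windows API names to coarse categories;
# a real table would be far larger.
CATEGORIES = {
    "file operations": {"CreateFileA", "CreateFileW", "ReadFile", "WriteFile", "DeleteFileW"},
    "memory management": {"VirtualAlloc", "VirtualFree", "VirtualProtect", "HeapAlloc"},
    "privileges": {"OpenProcessToken", "LookupPrivilegeValueW", "AdjustTokenPrivileges"},
    "library loading": {"LoadLibraryA", "LoadLibraryW", "GetProcAddress"},
    "cryptography": {"CryptAcquireContextW", "CryptGenKey", "CryptImportKey", "CryptEncrypt"},
}

# Assumed log format: one call per line, e.g. "CryptEncrypt(hKey, 0, TRUE)".
CALL_RE = re.compile(r"^\s*(\w+)\(")


def categorize(name: str) -> str:
    """Return the category of an API name, or 'other' if it is not in the table."""
    for category, names in CATEGORIES.items():
        if name in names:
            return category
    return "other"


def summarize_api_log(lines: list[str]) -> Counter:
    """Count logged calls per category, skipping lines that do not look like calls."""
    counts: Counter = Counter()
    for line in lines:
        match = CALL_RE.match(line)
        if match:
            counts[categorize(match.group(1))] += 1
    return counts


log = [
    "CreateFileW(path, GENERIC_WRITE)",
    "CryptEncrypt(hKey, 0, TRUE)",
    "CryptEncrypt(hKey, 0, TRUE)",
    "VirtualAlloc(0, 4096)",
    "Sleep(1000)",
]
print(summarize_api_log(log).most_common())
# [('cryptography', 2), ('file operations', 1), ('memory management', 1), ('other', 1)]
```

A lookup table like this only counts what it already knows, whereas GPT's summary draws on word associations; the two can disagree, and the disagreements are worth a closer look.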
Gap Between Knowledge and Action
However, a notable challenge emerges: a gap between knowledge and action, reminiscent of Richard Feynman's critique of students who memorized without understanding. The same gap appears in GPT's handling of malware analysis tasks, where its struggles seem to stem from not grasping what the information it holds actually means. The dichotomy between having access to information and comprehending its meaning poses challenges in tasks that require a deeper understanding of context.
Typical Challenges in Malware Analysis
The complexity of malware analysis exposes challenges for GPT. When tasked with triage (determining whether a binary is benign or malicious), GPT exhibited limitations.
As it turned out, seeing GPT deal
with this apparently simple task already produced a wealth of insight regarding
its ability to reason in this domain.
While GPT is an artificial construct, many of the challenges one encounters when applying it to malware analysis seem strangely human on GPT’s part. We collected many examples of GPT running into these challenges while attempting various tasks, and tried as much as possible to sort them into larger, more general categories. The result was the following list of six principal obstacles:
• Memory Window Drift
• Gap between Knowledge and Action
• Logical Reasoning Ceiling
• Detachment from Expertise
• Goal Orientation
• Spatial Blindness
Overcoming Challenges
Acknowledging these challenges,
we sought hacks and mitigations to elevate GPT's capabilities in malware
analysis. A proof of concept, involving a heavily engineered prompt, showcased
improvements in GPT's ability to guide an analyst during triage.
You can view demos of how GPT fares in the triage task with all of the mitigations described above in place:
• Full transcript of GPT (with engineered prompt) navigating the triage task: GandCrab
• Full transcript of GPT (with engineered prompt) navigating the triage task: ApplePush
Conclusion
GPT's journey into malware analysis illuminates both its natural strengths and the challenges it faces in reasoning within this domain. While its verbal acuity and vast knowledge repository are undeniable assets, the gap between knowledge and action presents hurdles. The exploration of hacks and mitigations reflects ongoing efforts to enhance GPT's applicability to complex tasks, opening avenues for future advances at the intersection of artificial intelligence and cybersecurity.
Read the detailed report of this analysis on our Check Point Research page.