How Recursive Malware Scanning Navigates Compressed Archives
6/6/2025 - Brian O'Neill

Addressing the Concept of Threat Obfuscation

In cybersecurity, threat obfuscation is an expansive game bad actors play to throw off antivirus (AV) software scanning policies. The logic behind it is simple: to smuggle malicious files past robust network defenses, an attacker must convince the “sentries” of those defenses (the AV policies) to look no further than the outer shell of the vehicle the files hide within. Archive file types like ZIP, RAR, 7Z and others capable of compressing (and sometimes encrypting) immense volumes of content are ideally suited to malware obfuscation. Recursion – a complex but extremely important concept in computing – is the methodology best suited to counteract archive-based malware obfuscation strategies.

Invoking a Familiar Metaphor for Malware Obfuscation

If the concept of layering malware within an innocuous container brings Odysseus’ Trojan Horse to mind, we’re thinking along the right lines. Malicious code smuggled into a system among legitimate software components has been referred to as a “Trojan” in cybersecurity since the 1970s, paying tribute to the storied (albeit likely mythological) sneak attack which allegedly spelled doom for the ancient city of Troy. While archive-based attacks aren’t necessarily Trojans by the strictest security definitions, they’re conceptually similar enough to invoke the same metaphor. This metaphor has clear shortcomings, however, when we face the reality of compressed archive-based threat obfuscation, and it’s worth nitpicking these shortcomings to better appreciate the role recursion plays in detecting threats nested deeply within hierarchical structures.

Dispelling the Illusion of a 2-Dimensional Attack Surface

The Trojan Horse of legend was a large, mobile statue with two basic layers.
The external layer was a wooden structure shaped to look like a horse, and the internal layer was an open cavity just large enough for Odysseus and his comrades to fit within. The obfuscation of the Greek attack group only went one layer deep, and it’s safe to say that, in hindsight, the Trojans might’ve benefitted from briefly investigating that interior layer before celebrating their presumed victory. Against threats obfuscated in compressed archives, a second look wouldn’t be enough. This idea is easiest to understand if we briefly suspend disbelief (even further) by re-envisioning the Trojan Horse metaphor.

A Horse, Within a Horse, Within a Horse…

Let’s imagine that a Trojan guard was proactive and cunning enough to check inside the suspicious horse structure before allowing it to enter the city of Troy. Imagine that within the structure, rather than uncovering a group of Greek soldiers, this guard instead found boxes of gifts and supplies, along with another, smaller Trojan Horse built to scale. Next, imagine that upon opening the second horse, the guard found yet another assortment of gifts, and yet another scale model of the original horse. Still, no hidden soldiers to be found. Puzzled as we might expect this guard to be, we’d likely assume they were justified in declaring the horse safe to enter the city walls after their search. We’d also likely empathize with this guard’s complete bewilderment when, later that night, Odysseus and his troops still emerged from the horse and caught the sleeping city by surprise. They were, by powers unknown to man, hidden within the nth iteration of the scale horse. Perhaps the guard would’ve found them by looking a few more layers deep; perhaps not. It’s impossible to say up front how deep that nested structure would’ve gone.

Addressing the Reality of Nested Threats in Compressed Archives

This absurd multi-layered attack vehicle more accurately represents the level of obfuscation compressed archive formats can provide. It’s closer to a Russian nesting doll than a Trojan Horse. Formats like ZIP can hold arbitrarily many layers of files, including additional ZIP archives, because those archives are treated just like any other file by the parent ZIP they live within. It’s not enough to look past one, two, or even three layers of a compressed archive to mitigate an obfuscated threat; it’s essential to look at each file within each archive layer before declaring the full archive “safe”. Recursion is the key concept which makes deep-archive spelunking possible without knowing in advance how deep the archive goes.

Understanding Recursion and its Utility in Security Workflows

A Brief Overview of Recursion in Computing

Recursion is a powerful concept in mathematics and computing. In computing, it specifically refers to a function or method calling itself to solve smaller pieces of a larger problem. A method’s ability to accomplish a recursive task depends on the existence of a base case. This base case gives the method call solid ground to work from, preventing it from endlessly calling itself. Examples of recursive problem-solving range greatly in complexity, from computing the factorial of n to solving the famous Tower of Hanoi puzzle for n disks. Below is a simple example of a recursive method that returns the factorial value for an integer n in Java:
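
A minimal sketch of such a method follows. The class name `FactorialExample` and the method name `factorial` are illustrative assumptions, as is the choice to reject negative input.

```java
final class FactorialExample {

    // Returns n! for 0 <= n <= 20 (a long overflows beyond 20!).
    static long factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        // Base case: 0! and 1! are both 1, so the recursion stops here.
        if (n <= 1) {
            return 1L;
        }
        // Recursive case: n! = n * (n - 1)!
        return n * factorial(n - 1);
    }
}
```

Calling `FactorialExample.factorial(5)` expands to 5 × 4 × 3 × 2 × 1 and returns 120.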
As shown in the above code, the method calls itself with a smaller input on each pass until it reaches its base case. Once the base case returns, each pending call completes in turn and the final result is passed back to the original caller.

It's important to note that recursion, while powerful, can also be considerably more resource-intensive than iterative looping. Each recursive call consumes additional memory on the call stack, and this can quickly overwhelm a device in large-scale recursive cases if memory consumption is not handled carefully.

Recursion in File Directory Traversal

File directory traversal is a natural fit for recursive computing. File directories are hierarchically structured, and every folder that contains no further subfolders is a natural base case, whether we’re starting from a root folder or 20 layers deep. When we search for content in our Windows file explorer, for example, the search effectively performs a recursive traversal under the hood on our device. If, starting from our root folder, we search for files with the phrase “report”, our device will slowly but surely work through each successive folder in our system, checking each file within each folder for content with “report” in the title or body. It’ll do this until it reaches the very last set of folders in the file directory hierarchy, and from there, it’ll work its way back up to the root folder. This is ultimately very similar to our earlier factorial example. Instead of successively multiplying new values, our Operating System (OS) will employ its own logic in each recursive call to queue and display the files which match our search string.

Recursion in Compressed Archive Threat Scanning

Archive formats like ZIP, TAR, 7Z, and others are little more than portable file directories, typically paired with a compression algorithm. This makes them powerful, ubiquitous tools for sharing a multitude of large files and/or folders at once, and it’s also what makes them naturally suited for recursive threat scanning techniques.
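
The same idea can be sketched for nested ZIP archives using the JDK's `java.util.zip` classes. This is a simplified illustration under stated assumptions, not a description of how any particular AV product works: the class `NestedZipWalker`, the `MAX_DEPTH` limit, and the decision to recognize nested archives by a `.zip` extension are all hypothetical choices made for brevity. A real scanner would inspect file content rather than trust the extension, and would also cap total extracted size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class NestedZipWalker {

    // Hypothetical safety limit so a maliciously deep archive cannot recurse forever.
    static final int MAX_DEPTH = 16;

    // Returns the path of every non-archive file found at any nesting level.
    static List<String> listEntries(InputStream archive) throws IOException {
        List<String> found = new ArrayList<>();
        walk(archive, "", 0, found);
        return found;
    }

    private static void walk(InputStream in, String prefix, int depth, List<String> found)
            throws IOException {
        if (depth > MAX_DEPTH) {
            throw new IOException("archive nesting exceeds " + MAX_DEPTH + " layers");
        }
        // Deliberately not closed here: closing would also close the parent stream.
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String path = prefix + entry.getName();
            if (entry.getName().toLowerCase(Locale.ROOT).endsWith(".zip")) {
                // Recursive case: the current entry's bytes are themselves a ZIP stream.
                walk(zip, path + "!/", depth + 1, found);
            } else {
                // Base case: an ordinary file. A scanner would inspect its content here.
                found.add(path);
            }
        }
    }
}
```

Given an archive containing `readme.txt` and an `inner.zip` that holds `payload.txt`, `listEntries` would report `readme.txt` and `inner.zip!/payload.txt`. The depth limit reflects the earlier point about resource consumption: unbounded recursion over attacker-controlled input is itself a risk.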

No matter how deep a threat actor chooses to bury malware within the hierarchical structure of a compressed archive file, AV software equipped with recursive archive-scanning methods will eventually retrieve the file in question. Whether or not the threat is identified depends, at that point, on the threat detection policies themselves. After being recursively identified, each extracted file is evaluated against those policies.

Recursive Scanning with Cloudmersive

Cloudmersive’s Advanced Virus Scan API utilizes recursion as a core mechanic in its deep content verification process. Archives with unsafe extraction outcomes, such as ZIP or RAR “bombs” packed with immense volumes of data intended to crash a vulnerable system, are identified and distinguished from other archive threats. These archives are rendered incapable of harming the target system, as they are detected and processed on dedicated API infrastructure. Nested archives are identified with content verification capabilities which look past the given file extension. This roots out extension-based obfuscation, in which a file is disguised behind a misleading extension.

The Advanced Scan API can be deployed in defense of individual web applications (with minor code changes), and it can be deployed with zero code changes at the network perimeter in a forward proxy, reverse proxy, or fully-fledged Web Application Firewall (WAF). It can also be deployed adjacent to AWS, Azure, GCP, and other cloud object storage instances to perform in-storage scanning after files are uploaded.

Conclusion

Recursive techniques aren’t just an intriguing concept; they’re a practical necessity in modern threat detection. As efforts to obfuscate threats within compressed archive formats grow more sophisticated, simply inspecting the first few layers of an archive becomes less and less sufficient. Just like basic recursive methods which solve for n, advanced antivirus systems must recursively check each potential threat in an archive until no new layers remain.
Without recursive capabilities, the guard at the gate might miss the real threat hidden several horses deep.

To learn more about Cloudmersive’s Advanced Virus Scan API recursive scanning capabilities, please feel free to contact a member of our team.