How Malicious Scripts are Embedded in Common Documents

5/15/2025 - Brian O'Neill

Modern document formats like DOCX, XLSM, PDF, and even SVG aren't just containers for text or images. They're capable platforms that support scripting, automation, external data fetching, and embedded objects. This flexibility is what makes them valuable to legitimate users, but it also makes them attractive threat vectors. Sophisticated attackers frequently abuse these capabilities to hide obfuscated code deep within a file's structure.

In this article, we’ll review how threat actors typically embed malicious scripts in common document formats. We'll break down script exploits involving macros, PowerShell payloads, JavaScript, and malicious links, and we'll cover how they're weaponized to gain system access, execute remote code, or exfiltrate sensitive data from our environment. We'll also look at some real-world attacks and CVEs to put these threats in context.

At the end of this article, we’ll look at how the Cloudmersive Advanced Virus Scan API detects malicious scripts within files, even when they're carefully obfuscated.

Understanding Common Script-Based Attacks

Attackers often rely on user interaction with a specially crafted document to execute a malicious script. The more privileges the user has in their environment, the more damaging the outcome can be.

However, users don't always need to interact with a specially crafted file to trigger an attack. Attackers can also craft documents that exploit memory-handling flaws in backend file parsers, gaining code execution with equally devastating results.

No matter which strategy attackers use in the initial access phase, the end goals typically remain the same: to download malware into the target environment and/or exfiltrate data to a remote server.

In the next section, we'll cover some of the most common types of embedded scripts we can expect to encounter. These include:

VBScript / VBA Macros in Office Files
PowerShell Hidden in Embedded Objects
Embedded JavaScript in PDFs
XML External Entity (XXE) Attacks in XML-Based Documents
Legacy Excel 4.0 (XLM) Macros
Embedded JavaScript in SVGs

VBScript / VBA Macros in Office Files

Macros remain among the most abused features in Office documents. Attackers write malicious Visual Basic for Applications (VBA) code that runs automatically when the delivery document is opened and macros are enabled.

One common tactic places malicious code in auto-executing procedures such as AutoOpen() or Document_Open(). These procedures call PowerShell, which downloads the malware payload immediately after the file opens.
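
To make the trigger-plus-download pattern concrete, here is a minimal Python sketch that flags VBA source containing both an auto-executing procedure and a call that reaches outside the document. It assumes the macro text has already been extracted (for example with oletools' olevba). The keyword lists and the scan_vba_source helper are illustrative assumptions, not Cloudmersive's detection logic, and plain keyword matching is easily defeated by obfuscation.

```python
import re

# Procedures that Office runs automatically when a document or workbook opens.
# Illustrative, not exhaustive.
AUTO_EXEC = ["AutoOpen", "Document_Open", "Workbook_Open", "Auto_Open"]

# Calls commonly paired with auto-exec triggers in macro droppers (assumed list).
SUSPICIOUS = ["Shell", "CreateObject", "WScript.Shell", "powershell", "URLDownloadToFile"]


def scan_vba_source(vba_code: str) -> dict:
    """Report auto-exec triggers and suspicious calls found in VBA source text.

    Matching is case-insensitive because VBA identifiers are case-insensitive.
    """
    def found(keywords):
        return [k for k in keywords
                if re.search(r"\b" + re.escape(k) + r"\b", vba_code, re.IGNORECASE)]

    triggers = found(AUTO_EXEC)
    calls = found(SUSPICIOUS)
    return {
        "auto_exec": triggers,
        "suspicious_calls": calls,
        # High risk only when code both runs on open and reaches out to the system.
        "high_risk": bool(triggers and calls),
    }
```

Requiring both signals keeps benign formatting macros from being flagged, at the cost of missing payloads that hide their calls behind string concatenation or CallByName.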

High-profile campaigns delivering the infamous Emotet and TrickBot Trojans used MS Office documents with malicious embedded macros to initiate infections. Once installed, these Trojans weakened internal system defenses and hijacked key system processes. Because stopping them after execution was nearly impossible, effective mitigation relied on detecting the malicious macros in the delivery document itself.

PowerShell in Embedded Objects

Office Open XML (OOXML) files can contain embedded objects that silently run PowerShell commands.

Like macro-enabled Office documents, these embedded objects hide inside seemingly harmless files, such as Word or Excel documents with familiar names (e.g., invoice.xlsx or contract.docx), to bypass security policies and trick users into opening them. Once triggered, the malicious objects often launch PowerShell scripts with elevated permissions.
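
Because OOXML files are ZIP packages, embedded objects are stored as separate parts, typically under folders such as word/embeddings/ or xl/embeddings/. The sketch below lists those parts and checks whether each one begins with the OLE compound-file signature (D0 CF 11 E0 A1 B1 1A E1). The list_embedded_parts helper is hypothetical; a real scanner would go on to parse each OLE object rather than stop at its header.

```python
import io
import zipfile

# Magic bytes at the start of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def list_embedded_parts(ooxml_bytes: bytes) -> list:
    """Return (part_name, is_ole) for each part stored in an embeddings folder."""
    results = []
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as package:
        for name in package.namelist():
            if "/embeddings/" in name:
                header = package.read(name)[:8]
                results.append((name, header == OLE_MAGIC))
    return results
```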

Spear-phishing campaigns carried out by FIN7 and Lazarus Group have used this method. The objects embedded in their delivery documents contained base64-encoded payloads that executed upon user interaction, or in certain cases automatically.
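
PowerShell's -EncodedCommand parameter (often abbreviated -enc or -e) accepts base64 of UTF-16LE text, which is why these payloads look like opaque blobs. This Python sketch pulls the argument out of a command line and decodes it. The regular expression covers only the common spellings and is an assumption, not a complete model of PowerShell's parameter abbreviations; decode_encoded_command is a hypothetical helper.

```python
import base64
import re

# Matches -e, -ec, and -en... prefixes of -EncodedCommand, then the base64 argument.
ENC_FLAG = re.compile(r"-(?:e|ec|en[a-z]*)\s+([A-Za-z0-9+/]+={0,2})", re.IGNORECASE)


def decode_encoded_command(command_line: str):
    """Return the decoded script text, or None if no encoded command is present."""
    match = ENC_FLAG.search(command_line)
    if match is None:
        return None
    # PowerShell encodes the script as UTF-16LE before base64-encoding it.
    return base64.b64decode(match.group(1)).decode("utf-16-le")
```

Decoding the blob is usually the first triage step, since the readable script often reveals the next-stage download URL.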

By definition, spear-phishing campaigns target specific organizations, and specific users within those organizations (often those with elevated permissions). This focused targeting significantly improves the attacker's chances of success.

Embedded JavaScript in PDFs

PDF files are powerful documents capable of running JavaScript code to automate dynamic actions within interactive forms.

This functionality is frequently abused to deliver malware. Often, attackers use specially crafted PDFs with embedded scripts to target vulnerable PDF readers.
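
Triage tools such as Didier Stevens' pdfid start by counting risky PDF name objects like /JavaScript, /JS, /OpenAction, and /AA. The sketch below applies the same idea in Python and also decodes #xx hex escapes in names (so /J#61vaScript is counted as /JavaScript), a trick attackers use to dodge naive string matching. It is a heuristic sketch with a hypothetical count_risky_names helper: names inside compressed object streams are invisible to it, and slashes inside strings can produce false positives.

```python
import re
from collections import Counter

# Name objects that pdfid-style triage commonly counts.
RISKY_NAMES = {"JS", "JavaScript", "OpenAction", "AA", "Launch", "EmbeddedFile"}

# A PDF name: "/" followed by regular characters (no whitespace or delimiters).
NAME_TOKEN = re.compile(rb"/([^\s/<>\[\]()%{}]+)")
HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(raw: bytes) -> str:
    """Decode #xx escapes so that /J#61vaScript compares equal to /JavaScript."""
    decoded = HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_risky_names(pdf_bytes: bytes) -> Counter:
    """Count occurrences of risky name objects in the raw (uncompressed) bytes."""
    counts = Counter()
    for match in NAME_TOKEN.finditer(pdf_bytes):
        name = normalize_name(match.group(1))
        if name in RISKY_NAMES:
            counts[name] += 1
    return counts
```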

There have been numerous examples of critical vulnerabilities involving PDFs with embedded JavaScript in recent years.

CVE-2023-26369, for example, was an out-of-bounds write vulnerability in certain versions of Adobe Acrobat Reader. Specially crafted PDF documents containing embedded JavaScript could exploit flaws in Acrobat Reader's memory handling, allowing the attacker to execute code in the context of the current user.

CVE-2023-5474 was a heap buffer overflow vulnerability in Google Chrome’s PDF component. By convincing a user to interact with a specially crafted PDF file, a remote attacker could potentially exploit heap corruption to execute arbitrary code in the user's browser.

XML External Entity (XXE) Attacks

XXE attacks abuse XML parsers that resolve external entities, pulling unsafe external content into the parsed document. These attacks can expose internal files or environments. They can appear in plain XML files or in XML-based formats such as DOCX and SVG, and they are used to target web services and automated file-processing systems.

Attackers often create a malicious Document Type Definition (DTD) that tricks the XML parser into including sensitive files, such as /etc/passwd, and sending their contents out via HTTP or DNS. These attacks commonly target document converters and any API that processes XML without proper validation.
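
One defensive pattern is to reject any document whose DTD declares entities before its content is processed. This Python sketch uses the standard library's expat bindings to stop at the first entity declaration and report its name and system identifier; check_for_entities is a hypothetical helper. In production, a hardened library such as defusedxml, which can forbid DTDs and entities outright, is usually a better choice than a hand-rolled check.

```python
import xml.parsers.expat


class EntityDeclarationFound(Exception):
    """Raised to abort parsing as soon as the DTD declares an entity."""

    def __init__(self, name, system_id):
        super().__init__(name)
        self.name = name
        self.system_id = system_id


def check_for_entities(xml_bytes: bytes):
    """Return None if no entity is declared, else (name, system_id).

    system_id is None for internal entities and a URI such as
    "file:///etc/passwd" for external ones.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_param, value, base, system_id, public_id, notation):
        # Abort before the parser ever reaches a reference to this entity.
        raise EntityDeclarationFound(name, system_id)

    parser.EntityDeclHandler = on_entity_decl
    try:
        parser.Parse(xml_bytes, True)
    except EntityDeclarationFound as found:
        return (found.name, found.system_id)
    return None
```

Flagging internal entities as well (a system_id of None) also catches entity-expansion attacks such as "billion laughs", which need no external resource.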

Excel 4.0 (XLM) Macros

Excel 4.0 (XLM) macros are an outdated but still-supported feature. Attackers often store them in hidden sheets, and because they live outside the VBA project, they can slip past detection tools that only inspect VBA code. They use formulas like =EXEC("cmd.exe /c…") to launch payloads.

The BazarLoader and QBot campaigns have been known to exploit this tactic, hiding macros in XLS files or even misusing XLSX files with hidden content.
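
For macro-enabled OOXML workbooks (.xlsm), the hidden sheets described above and the macro sheets themselves can be inspected directly. This sketch reads xl/workbook.xml to report sheets whose state is hidden or veryHidden (a state Excel's normal interface cannot unhide), and lists any parts under xl/macrosheets/, where XLM macro sheets are stored. The inspect_workbook helper is hypothetical, and legacy binary .xls and .xlsb files would need a different parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of SpreadsheetML elements such as <sheet>.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def inspect_workbook(xlsx_bytes: bytes) -> dict:
    """Report hidden sheets and XLM macro-sheet parts in an OOXML workbook."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as package:
        workbook = ET.fromstring(package.read("xl/workbook.xml"))
        macro_parts = [n for n in package.namelist() if n.startswith("xl/macrosheets/")]
    hidden = [
        (sheet.get("name"), sheet.get("state"))
        for sheet in workbook.iter(MAIN_NS + "sheet")
        if sheet.get("state") in ("hidden", "veryHidden")
    ]
    return {"hidden_sheets": hidden, "macro_sheet_parts": macro_parts}
```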

SVG with Embedded JavaScript or Malware Links

SVG is far more complex than most other image formats because it is XML-based and supports scripting. Like PDFs, SVG files can include malicious JavaScript, which makes them a favorite for phishing and redirect chains. Attackers can attach event handlers such as onload or onclick, or schedule code with setTimeout, to trigger payloads.

Unlike PDFs, SVG content can be placed directly in emails or HTML pages. As a result, its scripts can run with a single click or load silently, redirecting victims to malicious campaigns.
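
Because SVG is XML, its scripting hooks can be located by walking the element tree. This Python sketch reports script elements, on* event-handler attributes, and javascript: URLs in href or xlink:href attributes. The find_svg_script_indicators helper is hypothetical; a real scanner would also need to handle malformed markup and parse untrusted input with a hardened parser such as defusedxml.

```python
import xml.etree.ElementTree as ET


def find_svg_script_indicators(svg_text: str) -> list:
    """List script elements, event handlers, and javascript: links in an SVG."""
    findings = []
    root = ET.fromstring(svg_text)
    for element in root.iter():
        # Strip the "{namespace}" prefix that ElementTree adds to tag names.
        tag = element.tag.split("}")[-1]
        if tag == "script":
            findings.append("script element")
        for attr, value in element.attrib.items():
            local = attr.split("}")[-1]
            if local.lower().startswith("on"):
                findings.append(f"{local} handler on <{tag}>")
            # Covers both plain href and namespaced xlink:href.
            if local == "href" and value.strip().lower().startswith("javascript:"):
                findings.append(f"javascript: URL on <{tag}>")
    return findings
```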

Identifying Embedded Scripts with Cloudmersive

The Cloudmersive Advanced Virus Scan API looks beyond file extensions and other surface-level content to identify threats. It inspects complex document structures, including ZIP-compressed Office files, embedded OLE objects, and encoded script payloads, to find high-risk content accurately.

When a file is uploaded or transferred, the Advanced Virus Scan API checks it for macro signatures, script files, script execution logic, and known exploit indicators such as AutoOpen functions, embedded PowerShell, and suspicious DTD entities. This gives security teams a strong tool for finding and isolating (or deleting) harmful files before they reach users or backend systems.

Conclusion

Malicious scripts come in many forms, most commonly macros, JavaScript, and embedded PowerShell. They hide in plain sight inside files that users trust, often evading traditional antivirus software. Reducing exposure to obfuscated script-based threats is critical for securing enterprise environments full of sensitive data.

To learn more about using Cloudmersive’s Advanced Virus Scan API, please reach out to our team.
