XML External Entities (XXE): Parser Configuration
Key Insights
- Most XML parsers are vulnerable to XXE attacks by default—secure configuration requires explicit action, not assumptions about safe defaults.
- Disabling DTD processing entirely is the most reliable defense; if DTDs are required, disable external entities and external DTD loading separately.
- XXE vulnerabilities extend beyond file disclosure to include SSRF attacks, denial of service, and port scanning—making parser hardening critical even for internal applications.
Introduction to XXE Vulnerabilities
XML External Entity (XXE) attacks exploit a feature of XML parsers that allows documents to reference external resources. What was designed for modularity and reuse became one of the most dangerous web application vulnerabilities: XXE had its own category (A4) in the OWASP Top 10 2017 and was merged into Security Misconfiguration in the 2021 edition.
The attack surface is deceptively large. Any application that parses XML (configuration files, API requests, document uploads, SOAP services, SVG images, even Office documents) can be vulnerable. The XXE flaw disclosed in Facebook’s OpenID handling in 2014, the 2019 XXE in Jira affecting thousands of instances, and countless data breaches trace back to misconfigured XML parsers.
Parser configuration is your primary defense because XXE is fundamentally a feature, not a bug. You’re not patching a flaw—you’re disabling dangerous functionality that most applications don’t need.
How XXE Attacks Work
XXE attacks abuse the Document Type Definition (DTD) mechanism in XML. DTDs allow you to define entities—essentially variables that get replaced when the document is parsed. External entities take this further by loading content from URIs.
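To make the substitution mechanism concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The entity name `greeting` and the sample document are invented for illustration; the point is only that the parser, not the application, performs the replacement.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "greeting" is an internal entity defined in the DTD.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY greeting "hello from the DTD">
]>
<note>&greeting;</note>"""


def parsed_text(xml_string: str) -> str:
    """Return the root element's text after the parser has expanded entities."""
    root = ET.fromstring(xml_string)
    return root.text or ""


if __name__ == "__main__":
    # The application never sees "&greeting;", only the expanded value.
    print(parsed_text(DOCUMENT))
```

An external entity works the same way from the application's point of view, except that the replacement text comes from a URI instead of a literal in the DTD.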
Here’s a malicious payload that reads /etc/passwd:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<userInfo>
<name>&xxe;</name>
</userInfo>
When a vulnerable parser processes this document, it resolves &xxe; by reading the file and inserting its contents. If the application reflects this data back—in an error message, API response, or rendered page—the attacker exfiltrates the file.
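One way to see what such a document is asking for, without resolving anything, is to list its entity declarations. The sketch below uses Python's low-level xml.parsers.expat module; the helper name `list_entity_declarations` is an assumption for illustration. Because no external entity handler is registered, expat records the declaration but does not fetch the file.

```python
import xml.parsers.expat

PAYLOAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<userInfo>
  <name>&xxe;</name>
</userInfo>"""


def list_entity_declarations(xml_bytes: bytes):
    """Return (entity name, system identifier) pairs declared in the DTD."""
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # system_id is None for internal entities and a URI for external ones.
        found.append((name, system_id))

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return found


if __name__ == "__main__":
    for name, system_id in list_entity_declarations(PAYLOAD):
        print(f"entity {name!r} points at {system_id!r}")
```

Any declaration with a non-empty system identifier is a request for the parser to read from that location on the document author's behalf.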
The attack variants are extensive:
<!-- SSRF: Probe internal network -->
<!ENTITY xxe SYSTEM "http://192.168.1.1/admin">
<!-- SSRF: Cloud metadata theft -->
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
<!-- Denial of Service (Billion Laughs) -->
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!-- Continues exponentially... -->
Blind XXE is particularly insidious. Even when the application doesn’t reflect parsed content, attackers can exfiltrate data through out-of-band channels by having the parser make HTTP requests to attacker-controlled servers.
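For the Billion Laughs variant listed earlier, the danger is easiest to see as arithmetic: every nesting level multiplies the expanded size by the number of references it contains. The helper below is a back-of-the-envelope sketch, not a parser; the function name and parameters are illustrative.

```python
def expanded_length(base_length: int, fanout: int, levels: int) -> int:
    """Size of the fully expanded entity.

    base_length: characters in the innermost entity (3 for "lol")
    fanout:      references to the previous entity per level (10 in the payload above)
    levels:      number of nested entity definitions on top of the base entity
    """
    return base_length * fanout ** levels


if __name__ == "__main__":
    for levels in (2, 5, 9):
        size = expanded_length(3, 10, levels)
        print(f"{levels} levels -> {size:,} characters")
```

A payload well under a kilobyte with nine levels of ten-fold nesting asks the parser to materialize three billion characters, which is why entity expansion limits matter even when external entities are disabled.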
Vulnerable vs. Secure Parser Defaults
The uncomfortable truth: most XML parsers ship with dangerous defaults. Here’s the landscape:
| Parser | Language | External Entities | DTD Processing | Secure by Default |
|---|---|---|---|---|
| DocumentBuilderFactory | Java | Enabled | Enabled | No |
| SAXParserFactory | Java | Enabled | Enabled | No |
| XMLReader | Java | Enabled | Enabled | No |
| lxml | Python | Enabled | Enabled | No |
| xml.etree | Python | Disabled | Limited | Mostly |
| XmlDocument | .NET | Enabled* | Enabled | No |
| XmlReader | .NET | Disabled | Configurable | Yes (4.5.2+) |
| libxml2 | PHP | Enabled | Enabled | No |
| SimpleXML | PHP | Enabled | Enabled | No |
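*From .NET Framework 4.5.2 onward, XmlDocument’s XmlResolver defaults to null, so external entities are not resolved unless a resolver is assigned explicitly.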
Here’s what vulnerable parsing looks like in Java—code you’ll find in countless production applications:
// VULNERABLE: Default configuration allows XXE
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xmlInput)));
// This will happily read /etc/passwd if the XML contains an XXE payload
The parser does exactly what the XML specification says it should. That’s the problem.
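Rather than relying on a table like the one above, you can probe a parser directly. The following Python sketch plants a canary string in a temporary file, references it through an external entity, and reports whether the canary shows up in the parsed output. The names `leaks_external_entity`, `etree_text`, and the canary value are assumptions made for illustration; only the standard library's xml.etree is exercised here.

```python
import pathlib
import tempfile
import xml.etree.ElementTree as ET

CANARY = "xxe-canary-7f3a"  # arbitrary marker, hypothetical value


def leaks_external_entity(parse_to_text) -> bool:
    """Return True if parse_to_text(xml) exposes the content of a local file.

    parse_to_text is any callable that takes an XML string and returns the
    text of the root element. A parser that raises is treated as not leaking.
    """
    with tempfile.TemporaryDirectory() as tmp:
        secret = pathlib.Path(tmp) / "secret.txt"
        secret.write_text(CANARY)
        payload = (
            '<?xml version="1.0"?>'
            f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{secret.as_uri()}">]>'
            "<data>&xxe;</data>"
        )
        try:
            text = parse_to_text(payload)
        except Exception:
            return False  # rejected outright: nothing was disclosed
        return CANARY in (text or "")


def etree_text(xml_string: str):
    """Adapter for the standard library parser."""
    return ET.fromstring(xml_string).text


if __name__ == "__main__":
    print("xml.etree leaks file content:", leaks_external_entity(etree_text))
```

Writing a small adapter like `etree_text` for each parser your application uses turns the defaults question into something you can measure on your own runtime and library versions.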
Secure Configuration by Language/Parser
Java: DocumentBuilderFactory and SAXParserFactory
Java requires the most verbose configuration. The OWASP recommendation is to disable DTDs entirely when possible:
// SECURE: Complete XXE prevention for DocumentBuilderFactory
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Disable DTDs entirely (most secure)
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// If DTDs are required, remove the line above and rely on the three settings below;
// keeping them alongside it is harmless defense in depth
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
// Additional hardening
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(inputSource);
For SAXParserFactory:
// SECURE: SAXParserFactory configuration
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
SAXParser parser = factory.newSAXParser();
Python: defusedxml and lxml
Python’s standard library xml.etree.ElementTree does not resolve external entities, but it still expands internal ones and offers fewer features than lxml. For untrusted input, use defusedxml:
# SECURE: Using defusedxml (recommended)
import defusedxml.ElementTree as ET
# Rejects entity declarations and external references by default;
# pass forbid_dtd=True to reject any DOCTYPE as well
tree = ET.parse('document.xml')
root = ET.fromstring(xml_string)
If you must use lxml directly:
# SECURE: lxml with XXE prevention
from lxml import etree
# Create parser with security settings
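parser = etree.XMLParser(
    resolve_entities=False,  # do not substitute entity references
    no_network=True,         # no network access while loading DTDs or entities
    dtd_validation=False,    # do not validate against a DTD
    load_dtd=False,          # do not load external DTDs
)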
# Use the secure parser
tree = etree.parse('document.xml', parser)
root = etree.fromstring(xml_string, parser)
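If your code uses the standard library's SAX interface instead, the two entity switches shown for Java have direct counterparts in xml.sax.handler. The sketch below sets them explicitly; recent Python versions (3.7.1 and later) already leave external general entities off, so this mainly documents intent and protects against older runtimes. The `TextCollector` class and `parse_text_hardened` function are illustrative names, not part of any library.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes

PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<data>&xxe;</data>"""


class TextCollector(ContentHandler):
    """Accumulates character data so the caller can inspect what was parsed."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text_hardened(xml_bytes: bytes) -> str:
    parser = xml.sax.make_parser()
    # Same feature URIs as the Java examples, exposed as constants.
    parser.setFeature(feature_external_ges, False)  # external general entities
    parser.setFeature(feature_external_pes, False)  # external parameter entities
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


if __name__ == "__main__":
    print(repr(parse_text_hardened(PAYLOAD)))
```

This only covers external entities. It does not forbid DTDs or limit internal entity expansion, which is the gap defusedxml is meant to close.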
.NET: XmlReaderSettings
Modern .NET (.NET Framework 4.5.2 and later) has safer defaults, but explicit configuration is still recommended:
// SECURE: XmlReaderSettings configuration
var settings = new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Prohibit,   // Blocks DTDs entirely
    XmlResolver = null,                       // Prevents external resource resolution
    MaxCharactersFromEntities = 1024,         // Limits entity expansion (DoS protection)
    MaxCharactersInDocument = 1024 * 1024     // Limits document size
};
using (var reader = XmlReader.Create(inputStream, settings))
{
    var doc = new XmlDocument();
    doc.Load(reader);
}
// For XmlDocument directly (if you must)
var xmlDoc = new XmlDocument();
xmlDoc.XmlResolver = null; // Critical: prevents external resolution
xmlDoc.LoadXml(xmlString);
PHP: libxml Configuration
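PHP delegates XML parsing to libxml2, where external entity loading has been disabled by default since version 2.9.0. The most common way to reintroduce XXE is passing LIBXML_NOENT, which despite its name turns entity substitution on:
// SECURE: PHP XXE prevention
// PHP < 8.0 only: disable external entity loading globally
// (deprecated in PHP 8.0, where it is no longer needed)
if (PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true);
}
// For SimpleXML: block network access, and do NOT pass LIBXML_NOENT
$xml = simplexml_load_string($xmlString, 'SimpleXMLElement', LIBXML_NONET);
// For DOMDocument: same flags, and do NOT pass LIBXML_DTDLOAD
$dom = new DOMDocument();
$dom->loadXML($xmlString, LIBXML_NONET);
// Note: LIBXML_NOENT substitutes entities and LIBXML_DTDLOAD loads the external DTD subset;
// never use either flag when parsing untrusted input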
Defense in Depth Strategies
Parser configuration is essential but shouldn’t be your only defense.
Input validation before parsing: Reject XML documents containing DOCTYPE declarations if your application doesn’t need them:
import defusedxml.ElementTree

def safe_parse(xml_string):
    # Quick check before expensive parsing
    upper = xml_string.upper()
    if '<!DOCTYPE' in upper or '<!ENTITY' in upper:
        raise ValueError("DOCTYPE/ENTITY declarations not allowed")
    return defusedxml.ElementTree.fromstring(xml_string)
Web Application Firewall rules: Configure your WAF to block requests containing XXE patterns. Most WAFs have built-in XXE detection, but verify it’s enabled.
Network segmentation: Limit the blast radius of SSRF attacks by ensuring your XML-parsing services can’t reach sensitive internal endpoints. In particular, application servers that parse XML should not be able to reach the cloud metadata service at 169.254.169.254 unless they genuinely depend on it.
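Segmentation belongs in firewalls and security groups, but an application-level check can serve as a second line when your code has to fetch URLs on a user's behalf. The sketch below classifies an already-resolved IP address with Python's ipaddress module; the function name is illustrative, and it deliberately accepts only IP literals, since hostname resolution (and DNS rebinding) would need separate handling.

```python
import ipaddress


def is_internal_address(ip_literal: str) -> bool:
    """Return True for loopback, link-local, and private-range addresses.

    Raises ValueError if ip_literal is not a valid IPv4 or IPv6 address.
    """
    addr = ipaddress.ip_address(ip_literal)
    return addr.is_loopback or addr.is_link_local or addr.is_private


if __name__ == "__main__":
    for candidate in ("169.254.169.254", "192.168.1.1", "127.0.0.1", "8.8.8.8"):
        verdict = "block" if is_internal_address(candidate) else "allow"
        print(f"{candidate:>16} -> {verdict}")
```

The metadata address 169.254.169.254 falls in the link-local range, so it is caught without being special-cased.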
Least-privilege file access: Run XML-parsing processes with minimal filesystem permissions. Even if XXE succeeds, the attacker can only read files the process can access.
Consider JSON: If you control both ends of the communication, JSON eliminates XXE risk entirely. Many SOAP services can be migrated to REST/JSON.
Testing Your Configuration
Don’t trust documentation—verify your parser is actually secure. Here’s a test suite approach:
// JUnit test for XXE prevention (imports assume JUnit 5)
import static org.junit.jupiter.api.Assertions.assertThrows;

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.junit.jupiter.api.Test;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class XXEPreventionTest {
    private static final String XXE_PAYLOAD =
        "<?xml version=\"1.0\"?>" +
        "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>" +
        "<data>&xxe;</data>";

    @Test
    public void shouldRejectXXEPayload() {
        DocumentBuilderFactory factory = SecureXmlFactory.create(); // Your secure factory
        assertThrows(SAXParseException.class, () -> {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(XXE_PAYLOAD)));
        });
    }

    @Test
    public void shouldRejectBillionLaughs() {
        // Deliberately small payload: it is rejected because the factory
        // refuses the DOCTYPE itself, not because an expansion limit is hit
        String billionLaughs = "<?xml version=\"1.0\"?>" +
            "<!DOCTYPE lolz [" +
            "<!ENTITY lol \"lol\">" +
            "<!ENTITY lol2 \"&lol;&lol;&lol;&lol;&lol;\">" +
            "<!ENTITY lol3 \"&lol2;&lol2;&lol2;&lol2;&lol2;\">]>" +
            "<lolz>&lol3;</lolz>";
        DocumentBuilderFactory factory = SecureXmlFactory.create();
        assertThrows(Exception.class, () -> {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(billionLaughs)));
        });
    }
}
For Python:
import pytest
import defusedxml.ElementTree as ET
from defusedxml import DefusedXmlException

XXE_PAYLOAD = '''<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<data>&xxe;</data>'''

def test_xxe_blocked():
    # defusedxml raises EntitiesForbidden, a DefusedXmlException subclass
    with pytest.raises(DefusedXmlException):
        ET.fromstring(XXE_PAYLOAD)

def test_ssrf_blocked():
    ssrf_payload = '''<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://evil.com/steal">]>
<data>&xxe;</data>'''
    with pytest.raises(DefusedXmlException):
        ET.fromstring(ssrf_payload)
Summary and Quick Reference
Essential Parser Settings Checklist:
- Disable DTD processing entirely (disallow-doctype-decl in Java)
- If DTDs required: disable external general entities
- If DTDs required: disable external parameter entities
- If DTDs required: disable external DTD loading
- Disable XInclude processing
- Set XmlResolver to null (.NET)
- Use defusedxml instead of standard library (Python)
- Call libxml_disable_entity_loader(true) (PHP < 8.0)
- Add unit tests that verify XXE payloads are rejected
- Review all XML parsing code paths, including dependencies
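For the last checklist item, a rough first pass is to search your codebase for the parser entry points discussed in this article. The Python sketch below scans a block of source text for a small, assumed list of API names; the list is neither exhaustive nor authoritative, and a hit only means "review this line", not "this line is vulnerable".

```python
import re

# Assumed, non-exhaustive markers drawn from the parsers covered above.
PARSER_MARKERS = [
    "DocumentBuilderFactory",
    "SAXParserFactory",
    "XMLReader",
    "etree.XMLParser",
    "xml.etree",
    "XmlDocument",
    "XmlReader",
    "simplexml_load_string",
    "DOMDocument",
]
_PATTERN = re.compile("|".join(re.escape(marker) for marker in PARSER_MARKERS))


def find_parser_usages(source: str):
    """Return (line number, marker) pairs for each marker found in source."""
    hits = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = (
        "int x = 1;\n"
        "DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();\n"
    )
    for line_number, marker in find_parser_usages(sample):
        print(f"line {line_number}: {marker}")
```

Third-party dependencies parse XML too, so a search like this should be paired with a review of the libraries that accept documents on your behalf.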
The safest XML parser is one that never sees untrusted input. When that’s not possible, configure defensively, test thoroughly, and assume your parser’s defaults will betray you.
For comprehensive coverage of edge cases and additional languages, consult the OWASP XXE Prevention Cheat Sheet.