Phase 1: FOUNDATION — What is XML, What is a Parser, and What Happens Before XXE Exists
To understand XML External Entity (XXE), you must first understand what XML is, what a parser does, and why external entities were introduced. Everything about XXE is a consequence of how XML is structured and how it’s interpreted. We’re not touching any vulnerabilities yet. First, we study the foundation from first principles.
XML (eXtensible Markup Language) is a way to store and transport data in a structured, human-readable format.
It is not a programming language. It is a markup format, like HTML, but instead of describing how content should look, it describes what the data is.
Example:
<user>
  <name>Isaac</name>
  <role>admin</role>
</user>
This describes a user element with two child elements: name and role. There is no logic here, just structure.
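To make "just structure" concrete, here is a minimal sketch that uses Python's standard-library parser, xml.etree.ElementTree (an illustrative choice; the helper name parse_user is hypothetical). The parser turns the raw text into a tree of elements, and the code then flattens that tree into a dictionary. Nothing in the XML itself says what "admin" means. That interpretation lives entirely in the program reading it.

```python
import xml.etree.ElementTree as ET

# The same user document as the example above, held as a plain string.
XML_TEXT = """<user>
  <name>Isaac</name>
  <role>admin</role>
</user>"""


def parse_user(xml_text: str) -> dict:
    """Parse a <user> document into a dict mapping child tag -> text (hypothetical helper)."""
    # The parser reads raw text and builds an element tree; root is the <user> element.
    root = ET.fromstring(xml_text)
    if root.tag != "user":
        raise ValueError(f"expected <user>, got <{root.tag}>")
    # Each child element becomes one key/value pair; XML attaches no behavior to them.
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    print(parse_user(XML_TEXT))  # {'name': 'Isaac', 'role': 'admin'}
```

Whether "role: admin" grants any privileges is decided by the application code, not by XML. The document is only data waiting to be interpreted.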
XML was first developed in 1996 and published as a W3C Recommendation in 1998. It was built to do what JSON largely does now: exchange data between different systems (e.g., banks, servers, APIs, devices).
It was adopted everywhere: SOAP APIs, SAML, Office documents (DOCX and XLSX files are ZIP archives of XML), SVG images, RSS feeds, mobile and application configuration files, and more.