XML
Full form: eXtensible Markup Language
Structured and verbose, with schema validation. A long-standing enterprise standard.
Syntax
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <app>MyApp</app>
  <version>1.0.0</version>
  <debug>true</debug>
  <port>8080</port>
  <database>
    <host>localhost</host>
    <port>5432</port>
    <name>mydb</name>
  </database>
  <features>
    <feature>auth</feature>
    <feature>logging</feature>
    <feature>cache</feature>
  </features>
  <!-- Comment -->
  <timeout>30</timeout>
</configuration>
Attributes vs Elements
<!-- As attributes -->
<server host="localhost" port="8080" debug="true"/>
<!-- As elements -->
<server>
  <host>localhost</host>
  <port>8080</port>
  <debug>true</debug>
</server>
Best practice: Use elements for data (they can nest and repeat); keep attributes for short metadata such as IDs.
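To show how the two styles differ when read from code, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The variable names are illustrative only. Attributes are read with `.get()`, child elements with `.findtext()`, and both return strings that need explicit conversion.

```python
import xml.etree.ElementTree as ET

# The two <server> forms from this section, as in-memory strings.
as_attributes = '<server host="localhost" port="8080" debug="true"/>'
as_elements = """
<server>
  <host>localhost</host>
  <port>8080</port>
  <debug>true</debug>
</server>
"""

attr_server = ET.fromstring(as_attributes)
elem_server = ET.fromstring(as_elements)

# Attributes are read with .get(); child elements with .findtext().
# Both return strings, so numeric values are converted explicitly.
attr_port = int(attr_server.get("port"))
elem_port = int(elem_server.findtext("port"))

print(attr_port, elem_port)
```

Either form carries the same information; the difference is mainly in how it can grow later, since an element can gain children while an attribute cannot.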
Special Features
CDATA (for special characters)
<query>
  <![CDATA[
    SELECT * FROM users WHERE age < 30 AND name = 'O''Brien'
  ]]>
</query>
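CDATA only changes how text is written in the file, not what a parser returns. The sketch below (the sample query strings are made up for illustration) parses the same text once as a CDATA section and once with `&lt;` escaping, then compares the results.

```python
import xml.etree.ElementTree as ET

# Same text, written two ways: a CDATA section vs. entity escaping.
with_cdata = "<query><![CDATA[SELECT * FROM users WHERE age < 30]]></query>"
with_escapes = "<query>SELECT * FROM users WHERE age &lt; 30</query>"

cdata_text = ET.fromstring(with_cdata).text
escaped_text = ET.fromstring(with_escapes).text

# The parser hands back identical strings in both cases.
print(cdata_text == escaped_text)
```

CDATA is mostly a readability aid for text with many `<` or `&` characters. Quotes and apostrophes need no escaping in element text.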
Namespaces (avoid unless required)
<config xmlns:db="http://example.com/database">
  <db:connection>localhost:5432</db:connection>
</config>
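Namespaced elements need a prefix-to-URI mapping when looked up from code, which is part of why they add friction. A minimal sketch with `xml.etree.ElementTree`, where the `ns` dictionary name is an arbitrary choice:

```python
import xml.etree.ElementTree as ET

doc = """
<config xmlns:db="http://example.com/database">
  <db:connection>localhost:5432</db:connection>
</config>
"""
root = ET.fromstring(doc)

# Map the prefix used in the search path to the namespace URI.
ns = {"db": "http://example.com/database"}
connection = root.find("db:connection", ns).text

# Without the mapping, the plain tag name does not match.
unqualified = root.find("connection")

print(connection, unqualified)
```

Internally the element's tag is stored as `{http://example.com/database}connection`, so the prefix in the file is only shorthand for the URI.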
Pros ✅
- Self-documenting
- Schema validation (XSD)
- Comments allowed
- Namespaces for organization
- Industry standard (enterprise)
Cons ❌
- Verbose (much larger files)
- Slower to parse than JSON (typically)
- Complex syntax
- Not human-friendly
- Overkill for simple configs
When to Use
- Enterprise applications
- Legacy system integration
- Complex data + validation required
- SOAP web services
- When schema (XSD) is critical
Python Usage
import xml.etree.ElementTree as ET
# Read
tree = ET.parse('config.xml')
root = tree.getroot()
# Access values
app = root.find('app').text
host = root.find('database/host').text
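# Access attributes (assumes a <server port="..."/> element; find() returns None if it is absent)
server = root.find('server')
port = server.get('port') if server is not None else None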
# Write
root = ET.Element('configuration')
ET.SubElement(root, 'app').text = 'MyApp'
ET.SubElement(root, 'port').text = '8080'
tree = ET.ElementTree(root)
tree.write('config.xml', encoding='utf-8', xml_declaration=True)
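Every value read from XML is a string, and `find()` returns `None` for a missing element. One way to handle both, sketched here with `findtext()` and fallback defaults (the default values are assumptions chosen for illustration, not part of the config above):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<configuration>
  <app>MyApp</app>
  <debug>true</debug>
  <port>8080</port>
</configuration>
""")

# findtext() returns the default instead of raising when the element is missing.
port = int(root.findtext("port", default="80"))
debug = root.findtext("debug", default="false").strip().lower() == "true"
timeout = int(root.findtext("timeout", default="30"))  # <timeout> is absent here

print(port, debug, timeout)
```

This keeps type conversion in one place and avoids the `AttributeError` that `root.find('timeout').text` would raise on this document.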
Iterate Elements
# Loop through features
for feature in root.findall('features/feature'):
    print(feature.text)
# Loop with attributes
for server in root.findall('server'):
    name = server.get('name')
    host = server.get('host')
    print(f"{name}: {host}")
Common Mistakes
❌ Omitting the XML declaration (optional in XML 1.0, but version and encoding are then left implicit)
<configuration>
...
</configuration>
✅ Prefer to include the declaration
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
...
</configuration>
❌ Forgetting to close tags
<host>localhost
✅ Always close tags
<host>localhost</host>
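An unclosed tag makes the whole document malformed, and parsers reject it rather than guess. A small sketch of how that surfaces in Python's `xml.etree.ElementTree` (the `broken` string is a made-up example):

```python
import xml.etree.ElementTree as ET

# <host> is never closed, so the document is not well-formed.
broken = "<configuration><host>localhost</configuration>"

try:
    ET.fromstring(broken)
    parse_error = None
except ET.ParseError as exc:
    parse_error = exc
    line, column = exc.position  # where the parser gave up
    print(f"Malformed XML at line {line}, column {column}: {exc}")
```

Catching `ET.ParseError` at load time gives a clear message with a position instead of a failure later in the program.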
Validation (XSD)
# Parse, then validate against an XSD (requires the third-party lxml package: pip install lxml)
from lxml import etree
schema = etree.XMLSchema(file='schema.xsd')
doc = etree.parse('config.xml')
if schema.validate(doc):
    print("Valid!")
else:
    print(schema.error_log)
Tools
# Pretty print
xmllint --format config.xml
# Validate against schema (--noout suppresses the document dump)
xmllint --noout --schema schema.xsd config.xml
# Extract value
xmllint --xpath '//database/host/text()' config.xml
Size Comparison
Same config in 3 formats:
JSON (32 bytes): {"host":"localhost","port":5432}
YAML (about 26 bytes):
host: localhost
port: 5432
XML (about 85 bytes):
<?xml version="1.0"?>
<config>
  <host>localhost</host>
  <port>5432</port>
</config>
Verdict: XML is roughly 2.5x to 3x larger for the same data.
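The sizes can be measured rather than estimated. The sketch below serializes the same two values with the standard library and counts UTF-8 bytes. The YAML text is written by hand to avoid a third-party dependency, and the XML is compact (no declaration, no indentation), so it comes out smaller than a pretty-printed file would.

```python
import json
import xml.etree.ElementTree as ET

data = {"host": "localhost", "port": 5432}

# Compact JSON: no spaces after separators.
json_text = json.dumps(data, separators=(",", ":"))

# Hand-written YAML equivalent (no YAML library in the standard library).
yaml_text = "host: localhost\nport: 5432\n"

# Compact XML built from the same data.
root = ET.Element("config")
for key, value in data.items():
    ET.SubElement(root, key).text = str(value)
xml_text = ET.tostring(root, encoding="unicode")

sizes = {
    name: len(text.encode("utf-8"))
    for name, text in [("JSON", json_text), ("YAML", yaml_text), ("XML", xml_text)]
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Most of the XML overhead comes from repeating each name in a closing tag, so the gap widens with many short values and narrows when values are long.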