README
ΒΆ
shape-xml
Repository: github.com/shapestone/shape-xml
A Go library for parsing, validating, and building XML documents β part of the Shape Parser ecosystem.
Overview
shape-xml parses XML (Extensible Markup Language) documents into a unified Abstract Syntax Tree (AST) that integrates with the Shape Parser ecosystem. It solves the problem of needing both fast XML validation and full structural parsing in the same library β without juggling two different dependencies. Whether you're validating a configuration file, ingesting a data feed, or transforming XML into another format, shape-xml gives you one consistent API.
Who it's for:
- Go developers who need to parse, validate, or transform XML configuration files, data feeds, or API payloads
- Teams building format-agnostic data pipelines who want a unified AST across JSON, YAML, and XML
- Anyone who needs thread-safe, concurrent XML processing in production Go services
Quick Start
# Install
go get github.com/shapestone/shape-xml
# Run tests
make test
# Run tests with coverage report
make coverage
# Build
make build
Make Targets
| Target | Description |
|---|---|
make test |
Run all tests with race detection |
make lint |
Run golangci-lint linter |
make build |
Build all packages |
make coverage |
Generate HTML coverage report in coverage/ |
make clean |
Remove generated files (coverage dir, build artifacts) |
make all |
Run test, lint, build, and coverage in sequence |
make bench |
Run all benchmarks with memory stats |
make bench-report |
Run benchmarks and save results to benchmarks/results.txt |
make fuzz |
Run fuzz tests (30 seconds each for Parse, Validate, Parser) |
Installation
go get github.com/shapestone/shape-xml
Usage
Parse XML to AST
import "github.com/shapestone/shape-xml/pkg/xml"
// Parse XML from string
node, err := xml.Parse(`<user id="123"><name>Alice</name></user>`)
if err != nil {
log.Fatal(err)
}
// Access attributes (prefixed with @)
obj := node.(*ast.ObjectNode)
idNode, _ := obj.GetProperty("@id")
id := idNode.(*ast.LiteralNode).Value().(string) // "123"
Validate XML (Fast Path)
// Fast validation without AST construction - idiomatic Go
if err := xml.Validate(`<root><child>value</child></root>`); err != nil {
fmt.Println("Invalid XML:", err)
}
// err == nil means valid XML
Parse from Stream
file, err := os.Open("data.xml")
if err != nil {
log.Fatal(err)
}
defer file.Close()
node, err := xml.ParseReader(file)
if err != nil {
log.Fatal(err)
}
Fluent DOM API
Build XML programmatically with a type-safe, chainable API:
import "github.com/shapestone/shape-xml/pkg/xml"
// Build XML programmatically
user := xml.NewElement("user").
Attr("id", "123").
Attr("active", "true").
Child(xml.NewElement("name").Text("Alice")).
Child(xml.NewElement("email").Text("[email protected]"))
// Render to XML string
output := user.Render()
// <user id="123" active="true"><name>Alice</name><email>[email protected]</email></user>
Marshal/Unmarshal
type User struct {
ID string `xml:"id,attr"`
Name string `xml:"name"`
Email string `xml:"email"`
}
// Marshal Go struct to XML
user := User{ID: "123", Name: "Alice", Email: "[email protected]"}
data, err := xml.Marshal(user)
// Unmarshal XML to Go struct
var parsed User
err = xml.Unmarshal(data, &parsed)
Features
- Dual-Path Parser Pattern
- Fast validation path (4-5x faster, no AST (Abstract Syntax Tree) construction)
- Full parsing path (complete AST generation)
- Automatic path selection for optimal performance
- Streaming Support
- Parse large files with constant memory usage via
ParseReader - Validate streams efficiently with
ValidateReader
- Parse large files with constant memory usage via
- Fluent DOM API
- Type-safe element construction
- Chainable method calls for building XML programmatically
- Round-Trip Fidelity
- Parse β AST β Render β Parse preserves structure
- Marshal/Unmarshal support for Go structs
- Universal AST Integration
- Attributes:
@prefix (e.g.,@id,@class) - Text content:
#textproperty - CDATA sections:
#cdataproperty - Namespace support: preserved in element names
- Attributes:
- Production Ready
- Thread-safe concurrent operations
- Comprehensive test coverage (80.0%+)
- Fuzz testing
- Benchmark suite
- Zero external dependencies (except shape-core)
- LL(1) Recursive Descent Parser
- Shape AST Integration: Returns unified AST nodes for advanced use cases
- Comprehensive Error Messages: Context-aware error reporting
Use Cases
- Config file parsing: Read and validate XML configuration files (Spring, Maven
pom.xml, Ant build files,.csproj) into typed Go structs - Data feed ingestion: Parse RSS/Atom feeds, SOAP responses, or EDI XML payloads from external APIs
- XML validation pipeline: Use the fast-path
Validate()to gate ingestion before processing β no wasted allocations on malformed input - Format conversion: Parse XML into the Shape AST and re-render as JSON or YAML using the sibling shape-json / shape-yaml parsers
- XML generation: Build well-formed XML documents programmatically using the fluent DOM API without error-prone string templating
- Round-trip transformation: Parse β modify AST β render back to XML while preserving document structure
- Concurrent document processing: Process thousands of XML documents in parallel goroutines safely β all public APIs are thread-safe
XML β AST Conventions
shape-xml follows these conventions when mapping XML to the universal AST:
<user id="123" xmlns:custom="http://example.com">
<name>Alice</name>
<custom:role>Admin</custom:role>
<bio><![CDATA[Uses <tags>]]></bio>
</user>
Maps to:
*ast.ObjectNode{
properties: {
"@id": *ast.LiteralNode{value: "123"},
"@xmlns:custom": *ast.LiteralNode{value: "http://example.com"},
"name": *ast.LiteralNode{value: "Alice"},
"custom:role": *ast.LiteralNode{value: "Admin"},
"bio": *ast.ObjectNode{
properties: {
"#cdata": *ast.LiteralNode{value: "Uses <tags>"},
},
},
},
}
Performance
shape-xml uses an intelligent dual-path architecture that automatically selects the optimal parsing strategy:
β‘ Fast Path (Validation Only)
The fast path bypasses AST construction for maximum performance:
- APIs:
Validate(),ValidateReader() - Performance: 4-5x faster than AST path
- Use when: You just need to validate XML syntax
// Fast path - validation only (4-5x faster!)
if err := xml.Validate(xmlString); err != nil {
// Invalid XML
}
Benchmark results:
- Small XML (50 bytes): ~140 ns/op (validation)
- Medium XML (1KB): ~127 Β΅s/op (7.97 MB/s)
- Large XML (340KB): ~40.8 ms/op (8.35 MB/s)
π³ AST Path (Full Features)
The AST path builds a complete Abstract Syntax Tree:
- APIs:
Parse(),ParseReader(),Marshal(),Unmarshal() - Performance: Slower, more memory (enables advanced features)
- Use when: You need AST manipulation, rendering, or format conversion
// AST path - full tree structure for advanced features
node, _ := xml.Parse(xmlString)
// Work with AST, transform, render, etc.
Run benchmarks:
make bench
Architecture
shape-xml uses a unified architecture with custom parsers:
- Grammar-Driven: EBNF (Extended Backus-Naur Form) grammar in
docs/grammar/xml.ebnf - Tokenizer: Custom tokenizer using Shape's framework
- Parser: LL(1) (Left-to-right, Leftmost derivation, 1-token lookahead) recursive descent
- Rendering: Custom XML renderer
- AST Representation:
- Elements β
*ast.ObjectNodewith properties map - Attributes β Properties with
@prefix - Text content β
#textproperty - CDATA β
#cdataproperty - Primitives β
*ast.LiteralNode(string values)
- Elements β
Grammar
See docs/grammar/xml.ebnf for the complete EBNF specification.
Key grammar rules:
Document = [ XMLDecl ] Element ;
Element = EmptyElement | StartTag Content EndTag ;
StartTag = "<" Name { Attribute } ">" ;
EndTag = "</" Name ">" ;
Thread Safety
shape-xml is thread-safe. All public APIs can be called concurrently from multiple goroutines without external synchronization.
Safe for Concurrent Use
// β
SAFE: Multiple goroutines can call these concurrently
go func() {
var v1 interface{}
xml.Unmarshal(data1, &v1)
}()
go func() {
var v2 interface{}
xml.Unmarshal(data2, &v2)
}()
// β
SAFE: Parse, Marshal, Validate all create new instances
go func() { xml.Parse(input1) }()
go func() { xml.Marshal(obj1) }()
go func() { xml.Validate(input2) }()
Thread Safety Guarantees
Unmarshal(),Marshal()- Thread-safe; encoder cache uses copy-on-write withsync.WaitGroupto safely handle recursive types under concurrent loadParse(),Validate()- Thread-safe, create new parser instances- Race detector verified - All tests pass with
go test -race
Testing
shape-xml has comprehensive test coverage including unit tests, fuzzing, and grammar verification.
Coverage Summary
- Fast Parser: 55.7%
- Parser: 77.3%
- XML API: 67.4%
- Overall Library: 55.0%
- Target: 80%+ (current gap: 25 percentage points)
Quick Start
# Run all tests
make test
# Run with coverage
make coverage
# Run fuzzing tests
make fuzz
Fuzzing
The parser includes extensive fuzzing tests to ensure robustness:
# Fuzz parser
go test ./pkg/xml -fuzz=FuzzParse -fuzztime=30s
go test ./pkg/xml -fuzz=FuzzValidate -fuzztime=30s
go test ./pkg/xml -fuzz=FuzzRender -fuzztime=30s
go test ./pkg/xml -fuzz=FuzzMarshal -fuzztime=30s
API Reference
Parsing Functions
Parse(input string) (ast.SchemaNode, error)- Parse XML from stringParseReader(reader io.Reader) (ast.SchemaNode, error)- Parse from stream
Validation Functions
Validate(input string) error- Fast validation without ASTValidateReader(reader io.Reader) error- Validate from stream
Marshaling Functions
Marshal(v interface{}) ([]byte, error)- Go struct β XMLMarshalIndent(v interface{}, prefix, indent string) ([]byte, error)- Pretty-printUnmarshal(data []byte, v interface{}) error- XML β Go struct
Rendering Functions
Render(node ast.SchemaNode) []byte- AST β compact XMLRenderIndent(node ast.SchemaNode, prefix, indent string) []byte- AST β pretty XML
DOM API
NewElement(name string) *Element- Create element builderElement.Attr(name, value string) *Element- Add attribute (chainable)Element.Text(content string) *Element- Set text content (chainable)Element.CDATA(content string) *Element- Set CDATA content (chainable)Element.Child(child *Element) *Element- Add child element (chainable)Element.Render() []byte- Render to XML bytes
Documentation
- EBNF Grammar - Complete XML grammar specification
- Parser Implementation Guide - Guide for implementing parsers
- Shape ADR 0004: LL(1) Recursive Descent Parser Strategy - Parser design principles
- Shape ADR 0005: Grammar-as-Verification - Grammar verification approach
- Shape Core Infrastructure - Universal AST and tokenizer framework
- Shape Ecosystem - Documentation and examples
Development
# Run tests
make test
# Generate coverage report
make coverage
# Build
make build
# Run all checks
make all
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
Apache License 2.0
Copyright Β© 2020-2025 Shapestone
See LICENSE for the full license text and NOTICE for third-party attributions.
Part of Shape Ecosystem
shape-xml is part of the Shape ecosystem:
- shape-core - Universal AST and validation framework
- shape-json - JSON parser
- shape-yaml - YAML parser
- shape-xml - XML parser (this project)
Directories
ΒΆ
| Path | Synopsis |
|---|---|
|
internal
|
|
|
fastparser
Package fastparser implements a high-performance XML parser without AST construction.
|
Package fastparser implements a high-performance XML parser without AST construction. |
|
parser
Package parser implements LL(1) recursive descent parsing for XML format.
|
Package parser implements LL(1) recursive descent parsing for XML format. |
|
tokenizer
Package tokenizer provides XML tokenization using Shape's tokenizer framework.
|
Package tokenizer provides XML tokenization using Shape's tokenizer framework. |
|
pkg
|
|
|
xml
Package xml provides conversion between AST nodes and Go native types.
|
Package xml provides conversion between AST nodes and Go native types. |