uast

package
v0.0.0-...-7b3ef92
Published: Nov 22, 2025 License: Apache-2.0 Imports: 13 Imported by: 0

README

UAST - Unified Abstract Syntax Tree

A Go-native Unified AST (UAST) data model backed by Tree-sitter parsers plus a compact domain-specific language (DSL) for querying and transforming trees. Parse, analyze, and refactor code written in 66+ languages with one toolkit.

🚀 Quick Start

# Install the CLI tool
go install github.com/dmytrogajewski/hercules/cmd/uast@latest

# Parse a Go file and find all functions
uast parse main.go | uast query 'filter(.type == "Function")'

# Parse Python and find function calls
uast parse -lang python script.py | uast query 'filter(.type == "Call")'

🤔 What is UAST?

UAST (Unified Abstract Syntax Tree) provides a language-agnostic representation of source code. Instead of dealing with a different AST format for each programming language, UAST gives you a single, consistent structure for analyzing code across 66+ languages.

How it works:
Source Code → Tree-sitter Parser → Mapping-driven Conversion → UAST → DSL Queries → Analysis

✨ Features

  • 🌍 Multi-language Support: Parse 66+ programming languages with Tree-sitter grammars
  • 🔍 Powerful DSL: Query and filter nodes with a functional pipeline syntax
  • ⚡ High Performance: Optimized for speed with streaming iterators and memory pools
  • 🛠️ Go-native API: Ergonomic Go APIs for navigation, mutation, and transformation
  • 📊 Change Detection: Language-agnostic diffing and change analysis
  • 🎯 Mapping-driven: DSL-based configuration for language-specific conversions

📦 Installation

Prerequisites
  • Go 1.22 or later
  • Git
Install CLI Tool
go install github.com/dmytrogajewski/hercules/cmd/uast@latest
Use as Library
go get github.com/dmytrogajewski/hercules/pkg/uast

📖 Usage

Basic Parsing
package main

import (
    "fmt"
    "log"
    "github.com/dmytrogajewski/hercules/pkg/uast"
)

func main() {
    // Create parser
    parser, err := uast.NewParser()
    if err != nil {
        log.Fatal(err)
    }

    // Parse Go code
    code := []byte(`package main
func hello() {
    fmt.Println("Hello, World!")
}`)

    node, err := parser.Parse("main.go", code)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Parsed %s with %d children\n", node.Type, len(node.Children))
}
DSL Queries

The UAST DSL provides a functional pipeline syntax for querying nodes:

// Find all exported functions
exported, err := node.FindDSL("filter(.type == \"Function\" && .roles has \"Exported\")")
if err != nil {
    log.Fatal(err)
}

// Count all string literals
literalCount, err := node.FindDSL("filter(.type == \"Literal\") |> reduce(count)")
if err != nil {
    log.Fatal(err)
}

// Find function calls with specific names
printfCalls, err := node.FindDSL("filter(.type == \"Call\" && .props.name == \"printf\")")
if err != nil {
    log.Fatal(err)
}

Supported DSL Operations:

  • Filtering: filter(.type == "Function")
  • Boolean Logic: &&, ||
  • Equality: ==, !=
  • Membership: .roles has "Exported"
  • Field Access: .token, .type, .props.name
  • Pipelines: |> for chaining operations
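
To make the semantics of a filter stage concrete, here is a minimal sketch in plain Go of what filter(.type == "Function" && .roles has "Exported") selects. The Node struct, hasRole, filter, and exportedFunctions are hypothetical stand-ins written for this illustration; they are not the library's types or its DSL engine.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a UAST node; the real type lives in
// the library's node package and may differ.
type Node struct {
	Type     string
	Token    string
	Roles    []string
	Children []*Node
}

// hasRole reports whether n carries the given role, mirroring `.roles has "X"`.
func hasRole(n *Node, role string) bool {
	for _, r := range n.Roles {
		if r == role {
			return true
		}
	}
	return false
}

// filter walks the tree in pre-order and collects every node accepted by pred.
func filter(root *Node, pred func(*Node) bool) []*Node {
	var out []*Node
	var walk func(*Node)
	walk = func(n *Node) {
		if n == nil {
			return
		}
		if pred(n) {
			out = append(out, n)
		}
		for _, c := range n.Children {
			walk(c)
		}
	}
	walk(root)
	return out
}

// exportedFunctions mirrors filter(.type == "Function" && .roles has "Exported").
func exportedFunctions(root *Node) []*Node {
	return filter(root, func(n *Node) bool {
		return n.Type == "Function" && hasRole(n, "Exported")
	})
}

func main() {
	root := &Node{Type: "File", Children: []*Node{
		{Type: "Function", Token: "Hello", Roles: []string{"Exported"}},
		{Type: "Function", Token: "helper"},
	}}
	for _, n := range exportedFunctions(root) {
		fmt.Println(n.Token)
	}
}
```

The predicate is an ordinary Go closure, so && and || in the DSL correspond directly to Go's boolean operators in this sketch.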
Go API
Navigation and Querying
// Streaming pre-order iterator
iter := node.PreOrder()
for n := range iter {
    if n.HasRole("RoleName") {
        // process identifier
    }
}

// Find nodes with predicate
functions := node.Find(func(n *uast.Node) bool {
    return n.Type == "Function"
})
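
For readers unfamiliar with streaming traversal, the following is a minimal sketch of a pre-order iterator that yields nodes over a channel, so the caller can range over it as in the snippet above. The Node type and preOrder function are hypothetical; how the library's PreOrder is actually implemented is not stated here.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a UAST node.
type Node struct {
	Type     string
	Children []*Node
}

// preOrder streams nodes parent-first, children left to right.
// It uses an explicit stack instead of recursion, so deep trees do not
// grow the goroutine's call stack. The consumer must drain the channel,
// otherwise the producing goroutine stays blocked.
func preOrder(root *Node) <-chan *Node {
	ch := make(chan *Node)
	go func() {
		defer close(ch)
		stack := []*Node{root}
		for len(stack) > 0 {
			n := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if n == nil {
				continue
			}
			ch <- n
			// Push children in reverse so the leftmost child is popped first.
			for i := len(n.Children) - 1; i >= 0; i-- {
				stack = append(stack, n.Children[i])
			}
		}
	}()
	return ch
}

func main() {
	root := &Node{Type: "File", Children: []*Node{
		{Type: "Function", Children: []*Node{{Type: "Call"}}},
		{Type: "Literal"},
	}}
	for n := range preOrder(root) {
		fmt.Println(n.Type)
	}
}
```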
Transformation
// Transform nodes in-place
node.TransformInPlace(func(n *uast.Node) bool {
    if n.HasRole(uast.RoleString) {
        n.Token = strings.Trim(n.Token, "\"")
    }
    return true
})
Change Detection
// Detect structural changes between two versions
changes := uast.DetectChanges(before, after)
for _, change := range changes {
    fmt.Printf("%s: %s\n", change.Type, change.File)
}
CLI Tool

The UAST CLI provides command-line access to all features:

# Parse a file and output UAST as JSON
uast parse main.go

# Query UAST using DSL
uast parse main.go | uast query 'filter(.type == "Function")'

# Format UAST output
uast parse main.go | uast fmt

# Detect changes between files
uast diff before.go after.go

# Get help
uast --help

🌍 Language Support

UAST supports 66+ programming languages including:

Popular Languages:

  • Go, Python, Java, JavaScript, TypeScript
  • Rust, C++, C#, Ruby, PHP, Kotlin, Swift

Web Technologies:

  • HTML, CSS, JSON, YAML, XML, Markdown

Configuration Files:

  • Dockerfile, Makefile, CMake, TOML, INI

Specialized Languages:

  • SQL, Haskell, OCaml, Scala, Elixir, Erlang
  • F#, Clojure, Lua, Perl
  • And many more languages

See the language roadmap for the complete list and status.

⚡ Performance

UAST is optimized for high-performance code analysis:

Parsing Performance
  • Small files (~50 lines): ~32μs, 6KB memory
  • Medium files (~100 lines): ~270μs, 57KB memory
  • Large files (~200 lines): ~1ms, 208KB memory
DSL Query Performance
  • Simple field access: ~1.9μs, 2.6KB memory
  • Filter operations: ~4.5μs, 5.7KB memory
  • Complex pipelines: ~10μs, 12KB memory
Tree Traversal
  • Pre-order streaming: ~18μs, 384B memory
  • Find with predicate: ~0.7μs, 248B memory

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Quick Contribution
  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📚 Documentation

🔧 Custom UAST Mappings

The UAST parser supports custom UAST mappings that can be passed during initialization using the option pattern. This allows you to:

  • Add support for custom file extensions
  • Override existing language mappings
  • Add experimental or domain-specific language support
Basic Usage
// Define custom UAST mappings
customMaps := map[string]uast.UASTMap{
    "my_language": {
        Extensions: []string{".mylang", ".ml"},
        UAST: `[language "json", extensions: ".mylang", ".ml"]

_value <- (_value) => uast(
    type: "Synthetic"
)

array <- (array) => uast(
    token: "self",
    type: "Synthetic"
)

document <- (document) => uast(
    type: "Synthetic"
)

object <- (object) => uast(
    token: "self",
    type: "Synthetic"
)

pair <- (pair) => uast(
    type: "Synthetic",
    children: "_value", "string"
)

string <- (string) => uast(
    token: "self",
    type: "Synthetic"
)
`,
    },
}

// Create parser with custom mappings
parser, err := uast.NewParser()
if err != nil {
    log.Fatal(err)
}

parser = parser.WithUASTMap(customMaps)

// Now the parser supports .mylang and .ml files
if parser.IsSupported("config.mylang") {
    // Parse the file
    node, err := parser.Parse("config.mylang", content)
    if err != nil {
        log.Fatal(err)
    }
    // Process the UAST node...
}
Multiple Custom Mappings

You can add multiple custom mappings at once:

customMaps := map[string]uast.UASTMap{
    "config_lang": {
        Extensions: []string{".config"},
        UAST: `[language "json", extensions: ".config"]
// ... mapping rules ...
`,
    },
    "template_lang": {
        Extensions: []string{".tmpl", ".template"},
        UAST: `[language "json", extensions: ".tmpl", ".template"]
// ... mapping rules ...
`,
    },
}

parser = parser.WithUASTMap(customMaps)
Overriding Built-in Parsers

You can override built-in parsers by using the same file extensions:

// Override the built-in JSON parser with custom mapping
customMaps := map[string]uast.UASTMap{
    "custom_json": {
        Extensions: []string{".json"}, // Same extension as built-in JSON parser
        UAST: `[language "json", extensions: ".json"]

_value <- (_value) => uast(
    type: "CustomValue"
)

document <- (document) => uast(
    type: "CustomDocument"
)

object <- (object) => uast(
    token: "self",
    type: "CustomObject"
)

// ... more custom mapping rules ...
`,
    },
}

parser = parser.WithUASTMap(customMaps)

// Now .json files will use your custom parser instead of the built-in one
node, err := parser.Parse("config.json", content)
DSL Format

Custom UAST mappings use the same DSL format as the embedded mappings:

  • Language Declaration: [language "language_name", extensions: ".ext1", ".ext2"]
  • Mapping Rules: node_type <- (tree_sitter_pattern) => uast(...)
  • UAST Specification: Define type, roles, children, properties, and tokens
Integration with Existing Parsers

Custom mappings are loaded in addition to the embedded mappings. Custom UAST maps have priority over built-in ones - if a custom mapping defines extensions that conflict with existing ones, the custom mapping takes precedence and will be used instead of the built-in parser.

This allows you to:

  • Override built-in language parsers with custom implementations
  • Add experimental or domain-specific language support
  • Test new UAST mapping rules without modifying the core library
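
The precedence rule can be pictured as an extension-to-parser table in which a later registration replaces an earlier one. The sketch below is only an illustration under that assumption; registry, register, and lookup are hypothetical names, and the library's internal bookkeeping may work differently.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// registry maps a file extension to the name of the mapping that handles it.
// It is a hypothetical model, not the library's data structure.
type registry struct {
	byExt map[string]string
}

func newRegistry() *registry {
	return &registry{byExt: make(map[string]string)}
}

// register binds each extension to name. Registering an extension that is
// already bound replaces the previous owner, which models "custom wins".
func (r *registry) register(name string, exts ...string) {
	for _, e := range exts {
		r.byExt[strings.ToLower(e)] = name
	}
}

// lookup resolves a filename to a mapping name by its extension.
func (r *registry) lookup(filename string) (string, bool) {
	name, ok := r.byExt[strings.ToLower(filepath.Ext(filename))]
	return name, ok
}

func main() {
	r := newRegistry()
	r.register("json", ".json")                 // built-in mapping, loaded first
	r.register("custom_json", ".json")          // custom mapping overrides it
	r.register("my_language", ".mylang", ".ml") // new extensions

	for _, f := range []string{"config.json", "app.mylang", "notes.txt"} {
		name, ok := r.lookup(f)
		fmt.Printf("%s -> %q (supported: %v)\n", f, name, ok)
	}
}
```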

📄 License

This project is licensed under the Apache-2.0 License - see the LICENSE.md file for details.

Ready to start analyzing code across languages? Get started with the Quick Start or explore the API Reference.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CountChangesByType

func CountChangesByType(changes []Change) map[ChangeType]int

CountChangesByType counts the number of changes for each ChangeType. Returns a map from ChangeType to count.

func GetAddedNodes

func GetAddedNodes(changes []Change) []*node.Node

GetAddedNodes returns all nodes that were added in the given changes.

func GetModifiedNodes

func GetModifiedNodes(changes []Change) []*node.Node

GetModifiedNodes returns all nodes that were modified in the given changes.

func GetPatternMatchStats

func GetPatternMatchStats() map[string]int64

func GetPatternMatcher

func GetPatternMatcher(language string) interface{}

GetPatternMatcher returns a pre-compiled pattern matcher for the given language

func GetRemovedNodes

func GetRemovedNodes(changes []Change) []*node.Node

GetRemovedNodes returns all nodes that were removed in the given changes.

func Query

func Query(input io.Reader, q string) ([]*node.Node, error)

func RecordPatternMatch

func RecordPatternMatch(language, pattern string, matched bool)

Types

type Change

type Change struct {
	Before *node.Node
	After  *node.Node
	Type   ChangeType
	File   string
}

Change represents a structural change between two versions of code

func DetectChanges

func DetectChanges(before, after *node.Node) []Change

DetectChanges detects structural changes between two UAST nodes. It returns a slice of Change objects describing added, removed, and modified nodes. The current implementation uses integer keys for speed.

Example:

changes := uast.DetectChanges(before, after)
for _, c := range changes {
    fmt.Println(c.Type)
}

func FilterChangesByNodeType

func FilterChangesByNodeType(changes []Change, nodeType node.Type) []Change

FilterChangesByNodeType filters the given changes by the type of nodes involved. Returns a slice of changes where either Before or After node matches nodeType.

func FilterChangesByType

func FilterChangesByType(changes []Change, changeType ChangeType) []Change

FilterChangesByType filters the given changes by their ChangeType. Returns a slice of changes matching the specified type.
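
As a rough illustration of how the counting and filtering helpers behave, here is a self-contained sketch. Change and ChangeType mirror the declarations documented on this page, Node is a hypothetical stand-in for node.Node, and countByType and filterByType are illustrative re-implementations rather than the library's code.

```go
package main

import "fmt"

// ChangeType mirrors the documented enum.
type ChangeType int

const (
	ChangeAdded ChangeType = iota
	ChangeRemoved
	ChangeModified
)

// Node is a hypothetical stand-in for node.Node.
type Node struct{ Type string }

// Change mirrors the documented struct, with Node substituted for node.Node.
type Change struct {
	Before *Node
	After  *Node
	Type   ChangeType
	File   string
}

// countByType tallies changes per ChangeType (stand-in for CountChangesByType).
func countByType(changes []Change) map[ChangeType]int {
	counts := make(map[ChangeType]int)
	for _, c := range changes {
		counts[c.Type]++
	}
	return counts
}

// filterByType keeps only changes of the wanted type
// (stand-in for FilterChangesByType).
func filterByType(changes []Change, want ChangeType) []Change {
	var out []Change
	for _, c := range changes {
		if c.Type == want {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	changes := []Change{
		{After: &Node{Type: "Function"}, Type: ChangeAdded},
		{Before: &Node{Type: "Call"}, Type: ChangeRemoved},
		{Before: &Node{Type: "Function"}, After: &Node{Type: "Function"}, Type: ChangeModified},
		{After: &Node{Type: "Literal"}, Type: ChangeAdded},
	}
	counts := countByType(changes)
	fmt.Println("added:", counts[ChangeAdded], "removed:", counts[ChangeRemoved], "modified:", counts[ChangeModified])
	fmt.Println("added changes:", len(filterByType(changes, ChangeAdded)))
}
```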

type ChangeType

type ChangeType int

ChangeType represents the type of change between two nodes

const (
	ChangeAdded ChangeType = iota
	ChangeRemoved
	ChangeModified
)

func (ChangeType) String

func (ct ChangeType) String() string
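
Because ChangeType has a String method, it satisfies fmt.Stringer and prints as text with %s or %v. The sketch below shows the usual shape of such a method for an iota-based enum; the label strings ("added", "removed", "modified") and the fallback format are assumptions, since the actual strings returned by the library are not documented on this page.

```go
package main

import "fmt"

// ChangeType mirrors the documented enum.
type ChangeType int

const (
	ChangeAdded ChangeType = iota
	ChangeRemoved
	ChangeModified
)

// String returns a human-readable label. The labels are assumed for this
// sketch; unknown values fall back to a numeric form instead of panicking.
func (ct ChangeType) String() string {
	switch ct {
	case ChangeAdded:
		return "added"
	case ChangeRemoved:
		return "removed"
	case ChangeModified:
		return "modified"
	default:
		return fmt.Sprintf("ChangeType(%d)", int(ct))
	}
}

func main() {
	// fmt calls String automatically for %s and %v.
	fmt.Printf("%s %v %s\n", ChangeAdded, ChangeModified, ChangeType(42))
}
```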

type DSLNode

type DSLNode struct {
	Root            sitter.Node
	Tree            *sitter.Tree
	Language        string
	Source          []byte
	MappingRules    []mapping.MappingRule
	PatternMatcher  *mapping.PatternMatcher
	IncludeUnmapped bool
	ParentContext   string
}

DSLNode wraps a Tree-sitter node for conversion to UAST using DSL mappings.

func (*DSLNode) Positions

func (dn *DSLNode) Positions() *node.Positions

Positions returns the source code positions for this node, using uint fields as per UAST spec.

func (*DSLNode) ToCanonicalNode

func (dn *DSLNode) ToCanonicalNode() *node.Node

ToCanonicalNode converts the DSLNode to a canonical UAST Node.

func (*DSLNode) Token

func (dn *DSLNode) Token() string

Token returns the string token for this node, if any.

type DSLParser

type DSLParser struct {
	IncludeUnmapped bool
	// contains filtered or unexported fields
}

DSLParser implements the UAST LanguageParser interface using DSL-based mappings.

func NewDSLParser

func NewDSLParser(reader io.Reader) *DSLParser

NewDSLParser creates a new DSL-based parser from the mapping definition supplied by reader.

func (*DSLParser) Extensions

func (p *DSLParser) Extensions() []string

func (*DSLParser) GetOriginalDSL

func (p *DSLParser) GetOriginalDSL() string

GetOriginalDSL returns the original DSL content that was used to create this parser

func (*DSLParser) Language

func (p *DSLParser) Language() string

Language returns the language name for this parser.

func (*DSLParser) Load

func (p *DSLParser) Load() error

func (*DSLParser) Parse

func (p *DSLParser) Parse(filename string, content []byte) (*node.Node, error)

Parse parses the given file content and returns the root UAST node.

type LanguageParser

type LanguageParser interface {
	Parse(filename string, content []byte) (*node.Node, error)
	Language() string
	Extensions() []string
}

LanguageParser is responsible for parsing source code into UAST nodes.

type LanguageParserError

type LanguageParserError struct {
	Parser  string
	Message string
}

func (LanguageParserError) Error

func (e LanguageParserError) Error() string

type Loader

type Loader struct {
	// contains filtered or unexported fields
}

Loader loads UAST parsers for different languages.

func NewLoader

func NewLoader(embedFS fs.FS) *Loader

NewLoader creates a new loader with the given embedded filesystem.

func (*Loader) GetParsers

func (l *Loader) GetParsers() map[string]LanguageParser

GetParsers returns all loaded parsers.

func (*Loader) LanguageParser

func (l *Loader) LanguageParser(extension string) (LanguageParser, bool)

LanguageParser returns the parser for the given file extension.

func (*Loader) LoadParser

func (l *Loader) LoadParser(reader io.Reader) (LanguageParser, error)

LoadParser loads a parser by reading the uastmap file through the reader

type ParseError

type ParseError struct {
	Filename string
	Language string
	Message  string
}

func (ParseError) Error

func (e ParseError) Error() string

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser is the main entry point for UAST parsing. It implements LanguageParser using embedded parsers and manages language parsers and their configurations.

func NewParser

func NewParser() (*Parser, error)

NewParser creates a new Parser with DSL-based language parsers. It loads parser configurations and instantiates parsers for each supported language. Returns a pointer to the Parser or an error if loading parsers fails.

func (*Parser) GetEmbeddedMappings

func (p *Parser) GetEmbeddedMappings() map[string]UASTMap

GetEmbeddedMappings returns all embedded UAST mappings

func (*Parser) GetEmbeddedMappingsList

func (p *Parser) GetEmbeddedMappingsList() map[string]map[string]interface{}

GetEmbeddedMappingsList returns a lightweight list of embedded UAST mappings (without full content)

func (*Parser) GetMapping

func (p *Parser) GetMapping(language string) (*UASTMap, error)

GetMapping returns a specific embedded UAST mapping by name

func (*Parser) IsSupported

func (p *Parser) IsSupported(filename string) bool

IsSupported returns true if the given filename is supported by any parser.

func (*Parser) Parse

func (p *Parser) Parse(filename string, content []byte) (*node.Node, error)

Parse parses a file and returns its UAST.

func (*Parser) WithUASTMap

func (p *Parser) WithUASTMap(maps map[string]UASTMap) *Parser

WithUASTMap adds custom UAST mappings to the parser using the option pattern. This method allows passing custom UAST map configurations that will be used in addition to or as a replacement for the embedded mappings.

type PrecompiledMapping

type PrecompiledMapping struct {
	Language   string                 `json:"language"`
	Extensions []string               `json:"extensions"`
	Rules      []mapping.MappingRule  `json:"rules"`
	Patterns   map[string]interface{} `json:"patterns"`
	CompiledAt string                 `json:"compiled_at"`
}

PrecompiledMapping represents the pre-compiled mapping data

type UASTMap

type UASTMap struct {
	Extensions []string `json:"extensions"`
	UAST       string   `json:"uast"`
}

UASTMap represents a custom UAST mapping configuration

type UnsupportedLanguageError

type UnsupportedLanguageError struct {
	Language string
	Filename string
}

UnsupportedLanguageError reports that no parser is available for the given language or file.

func (UnsupportedLanguageError) Error

func (e UnsupportedLanguageError) Error() string

Directories

Path        Synopsis
pkg/node    Package uast provides a universal abstract syntax tree (UAST) representation and utilities for parsing, navigating, querying, and mutating code structure in a language-agnostic way.
