data-lineage
Column-level lineage tracing through SQL, Kafka, Spark, and JDBC codebases.
Requirements
None — uses only codebase search, file reads, and inline SVG generation. No database connections or external tools needed.
What It Does
When you invoke /data-lineage with a column name:
- Searches for SQL definitions (tables, views, migrations) containing the column
- Traces view chains recursively to find source tables
- Searches Java/Scala/Python code for JDBC writes, Spark jobs, and Kafka producers
- Traces upstream sources recursively (up to 5 hops)
- Finds downstream consumers (up to 3 hops)
- Rates each hop's confidence (high/medium/low)
- Generates a structured lineage path and an SVG diagram
Output
Each trace produces two outputs:
Structured lineage path — a text trace showing each hop from source to destination, with file references, transformations, and confidence ratings.
Visual diagram — an SVG file (e.g., lineage-total_amount.svg) showing the lineage graph. Nodes are color-coded by system type, edges show data flow direction, and the target column is highlighted. Opens in any browser.
Usage
Trace a qualified column
/data-lineage orders.total_amount
Trace with full qualification
/data-lineage reporting.daily_summary.total_amount
Trace with context
/data-lineage customer_id in reporting.daily_summary
Trace an unqualified column
/data-lineage revenue
If the column exists in multiple tables, you'll be asked which one to trace.
Supported Technologies
| Technology | What it finds |
|---|---|
| SQL (any dialect) | CREATE TABLE/VIEW, INSERT, SELECT, ALTER, migrations |
| JDBC | PreparedStatement, JdbcTemplate, ResultSet, direct SQL strings |
| Spark | DataFrame reads/writes, Spark SQL, insertInto, saveAsTable |
| Kafka | ProducerRecord, @KafkaListener, consumer subscriptions, topic configs |
| ORM | @Table, @Column, @Entity annotations, entity field mappings |
| dbt | ref(), source(), model SQL files |
| Config | application.yml/properties, topic-to-table mappings |
Confidence Ratings
Every hop gets a confidence rating:
| Rating | Meaning |
|---|---|
| High | Direct SQL column reference, explicit column mapping in code |
| Medium | Table-level match but column mapping is implicit (SELECT *, dynamic fields) |
| Low | Inferred from naming convention, proximity in code, or config-only reference |
Overall confidence is the minimum across all hops. "I couldn't find this connection" is a valid and useful result.
Design Notes
Why no external dependencies?
Works in any codebase without setup — no database access, no graphing tools. Just point it at code.
Why column-level?
Table-level lineage is easy but not useful enough. Knowing "Table A feeds Table B" doesn't help when you need to know which column maps to which and what transformation happens in between.
Why confidence ratings?
In messy stacks, not every hop can be traced with certainty. Rather than pretending certainty or giving up, the skill reports what it found and how confident it is. Focus manual investigation on the uncertain hops.
Why SVG?
Self-contained, opens in any browser, requires no dependencies. Can be shared, embedded in documentation, or viewed alongside code. ASCII diagrams are available as a fallback.