Skip to main content

data-lineage

Column-level lineage tracing through SQL, Kafka, Spark, and JDBC codebases.

Requirements

None — uses only codebase search, file reads, and inline SVG generation. No database connections or external tools needed.

What It Does

When you invoke /data-lineage with a column name:

  1. Searches for SQL definitions (tables, views, migrations) containing the column
  2. Traces view chains recursively to find source tables
  3. Searches Java/Scala/Python code for JDBC writes, Spark jobs, and Kafka producers
  4. Traces upstream sources recursively (up to 5 hops)
  5. Finds downstream consumers (up to 3 hops)
  6. Rates each hop's confidence (high/medium/low)
  7. Generates a structured lineage path and an SVG diagram

Output

Each trace produces two outputs:

Structured lineage path — a text trace showing each hop from source to destination, with file references, transformations, and confidence ratings.

Visual diagram — an SVG file (e.g., lineage-total_amount.svg) showing the lineage graph. Nodes are color-coded by system type, edges show data flow direction, and the target column is highlighted. Opens in any browser.

Usage

Trace a qualified column

/data-lineage orders.total_amount

Trace with full qualification

/data-lineage reporting.daily_summary.total_amount

Trace with context

/data-lineage customer_id in reporting.daily_summary

Trace an unqualified column

/data-lineage revenue

If the column exists in multiple tables, you'll be asked which one to trace.

Supported Technologies

TechnologyWhat it finds
SQL (any dialect)CREATE TABLE/VIEW, INSERT, SELECT, ALTER, migrations
JDBCPreparedStatement, JdbcTemplate, ResultSet, direct SQL strings
SparkDataFrame reads/writes, Spark SQL, insertInto, saveAsTable
KafkaProducerRecord, @KafkaListener, consumer subscriptions, topic configs
ORM@Table, @Column, @Entity annotations, entity field mappings
dbtref(), source(), model SQL files
Configapplication.yml/properties, topic-to-table mappings

Confidence Ratings

Every hop gets a confidence rating:

RatingMeaning
HighDirect SQL column reference, explicit column mapping in code
MediumTable-level match but column mapping is implicit (SELECT *, dynamic fields)
LowInferred from naming convention, proximity in code, or config-only reference

Overall confidence is the minimum across all hops. "I couldn't find this connection" is a valid and useful result.

Design Notes

Why no external dependencies?

Works in any codebase without setup — no database access, no graphing tools. Just point it at code.

Why column-level?

Table-level lineage is easy but not useful enough. Knowing "Table A feeds Table B" doesn't help when you need to know which column maps to which and what transformation happens in between.

Why confidence ratings?

In messy stacks, not every hop can be traced with certainty. Rather than pretending certainty or giving up, the skill reports what it found and how confident it is. Focus manual investigation on the uncertain hops.

Why SVG?

Self-contained, opens in any browser, requires no dependencies. Can be shared, embedded in documentation, or viewed alongside code. ASCII diagrams are available as a fallback.