sourcemaps — design plan¶
status: plan. only the slice by run needs is built today (a forward,
line-level table). this document is the design for the full bidirectional,
column-accurate sourcemap and the work to get there. nothing here is
implemented beyond what is noted under "what exists today"
why¶
a .by file is transpiled to python before anything runs or is analyzed
downstream. every tool that reports a position therefore risks pointing at
generated python instead of the .by the user wrote:
- runtime tracebacks — frames land in generated
.py, with line numbers shifted by the preamble and surface syntax replaced by its lowering - type-checker diagnostics on the
.py— if a downstream tool checks the generated output, spans don't map back - editor features — go-to-definition, hover, rename, inlay hints driven off
the generated python need to round-trip to
.bycoordinates - reverse transpile —
.py→.byhas the same problem mirrored
a correct sourcemap is the single primitive all of these share
what exists today¶
source_map::line_table(source, edits)— builds a forward, line-level table (output line → input line,Nonefor generated lines) for one edit-application passtranspile_typed_with_mapcomposes a whole-pipeline line table by lifting the phase-0 table over the count of generated preamble lines (valid because phase 1 applies no body edits and later phases only prepend). consumed byby runto rewrite tracebacks and byby transpileto map a transpiler-invalid-output span back to its.byline for a source-annotated diagnostic
limitations to remove¶
- line granularity — a column in
int & strcan't map to the column inIntersection[int, str]. fine for traceback frames, useless for hover/rename - forward only — no
.by→.pydirection (needed by editors that start from a.byposition) - prepend-only assumption — the composition trick assumes later phases never insert mid-body. true today; brittle as a contract
- untested arithmetic — offset math has no property tests
goals¶
- byte-accurate — map arbitrary
.bybyte offsets ↔.pybyte offsets, not just lines - bidirectional —
by_to_pyandpy_to_by, both total (return the nearest enclosing mapped span when a position is inside generated code) - whole-pipeline — composes across every phase, including the lazy-import preamble and any mid-body hoist, with no "only prepend" assumption
- reverse-aware — the reverse pipeline produces a map of the same shape
- tested — composition and round-trip are property-tested
design¶
representation: a segment list¶
model a single transformation step as an ordered list of segments, each either copied or generated:
Segment {
in: Option<Range<TextSize>>, // source bytes, None = generated
out: Range<TextSize>, // output bytes
}
a step's map is the list of segments covering the whole output in order. copied
segments carry the byte range they came from; generated segments (preamble,
hoisted classes, replacement text with no 1:1 origin) carry None. this
subsumes both line and column mapping — line lookups are derived by counting
newlines within a segment
this is the same shape as standard sourcemap "mappings", specialized to a single file pair and kept as byte ranges rather than line/column until rendered
per-phase production¶
each phase already knows its edits; each produces a segment list as a byproduct:
- phase 0 (ast_driver) — the splice applies a sorted, non-overlapping edit
list via
replace_range. walking that list yields segments directly: copied runs between edits, generated runs for replacement text, plus generated segments for the prepended imports and anyctx.hoistedinsertions (whose insertion offset is known) - phase 1 (lowering) — currently a preamble prepend only; one generated segment for the preamble, identity for the body
- import-redirect / lazy — within-line text replacements (
typing→typing_extensions,lazyprefixes) become copied/generated segment pairs; prepended polyfill preambles are generated segments - anon-NT cleanup — prepend + occasional body edit; same treatment
composition¶
compose two steps A: mid→src and B: out→mid into out→src by walking B's
segments and, for each copied segment, intersecting its mid range against A's
segments to resolve back to src (splitting where A changes origin). generated
segments stay generated. this is associative, so the pipeline map is a fold over
the per-phase maps. line-level line_table becomes a derived view of the
composed byte map, letting by run keep working unchanged
lookup¶
py_to_by(offset)— binary search the output ranges; return the mapped.byoffset, or the start of the nearest enclosing copied segment when inside generated codeby_to_py(offset)— the same against an index built on theinranges; when a.byoffset expands to multiple output spans (rare), return the first
reverse transpile¶
the reverse pipeline builds a map of the same type; consumers use one API for both directions
consumers and rollout¶
phase the work behind real consumers so nothing speculative ships:
by runtracebacks (done, line-level) — already shipping; migrate it to read the byte map's derived line view when step 2 lands, then delete the standalone line table- byte map + composition + property tests — the core; no API beyond what a consumer needs
- LSP position mapping — go-to-def / hover / diagnostics on
.byvia the generated python; the first consumer that needs columns and both directions - reverse-direction map — when an editor feature consumes
.py→.by
testing¶
- round-trip —
py_to_by(by_to_py(p)) == pfor every copied position - composition —
compose(a, b)equals mapping throughathenb, checked on randomized edit lists - golden — a handful of representative
.byfiles (lazy imports, intersections, hoisted anon-NT classes) with asserted maps at known offsets - traceback integration — the existing
by rune2e test, extended to column assertions once columns land