Use the data
Start with a small sample
Not ready to run the full pipeline? Begin with a starter bright-star query and build something visible: a chart, classroom prompt, notebook, or simple 3D scene.
Use the starter guide →
Build with the data
Start with a small bright-star query, run the open pipeline when you are ready, or read the technical notes that explain how Gaia and Hipparcos become a browser-ready 3D star map. The project code is open: the data pipeline, spatial index, and viewer can all be inspected and built on.
Run the pipeline
Install the open tools, download source catalogues, merge Gaia with Hipparcos, and produce the Parquet outputs used by the site.
Follow the guide →
Understand the machinery
See how source catalogues are merged, how distances are chosen, and why a spatial index lets a browser explore a dataset far too large to download at once.
Start with the catalogue merge →
Step one
Gaia DR3 gives Found in Space its foundation: a vast modern catalogue with positions, brightnesses, parallaxes, proper motions, and colours for an unprecedented number of sources. But a public star map also needs the familiar bright stars people recognise by eye.
The very brightest objects can require special handling in Gaia, so Found in Space combines Gaia with Hipparcos and curated overrides to keep the naked-eye sky usable, recognisable, and well documented.
About 100,000 stars appear in both catalogues. The pipeline produces one canonical row per object wherever possible, choosing the best available measurement, handling duplicate catalogue entries, and documenting special cases such as the Sun.
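The one-row-per-object idea can be sketched in a few lines of Python. This is only an illustration: it assumes a cross-match table already links Hipparcos numbers to Gaia source IDs, and the field names, the `prefer_hipparcos` override set, and the selection rule are all hypothetical rather than the pipeline's actual schema or logic.

```python
def merge_catalogues(gaia_rows, hip_rows, crossmatch, prefer_hipparcos=frozenset()):
    """Return one canonical row per object (illustrative sketch only).

    gaia_rows:        dict mapping gaia_id -> row dict
    hip_rows:         dict mapping hip_id -> row dict
    crossmatch:       dict mapping hip_id -> gaia_id for stars in both catalogues
    prefer_hipparcos: hip_ids where a curated override picks the Hipparcos row
    """
    merged = []
    matched_gaia = set()

    for hip_id, hip_row in hip_rows.items():
        gaia_id = crossmatch.get(hip_id)
        if gaia_id is None or gaia_id not in gaia_rows:
            # Hipparcos-only star: nothing to deduplicate.
            merged.append({**hip_row, "source": "hipparcos", "hip_id": hip_id})
            continue

        matched_gaia.add(gaia_id)
        if hip_id in prefer_hipparcos:
            row = {**hip_row, "source": "hipparcos"}
        else:
            # Assumed default: take the Gaia measurement when both exist.
            row = {**gaia_rows[gaia_id], "source": "gaia"}
        # Keep both identifiers so the merged row stays traceable.
        row["hip_id"] = hip_id
        row["gaia_id"] = gaia_id
        merged.append(row)

    # Gaia-only stars pass through unchanged.
    for gaia_id, gaia_row in gaia_rows.items():
        if gaia_id not in matched_gaia:
            merged.append({**gaia_row, "source": "gaia", "gaia_id": gaia_id})

    return merged
```

Keeping both identifiers on a matched row means every merged record can still be traced back to its source catalogues.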
How the merge works →
Step two
Each star gets a working distance estimate: from catalogue parallax where reliable, from Bailer-Jones probabilistic estimates where not, from photometric modelling where those fail, and from a conservative prior as a last resort. Each tier is flagged.
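As a rough sketch, the tiered fallback might look like the Python below. The signal-to-noise cut-off, the prior distance, the tier names, and the function signature are all assumptions made for illustration; the pipeline's real reliability criteria may differ.

```python
from dataclasses import dataclass
from typing import Optional

PRIOR_DISTANCE_PC = 1000.0  # hypothetical last-resort value, not from the pipeline


@dataclass
class DistanceEstimate:
    distance_pc: float
    tier: str  # flag recording which tier supplied the distance


def choose_distance(
    parallax_mas: Optional[float],
    parallax_error_mas: Optional[float],
    bailer_jones_pc: Optional[float] = None,
    photometric_pc: Optional[float] = None,
    min_snr: float = 5.0,  # assumed reliability threshold
) -> DistanceEstimate:
    """Pick a working distance, trying the most direct measurement first."""
    parallax_ok = (
        parallax_mas is not None
        and parallax_error_mas is not None
        and parallax_error_mas > 0
        and parallax_mas > 0
        and parallax_mas / parallax_error_mas >= min_snr
    )
    if parallax_ok:
        # Distance in parsecs is 1000 divided by the parallax in milliarcseconds.
        return DistanceEstimate(1000.0 / parallax_mas, "parallax")
    if bailer_jones_pc is not None:
        return DistanceEstimate(bailer_jones_pc, "bailer_jones")
    if photometric_pc is not None:
        return DistanceEstimate(photometric_pc, "photometric")
    return DistanceEstimate(PRIOR_DISTANCE_PC, "prior")
```

Returning the tier alongside the distance mirrors the flagging described above: downstream code can always tell a measured distance from a fallback.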
Positions are propagated to a common epoch (J2016.0) using proper motions, then converted to Sun-centred Cartesian coordinates in parsecs. Temperature comes from spectroscopic measurements where available, falling back through a cascade of colour-based estimates and finally to a default value.
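The geometry can be sketched as follows. This is a simplified illustration, not the pipeline's code: it uses a linear proper-motion update (ignoring radial velocity and perspective effects, and unreliable very close to the celestial poles), it assumes `pmra_mas_yr` already includes the cos(dec) factor as in the Gaia and Hipparcos catalogues, and it assumes axes aligned with the equatorial frame. The function names are hypothetical.

```python
import math

MAS_PER_DEG = 3_600_000.0  # milliarcseconds in one degree


def propagate_position(ra_deg, dec_deg, pmra_mas_yr, pmdec_mas_yr,
                       epoch_from, epoch_to=2016.0):
    """Linearly move a position from epoch_from to epoch_to (Julian years)."""
    dt = epoch_to - epoch_from
    dec_new = dec_deg + pmdec_mas_yr * dt / MAS_PER_DEG
    # pmra includes cos(dec), so divide it out to get the change in RA itself.
    cos_dec = math.cos(math.radians(dec_deg))
    ra_new = ra_deg + pmra_mas_yr * dt / (MAS_PER_DEG * cos_dec)
    return ra_new % 360.0, dec_new


def to_cartesian_pc(ra_deg, dec_deg, distance_pc):
    """Convert RA, Dec, and distance to Sun-centred x, y, z in parsecs."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    x = distance_pc * math.cos(dec) * math.cos(ra)
    y = distance_pc * math.cos(dec) * math.sin(ra)
    z = distance_pc * math.sin(dec)
    return x, y, z
```

For example, a Hipparcos position at epoch J1991.25 would be moved forward by 24.75 years before conversion: `to_cartesian_pc(*propagate_position(ra, dec, pmra, pmdec, 1991.25), distance_pc)`.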
The result is a HEALPix-partitioned Parquet table: one row per star, a fixed schema, quality flags packed into each record.
Step three
A browser cannot download a billion-row table. The merged data is encoded into a spatial octree — a tree structure that divides 3D space into nested cells across fourteen levels. Each star is placed into a level based on its brightness: the brightest stars sit in the shallowest, largest cells, while fainter stars go into progressively deeper, smaller ones. The threshold at each level is derived from an apparent-magnitude limit, by default roughly the naked-eye limit.
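One way to picture the level assignment is through the distance modulus: a star of absolute magnitude M reaches a given apparent-magnitude limit at a distance of 10^((limit - M + 5) / 5) parsecs. The sketch below assigns a level by comparing that distance with a cell size that halves at each level. The root cell size, the 6.5 magnitude limit, and the exact mapping are illustrative assumptions; only the fourteen levels and the brightness-based ordering come from the description above.

```python
import math

NUM_LEVELS = 14               # from the description above
ROOT_CELL_SIZE_PC = 16384.0   # hypothetical size of the level-0 cell
DEFAULT_MAG_LIMIT = 6.5       # assumed naked-eye apparent-magnitude limit


def visibility_distance_pc(abs_mag, mag_limit=DEFAULT_MAG_LIMIT):
    """Distance at which a star of absolute magnitude abs_mag fades to mag_limit."""
    return 10.0 ** ((mag_limit - abs_mag + 5.0) / 5.0)


def octree_level(abs_mag, mag_limit=DEFAULT_MAG_LIMIT):
    """Shallow levels for luminous stars, deep levels for faint ones."""
    d_vis = visibility_distance_pc(abs_mag, mag_limit)
    # Cell size halves per level, so count the halvings needed for the
    # cell size to shrink towards the star's visibility distance.
    level = math.floor(math.log2(ROOT_CELL_SIZE_PC / d_vis))
    return max(0, min(NUM_LEVELS - 1, level))
```

Under these assumptions, a star with absolute magnitude 1.5 is visible out to 100 pc and lands at level 7, while a very luminous star visible across the whole root cell stays at level 0.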
When you navigate the viewer, it loads cells level by level. Bright-star cells have large visibility radii and load from anywhere in the scene. Faint-star cells only load when you fly close enough that those stars would actually be visible. Everything else stays on the server until you need it.
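The loading rule can be sketched as a simple distance test. This is illustrative Python, not the viewer's code: the `Cell` fields and the `cells_to_load` helper are hypothetical, and a real viewer would walk the tree rather than scan a flat list.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    cell_id: str
    level: int                               # 0 is the shallowest level
    centre_pc: tuple[float, float, float]    # Sun-centred position in parsecs
    visibility_radius_pc: float              # how far away its stars can be seen


def cells_to_load(cells, camera_pc, already_loaded=frozenset()):
    """Return not-yet-loaded cells whose stars could be visible from camera_pc."""
    wanted = [
        cell for cell in cells
        if cell.cell_id not in already_loaded
        and math.dist(camera_pc, cell.centre_pc) <= cell.visibility_radius_pc
    ]
    # Shallow (bright-star) cells first, so the sky fills in from the
    # brightest stars down to the faintest.
    return sorted(wanted, key=lambda cell: cell.level)
```

A bright-star cell with a very large radius passes the test from anywhere in the scene, while a faint-star cell only passes once the camera is close to it.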
Technical notes
Why two surveys are needed, how the roughly 100,000 stars that appear in both catalogues are deduplicated, and how the pipeline handles cases that defeat automated matching.
Read →
The pipeline is open source. Create a 16 MB sample of the brightest stars, or build the full dataset from scratch using the Gaia Archive.
Read →
Start with the SkyKit builder path.
Start with the SkyKit builder path →
Open source
The data pipeline code — catalogues, astrometry, photometry, merging, and the
project configuration format — is published under the MIT licence at
github.com/Found-in-Space/pipeline.
Requires Python ≥ 3.13 and uv.