Projection Explorer

drag to orbit · scroll to zoom · click to inspect · shift+drag to select · esc to deselect

Choose a dataset to begin

Tap to inspect · Full interactive controls on desktop

Beyond the Visible

Why This Exists

This tool makes information loss visible. The parallel coordinates panel shows every dimension the projection discards; switch from three principal components to two and watch the explained variance drop from 90.4% to 88.1% while the point cloud barely changes shape. The spatial minimap shows where projected clusters actually sit on the ground. The morphing animation between PCA and t-SNE makes the geometric tradeoff tangible: watch global structure warp as local neighborhoods tighten. And the essay you are reading now is woven into the interface, with links that drive the tool as you read.

The purpose is pedagogical. This is a tool for understanding what projection does to data: what it preserves, what it destroys, and why the choice of projection is never neutral. It was built for an earth science education context, where the student who learns to ask "what are we not seeing?" will carry that question into every dataset, every map, and every mapping they encounter afterward.

The Signature of a Soybean

A single pixel from the Indian Pines scene is not a color. It is 200 numbers: reflectance measurements spanning wavelengths from deep violet to shortwave infrared, most of them invisible to the human eye. Each number records how much light at that wavelength bounced off a 20-meter patch of ground in northwestern Indiana. After removing bands corrupted by atmospheric water absorption, 177 channels remain in the interactive.

A soybean plot, a cornfield, a stand of oaks, an asphalt road, and a gravel rooftop all have unique signatures, reflecting different amounts at different wavelengths. These differences are subtle and spread across all 200 measurements. No single band separates corn from soybeans. The information lives in the pattern across all of them simultaneously.

You cannot look at 200 dimensions. You have to choose which three to show, and that choice destroys everything else. This tool makes that destruction visible, and reversible.

Why You Can't Just Look

Two hundred axes produce 19,900 possible pairwise scatter plots. You could spend a week inspecting them. Or you could ask a more precise question: which single view captures the most structure?
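The count of scatter plots is the binomial coefficient C(200, 2). A quick check, using only Python's standard library; the count of three-axis views is added here for comparison and is not a figure from the essay:

```python
import math

n_bands = 200  # spectral bands per pixel, as stated in the essay

# One 2D scatter plot per unordered pair of axes.
pairs = math.comb(n_bands, 2)

# One 3D scatter plot per unordered triple of axes
# (added for comparison, not a number quoted in the essay).
triples = math.comb(n_bands, 3)

print(pairs)    # 19900
print(triples)  # 1313400
```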

That question is answered through principal component analysis (PCA). Switch to PCA and the tool finds the three directions through 200-dimensional space along which the data varies most. These directions, the principal components, are not wavelength bands. They are weighted combinations of all 200 bands, computed from the covariance structure of the data itself.
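To see what "a weighted combination of all bands, computed from the covariance structure" means in practice, here is a minimal sketch on invented data. It assumes four hypothetical bands driven by one shared brightness factor, and it uses power iteration to stay dependency-free; it is not a description of how the tool itself computes PCA.

```python
import random

def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=100):
    """Leading eigenvector of cov by power iteration, with its variance."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d  # any start not orthogonal to the answer works
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

random.seed(1)
# Hypothetical 4-band "pixels": one shared brightness factor plus small band noise.
pixels = []
for _ in range(500):
    brightness = random.gauss(0, 1)
    pixels.append([brightness + random.gauss(0, 0.1) for _ in range(4)])

cov = covariance(pixels)
weights, variance = first_component(cov)
total = sum(cov[i][i] for i in range(4))

print([round(w, 2) for w in weights])  # every band carries weight
print(round(variance / total, 3))      # share of total variance on this one axis
```

Because every band in this toy data rises and falls with brightness, the first component weights all four bands about equally. No single band is the axis.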

The variance metric tells you the cost: if three components capture 90.1% of the variance, you are ignoring 9.9%. Whether that 9.9% matters depends on what you are looking for. If the spectral difference between corn and soybeans lives entirely in that discarded fraction, the projection has erased the distinction you came to find.
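The arithmetic behind the variance metric is a ratio: the variance along the components you kept, divided by the variance along all of them. A minimal sketch, with made-up per-component variances chosen for round numbers (they are not the Indian Pines values):

```python
def explained_fraction(variances, active):
    """Fraction of total variance captured by the components at the given indices."""
    return sum(variances[i] for i in active) / sum(variances)

# Hypothetical per-component variances, largest first (illustrative only).
variances = [60.0, 20.0, 10.0, 5.0, 3.0, 2.0]

kept_3d = explained_fraction(variances, [0, 1, 2])
kept_2d = explained_fraction(variances, [0, 1])

print(f"3 components: {kept_3d:.1%} kept, {1 - kept_3d:.1%} ignored")
print(f"2 components: {kept_2d:.1%} kept, {1 - kept_2d:.1%} ignored")
```

The metric says how much spread is kept. It says nothing about which structure lives in the part you ignored.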

The Axes Nobody Drew

PC1 of the Indian Pines scene is a direction through 200-dimensional space: a specific linear combination of all 200 bands, chosen to maximize the spread of the data. For vegetation scenes, it tends to capture overall brightness and the vegetation red-edge contrast. PC2 might separate soil moisture levels. PC3 might isolate mineral absorption features. The interpretation depends on the data.

Try swapping which components map to X, Y, and Z. Watch the points morph smoothly from one arrangement to another. Each arrangement is a different shadow of the same 200-dimensional object. The morphing makes the shadow-casting visible: you are watching the projection hyperplane rotate.
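Swapping axes and morphing between arrangements can be sketched in a few lines. This assumes a plain linear blend between the two layouts; the tool's actual animation (easing, timing) is not described here, and the scores below are hypothetical.

```python
def project(scores, axes):
    """Pick three component scores per point to serve as X, Y, Z."""
    return [tuple(row[a] for a in axes) for row in scores]

def morph(start, end, t):
    """Linear blend between two 3D layouts; t=0 gives start, t=1 gives end."""
    return [tuple((1 - t) * s + t * e for s, e in zip(p, q))
            for p, q in zip(start, end)]

# Hypothetical component scores for two points (PC1..PC4), for illustration only.
scores = [[2.0, -1.0, 0.5, 0.0],
          [-1.0, 3.0, 0.0, 1.5]]

layout_a = project(scores, (0, 1, 2))  # PC1, PC2, PC3 on X, Y, Z
layout_b = project(scores, (0, 1, 3))  # PC4 replaces PC3 on Z
halfway = morph(layout_a, layout_b, 0.5)

print(halfway)  # [(2.0, -1.0, 0.25), (-1.0, 3.0, 0.75)]
```

Only the Z coordinate moves, because only the Z assignment changed. The underlying scores never do.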

What the Variance Explains

The variance metric reads 90.1% and the projection looks clean: tight clusters, clear separation. The temptation is to stop there.

High explained variance is reassuring but can mislead. Two clusters can overlap completely in the top three principal components and separate cleanly in PC7. Variance measures spread, not separation.
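A small synthetic example makes the gap between spread and separation concrete. The two classes below are invented: axis 0 stands in for a high-variance component both classes share, and axis 1 for a low-variance component that carries the class difference. The separation score is a simple Fisher-style ratio chosen for illustration, not something the tool reports.

```python
import random
import statistics

random.seed(0)

# Axis 0: large spread, same for both classes. Axis 1: small spread, different means.
class_a = [(random.gauss(0, 10), random.gauss(-1, 0.3)) for _ in range(500)]
class_b = [(random.gauss(0, 10), random.gauss(+1, 0.3)) for _ in range(500)]

def axis_variance(points, axis):
    """Sample variance of the points along one axis."""
    return statistics.variance([p[axis] for p in points])

def separation(a, b, axis):
    """Squared gap between class means, divided by the summed class variances."""
    gap = (statistics.mean([p[axis] for p in a])
           - statistics.mean([p[axis] for p in b]))
    return gap ** 2 / (axis_variance(a, axis) + axis_variance(b, axis))

total_var = [axis_variance(class_a + class_b, ax) for ax in (0, 1)]
sep = [separation(class_a, class_b, ax) for ax in (0, 1)]

for ax in (0, 1):
    print(f"axis {ax}: variance={total_var[ax]:.2f}  separation={sep[ax]:.2f}")
```

Axis 0 holds nearly all the variance and almost none of the class separation. A projection ranked by variance alone would keep it and drop the axis that matters.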

Color the points by land cover class. If the colors form tight, well-separated clusters, the projection is preserving the structure you care about. If corn and soybeans overlap in the same diffuse cloud, the distinction exists in dimensions the projection discarded. Filter to four distinct land covers — corn, soybeans, woods, and hay — and watch everything else dim. Recompute PCA on just these classes and the axes shift: the projection now maximizes variance within this subset. The SWIR bands (shortwave infrared, around 2000–2200nm) load heavily onto the first component — cellulose absorption separating crop residue from forest canopy. The red edge (700–750nm) appears on the second component, capturing chlorophyll density differences between vegetation types. Open the parallel coordinates and the full 177-band picture appears. Lasso a cluster in 3D and watch which lines light up: the pattern across all bands that defines that group.

Neighborhoods vs. Distances

Switch to t-SNE and the global geometry warps. Points spread along principal component axes pull into tight, well-separated islands. t-SNE makes a fundamentally different bet: instead of preserving global distances, it preserves local neighborhoods. Points nearby in 200-dimensional space should be nearby in 3D.

Watch the morph. Clusters that were loosely arranged along variance axes snap into compact groups. But the relative distances between clusters, which were meaningful in PCA, are now arbitrary. Two clusters far apart in t-SNE space might be close in the original data. t-SNE tells you what belongs together. PCA tells you what is far apart. Neither view is complete.
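One way to put a number on "what belongs together" is to ask how many of each point's nearest neighbors survive the projection. The sketch below is a generic diagnostic on toy coordinates, not a feature of the tool, and the layouts are hypothetical.

```python
import math

def nearest(points, i, k):
    """Indices of the k nearest neighbors of point i (Euclidean distance)."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
    return set(order[:k])

def neighbor_overlap(high, low, k):
    """Mean fraction of each point's k nearest neighbors shared by both spaces."""
    n = len(high)
    shared = sum(len(nearest(high, i, k) & nearest(low, i, k)) for i in range(n))
    return shared / (n * k)

# Two well-separated clusters in the original space.
high = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10), (11, 10, 10), (10, 11, 10)]

# Layout 1 keeps each cluster together but shrinks the gap between clusters.
islands = [(0, 0), (1, 0), (0, 1), (3, 3), (4, 3), (3, 4)]

# Layout 2 interleaves the two clusters along a single axis.
interleaved = [(0,), (5,), (10,), (1,), (6,), (11,)]

score_islands = neighbor_overlap(high, islands, k=2)
score_interleaved = neighbor_overlap(high, interleaved, k=2)
print(score_islands, round(score_interleaved, 2))
```

The first layout scores perfectly even though the distance between its clusters is nothing like the original. That is the t-SNE bargain in miniature: neighborhoods kept, global distances not.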

With the four classes still filtered, switch to t-SNE and a pre-computed embedding of just these points replaces the full-dataset layout. t-SNE is non-linear and iterative — unlike PCA, it cannot reproject unseen points onto a learned embedding, so the hidden classes disappear rather than dim. The clusters that form reveal neighborhood structure that PCA's global variance axes obscure.

The Signature of a Scientist

Load the knowledge graph and the dimensions change meaning entirely. Here, each dimension is not a wavelength but a relationship. "Connection to Einstein" is an axis. "Connection to Bohr" is another. A hundred and twenty scientists are embedded in 127 dimensions that encode who influenced whom, who collaborated, who shared an institution, a field, or an era.

PCA on this graph finds the axes of maximum relational variance. Color by field and physics, mathematics, biology, chemistry separate. Mostly. But watch for the exceptions. Von Neumann, a mathematician, may cluster closer to the physicists than to the pure mathematicians. If so, the projection has found something the labels missed: his relational fingerprint is more similar to Fermi and Oppenheimer than to Euler or Gauss.

The same question the cornfield raises, reframed.

Reading the Projection

Some visual grammar for reading projected views:

A tight cluster means those points are similar across many dimensions simultaneously. The tighter the cluster, the more redundant their high-dimensional signatures. An outlier has a signature that no other point shares: an unusual mineral, a scientist with a unique relational position.

When two clusters merge during a morph, the distinction between them existed only in the dimensions the old projection showed. When a single cluster splits, the new projection has revealed structure that was hidden.

The parallel coordinates are the reality check. Select a cluster and examine the full-dimensional signatures. If the polylines are similar, the cluster is real. If they vary wildly, the cluster is a projection artifact: points that happen to land in the same place in 3D but differ in the other 197 dimensions. This grammar is transferable to any dimensionality reduction you encounter.
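The reality check can also be done numerically: compare how much a selection varies in the dimensions you can see with how much it varies in the ones you cannot. A minimal sketch on invented six-dimensional signatures, where the first three values play the role of the projected coordinates:

```python
import statistics

def mean_spread(rows):
    """Average per-dimension standard deviation across a set of signatures."""
    dims = list(zip(*rows))
    return statistics.mean(statistics.pstdev(d) for d in dims)

# Hypothetical signatures: columns 0-2 are visible, columns 3-5 are hidden.
real_cluster = [
    [1.0, 2.0, 3.0, 0.50, 0.52, 0.49],
    [1.1, 2.1, 2.9, 0.51, 0.50, 0.50],
    [0.9, 1.9, 3.1, 0.49, 0.51, 0.51],
]
artifact = [
    [1.0, 2.0, 3.0, 0.10, 0.90, 0.20],
    [1.1, 2.1, 2.9, 0.80, 0.15, 0.95],
    [0.9, 1.9, 3.1, 0.45, 0.55, 0.60],
]

real_visible = mean_spread([r[:3] for r in real_cluster])
real_hidden = mean_spread([r[3:] for r in real_cluster])
artifact_visible = mean_spread([r[:3] for r in artifact])
artifact_hidden = mean_spread([r[3:] for r in artifact])

print("real:    ", round(real_visible, 3), round(real_hidden, 3))
print("artifact:", round(artifact_visible, 3), round(artifact_hidden, 3))
```

Both selections look identical in the three visible dimensions. Only the hidden columns reveal that the second one is points that merely landed together.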

What the Projection Destroys

Open the spatial minimap alongside the 3D view. Two points side by side in PCA space may be kilometers apart on the ground. Spatially adjacent points may land in different projection clusters.

Every projection is a compression. Three dimensions from two hundred means 98.5% of the axes are invisible. The parallel coordinates show what is hidden. The minimap shows the inverse. You can see the gap, measure it, and by swapping axes, partially recover it. But you can never see all of it at once. Three dimensions is all you get.

Compression All the Way Down

Toggle the projection method from PCA to t-SNE and watch the point cloud reorganize. Nothing about the data changed. Everything about what you can see in it did.

RGB is a three-dimensional projection of the visible electromagnetic spectrum. A photograph compresses a 3D scene into 2D. A word compresses an experience into a symbol. Every representation is a projection, a mapping from higher to lower dimensionality that preserves some structure and destroys the rest.

Load the word embeddings and that last compression becomes literal. Each word is a point in 300-dimensional space, its coordinates learned from billions of sentences. Words that appear in similar contexts land near each other. Color by category and watch semantic neighborhoods emerge: animals with animals, emotions with emotions, countries with countries. Filter down to just colors and rotate the projection: a tight cluster disperses and outliers appear. Select them individually and consider why certain color words aren't clustered with the rest. The viewing angle had imposed a similarity the full data did not support. Toggle on t-SNE and the outlier distances are even more pronounced.

The Color-a-Pixel essay asked what happens when you compress color. This tool asks the same question about everything else: spectra, economies, social networks, the geometry of language itself. Something survives. Something is lost. And the choice of what to preserve is never neutral.

Indian Pines AVIRIS (2,500 × 177)▾

Projection Explorer

Navigate high-dimensional data projected into 3D space. Each point represents a row in the dataset; each dimension is a measured variable. The projection compresses many dimensions into three axes you can see and rotate. An integrated essay (accessible via the Essay button) guides you through the tool with live action triggers that drive the interface as you read.

Navigation

Drag: Orbit the view (globe-like rotation)
Scroll: Zoom in / out
Click: Inspect a single point (click empty space to deselect)
Shift + Drag: Rectangle selection (lasso); dims the area outside the rectangle
Esc: Clear current selection and deselect all points
Tap: Inspect a point (touch devices)
Selection: Camera orbits to the centroid of selected points. Selection persists across projection changes and panel interactions.

Projection Methods

PCA: Principal Component Analysis, the axes of maximum variance. Choose which 3 components map to X, Y, Z. The variance metric shows how much information the current axes capture.
Subset PCA: When points are selected (via lasso, legend filtering, or minimap brush), the ⟲ Subset PCA button recomputes PCA on just those points. Hidden points collapse to the origin; the projection maximizes variance within the selection. Deselect to return to full-dataset PCA.
t-SNE: Preserves local neighborhoods. Points nearby in high-dimensional space stay nearby in 3D. Precomputed (nonlinear, non-invertible). A pre-computed subset embedding is available for the essay's 4-class filter.
Manual: Map any 3 raw dimensions to X, Y, Z. For spectral data, these are literal wavelength bands.

Interface Controls

↻ Rotate: Toggle auto-rotation
None: Set any PCA axis to "None" to remove that component. Dropping the Z axis flattens the projection to 2D and snaps the camera to a front view. Variance updates to reflect only the active axes.
||| Coords: Toggle parallel coordinates panel, which shows all dimensions simultaneously
⊞ Map: Open spatial mini-map (mobile: fullscreen overlay; desktop: corner panel)
⟲ Subset PCA: Recompute PCA on the current selection. Appears when 3+ points are selected in PCA mode.
Size: Point radius slider

Color Legend

Bottom-left panel. Use the dropdown to choose which metadata field colors the points. Click a legend entry to cycle through three states: select (highlights all points of that class), hide (dims and shrinks those points), and unhide (restores them). When classes are hidden, the Subset PCA button becomes available to recompute the projection on just the visible classes.
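The three-state legend cycle can be modeled as a small state machine. This is a sketch of the behavior described here, not the tool's source: the state names, the starting state, and the rule for when Subset PCA is offered are assumptions made for illustration.

```python
from enum import Enum

class LegendState(Enum):
    NORMAL = "normal"
    SELECTED = "selected"
    HIDDEN = "hidden"

# Click order: select, then hide, then unhide.
NEXT_STATE = {
    LegendState.NORMAL: LegendState.SELECTED,
    LegendState.SELECTED: LegendState.HIDDEN,
    LegendState.HIDDEN: LegendState.NORMAL,
}

def click(states, label):
    """Advance one legend entry to its next state; returns a new dict."""
    updated = dict(states)
    updated[label] = NEXT_STATE[states[label]]
    return updated

def subset_pca_available(states):
    """Assumed rule: the button is offered once any class is hidden."""
    return any(s is LegendState.HIDDEN for s in states.values())

start = {name: LegendState.NORMAL for name in ("corn", "soybeans", "woods", "hay")}
selected = click(start, "hay")      # first click: select
hidden = click(selected, "hay")     # second click: hide
restored = click(hidden, "hay")     # third click: unhide

print(hidden["hay"].value, subset_pca_available(hidden))
print(restored["hay"].value, subset_pca_available(restored))
```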

Parallel Coordinates

When open, every dimension is a vertical axis. Each point traces a polyline across all axes. Click on an axis to select the nearest point. Drag horizontally to zoom into a band of dimensions. Scroll (or pinch on touch) to zoom centered on cursor. Shift+scroll to pan. Double-click or tap "Reset Zoom" to reset. Selected points from any view highlight their polylines here.

Spatial Mini-Map

Available for datasets with spatial coordinates (x/y metadata). Shows where each point is located on the ground. Hover over a point to highlight it simultaneously in the mini-map, 3D projection, and parallel coordinates. Drag to brush-select a spatial region. On desktop, click ⤢ to expand. On mobile, opens as a fullscreen overlay via the Map button in the View drawer.

Mobile

On smaller screens, controls move into a bottom toolbar with four drawers: View, Color, Axes, and Select. Swipe down to dismiss an open drawer. The parallel coordinates and spatial mini-map open as fullscreen overlays with their own info bar and deselect controls — swipe down or tap ✕ to close. The Select button toggles lasso rectangle mode: drag anywhere on the 3D canvas to select points, then use Deselect to clear. Dismissing the info panel does not clear the selection — use the Deselect button in the toolbar. The essay drawer can be swiped right to close.
