A Sankey diagram shows how a quantity flows between categories, drawing each flow as a band whose width is proportional to its size. It is the right tool when your story is about movement (migrants between counties, apprentices into trades, cargo between ports) rather than about static totals. Because width encodes quantity, the eye reads volume directly from the bands.
This guide builds one from a tiny worked example and names the pitfalls beginners hit first.
What is a Sankey diagram, and what is it for?
Think of a river delta. A total enters on the left, splits into branches, and the width of each branch is the volume it carries. A Sankey applies that to data: nodes are categories, links are flows, and link width encodes quantity. Historians reach for them to trace transitions — where a population of apprentices ends up by trade, how a parish's emigrants distribute across destinations, how cargo moves through a chain of ports.
The key intuition: a Sankey answers "how does this total break down and move", not "which category is biggest". For the latter, use a bar chart.
What data do I need to build one?
Almost nothing — a links table with three columns: source, target, value. The tool infers the nodes from the labels.
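To make "infers the nodes" concrete, here is a minimal sketch of how a script might derive the node list from the links table. The helper name `infer_nodes` is an assumption for illustration, not any particular tool's API, and real tools may order nodes differently. It uses the same rows as the sample table in this section.

```python
import csv
import io

# Same rows as the sample links table in this guide.
LINKS_CSV = """source,target,value
Yorkshire,London,420
Yorkshire,Lancashire,180
Yorkshire,Overseas,95
Norfolk,London,310
Norfolk,Overseas,60
"""

def infer_nodes(rows):
    """Return every distinct source/target label in first-seen order.

    Hypothetical helper: it only illustrates the idea of node inference.
    """
    seen = {}
    for row in rows:
        for label in (row["source"], row["target"]):
            seen.setdefault(label, len(seen))
    return list(seen)

rows = list(csv.DictReader(io.StringIO(LINKS_CSV)))
nodes = infer_nodes(rows)
print(nodes)
```

Five link rows yield five nodes here; no separate node table was needed.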
```csv
source,target,value
Yorkshire,London,420
Yorkshire,Lancashire,180
Yorkshire,Overseas,95
Norfolk,London,310
Norfolk,Overseas,60
```

Feed that to SankeyMATIC (paste flows as Yorkshire [420] London) for a no-code chart, or to a script for something reproducible. In Python with Plotly you map labels to integer indices:
```python
import plotly.graph_objects as go

# Node labels; each link refers to a node by its position in this list.
labels = ["Yorkshire", "Norfolk", "London", "Lancashire", "Overseas"]
idx = {l: i for i, l in enumerate(labels)}

# (source, target, value) rows from the links table.
links = [("Yorkshire", "London", 420), ("Yorkshire", "Lancashire", 180),
         ("Yorkshire", "Overseas", 95), ("Norfolk", "London", 310),
         ("Norfolk", "Overseas", 60)]

fig = go.Figure(go.Sankey(
    node=dict(label=labels),
    link=dict(
        source=[idx[s] for s, _, _ in links],
        target=[idx[t] for _, t, _ in links],
        value=[v for _, _, v in links])))
fig.write_html("migration_flows.html")
```

How do I show flows over time?
Make time the columns. Place each period as its own stage and connect a unit's category in one period to its category in the next. Tracking, say, parish populations as "stayed / left / died" across three decades produces an alluvial diagram — a Sankey whose stages are time steps. This is the natural way to show a population reshuffling between states, and it reads as a left-to-right story.
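One way to get from per-unit records to alluvial links is to count transitions between adjacent periods. The sketch below assumes one record per household with one state per decade. The household data, the period names and the helper `transition_links` are all invented for illustration. Prefixing each state with its period keeps "stayed" in decade 1 a different node from "stayed" in decade 2.

```python
from collections import Counter

# Invented example data: one tuple per household, one state per period.
periods = ["decade 1", "decade 2", "decade 3"]
households = [
    ("stayed", "stayed", "stayed"),
    ("stayed", "stayed", "left"),
    ("stayed", "left", "left"),
    ("stayed", "stayed", "died"),
    ("stayed", "left", "left"),
]

def transition_links(periods, records):
    """Count moves between adjacent periods as (source, target, value) links."""
    counts = Counter()
    for states in records:
        for i in range(len(periods) - 1):
            source = f"{periods[i]}: {states[i]}"
            target = f"{periods[i + 1]}: {states[i + 1]}"
            counts[(source, target)] += 1
    return [(s, t, n) for (s, t), n in counts.items()]

links = transition_links(periods, households)
for source, target, value in links:
    print(f"{source} -> {target}: {value}")
```

The resulting list has the same source, target, value shape as any other links table, so it can go straight into the charting tool of your choice.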
What goes wrong, and how do I avoid it?
| Pitfall | Why it breaks | Fix |
|---|---|---|
| Too many nodes | Bands overlap into a tangle | Keep to roughly 15–20 nodes per stage at most; aggregate the rest |
| Tiny flows | Invisible thin bands | Merge small categories into an "other" band |
| Cyclic flows | Most layouts cannot resolve loops | Restructure so flow goes one direction |
| Unsorted nodes | Crossing bands hide the pattern | Order nodes to minimise crossings |
| No totals shown | Reader can't judge scale | Label node totals or add a scale note |
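The cyclic-flow pitfall is easy to check before you draw anything. The following is a minimal sketch, assuming links are (source, target, value) tuples; `has_cycle` is a hypothetical helper that runs a plain depth-first search, and it uses recursion, so it suits the small graphs a readable Sankey has anyway.

```python
def has_cycle(links):
    """Return True if the directed graph formed by the links contains a loop."""
    graph = {}
    for source, target, _ in links:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    # 0 = unvisited, 1 = on the current search path, 2 = fully explored.
    state = {node: 0 for node in graph}

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state[nxt] == 1:
                return True  # reached a node already on the current path
            if state[nxt] == 0 and visit(nxt):
                return True
        state[node] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in graph)

one_way = [("A", "B", 10), ("B", "C", 4)]
looped = one_way + [("C", "A", 1)]
print(has_cycle(one_way), has_cycle(looped))
```

If the check reports a loop, one common restructuring is to give the returning flow its own node in a later stage instead of pointing back at an earlier one.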
The biggest beginner mistake is throwing the full raw category list at the chart. Aggregate first; a Sankey with eight clear bands beats one with eighty.
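Aggregation can be done on the links table itself. This sketch assumes that "small" means a target whose total inflow falls below a threshold you choose; the function name `merge_small_targets`, the threshold and the two minor destinations (Cheshire, Durham) with their values are invented for illustration.

```python
from collections import defaultdict

def merge_small_targets(links, min_total, other_label="Other"):
    """Relabel targets with total inflow below min_total as a shared 'Other' node."""
    totals = defaultdict(int)
    for _, target, value in links:
        totals[target] += value

    merged = defaultdict(int)
    for source, target, value in links:
        label = target if totals[target] >= min_total else other_label
        merged[(source, label)] += value
    return [(s, t, v) for (s, t), v in merged.items()]

# Invented example: two large flows plus three minor ones.
raw = [
    ("Yorkshire", "London", 420),
    ("Yorkshire", "Cheshire", 12),
    ("Yorkshire", "Durham", 9),
    ("Norfolk", "London", 310),
    ("Norfolk", "Durham", 7),
]
merged = merge_small_targets(raw, min_total=20)
for row in merged:
    print(row)
```

The grand total is unchanged by the merge, so band widths still add up; only the number of thin bands shrinks.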
When should I not use a Sankey?
If you do not care about movement between categories, only how big each is, a bar chart is faster to read and harder to misread. Sankeys also struggle with negative values, with flows that loop back on themselves, and with precise reading of exact numbers (the eye judges band width only approximately). Pair the diagram with a small table when readers need the figures.
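The companion table can come from the same links data. Here is a minimal sketch, assuming the five sample flows from this guide; `node_totals` is a hypothetical helper that sums what enters and leaves each node.

```python
# The sample flows used throughout this guide.
links = [
    ("Yorkshire", "London", 420),
    ("Yorkshire", "Lancashire", 180),
    ("Yorkshire", "Overseas", 95),
    ("Norfolk", "London", 310),
    ("Norfolk", "Overseas", 60),
]

def node_totals(links):
    """Map each node to an (inflow, outflow) pair summed over all links."""
    inflow, outflow = {}, {}
    for source, target, value in links:
        outflow[source] = outflow.get(source, 0) + value
        inflow[target] = inflow.get(target, 0) + value
    nodes = sorted(set(inflow) | set(outflow))
    return {n: (inflow.get(n, 0), outflow.get(n, 0)) for n in nodes}

totals = node_totals(links)
print(f"{'node':<12}{'in':>6}{'out':>6}")
for node, (total_in, total_out) in totals.items():
    print(f"{node:<12}{total_in:>6}{total_out:>6}")
```

Printing these figures beside the chart gives readers the exact numbers that band width can only suggest.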
Key Takeaways
- A Sankey encodes flow volume as band width and is for movement between categories, not static comparison.
- The only input you need is a links table: source, target, value.
- Make time the columns to turn a Sankey into an alluvial diagram tracking state changes over periods.
- Keep nodes to roughly 15–20 per stage and aggregate small flows into an "other" band.
- Avoid cyclic flows; Sankey layouts assume movement in one direction.
- Pair the diagram with a table when exact numbers matter — band width reads only approximately.
- SankeyMATIC and Flourish are no-code; Plotly, networkD3 and D3's sankey plugin are reproducible.
Frequently Asked Questions
What is a Sankey diagram?
A Sankey diagram shows flow between categories using bands whose width is proportional to the quantity flowing. It is ideal for tracing how a total splits and recombines — migrants moving between regions, goods between ports, or people between occupations across generations.
When should I use a Sankey instead of a bar chart?
Use a Sankey when you care about the movement between categories, not just totals: when the story is "where did this go and where did it come from". If you only need to compare sizes, a bar chart is clearer and easier to read.
What data shape does a Sankey need?
A list of links, each with a source, a target and a numeric value (the flow size). Most tools build the node list automatically from those source and target labels, so a three-column table is usually enough.
Can a Sankey show flows over time?
Yes, by making time itself the columns: place each period as a stage and connect a unit's category in one period to its category in the next. This produces an alluvial diagram, which is a Sankey used to track how a population reshuffles between states over time.
What are the main pitfalls of Sankey diagrams?
Too many nodes produce an unreadable tangle, cyclic flows confuse the layout, and tiny flows vanish. Keep nodes under roughly 15–20 per stage, avoid loops, and aggregate small categories into an 'other' band.
Which tools build Sankey diagrams easily?
SankeyMATIC and Flourish need no code and read a simple flow list. For reproducible work, Python's Plotly and R's networkD3 build Sankeys from a links table, and D3's sankey plugin gives full custom control on the web.