When to Build a Network from Tabular Data

Build a network from tabular data only when your question is genuinely about relationships between entities — who connects to whom, who brokers between groups, how a community is structured. If you actually want counts, rates or change over time, a chart or map will be clearer and cheaper. The decision hinges on the question, not the data: any two-column table can become a graph, but that does not mean it should.

This guide lays out the signals, trade-offs and costs so you can decide before sinking hours into Gephi.

When does a network add real value?

A network is the right model when at least one of these is true:

Your question contains the words who, between, broker, cluster, bridge or connected.
Relationships are too numerous or tangled to follow in a table or prose.
The structure of connections — not just their count — carries meaning.
You expect indirect effects, where A reaches C only through B.

If none apply, you probably want a different tool.
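The indirect-effects signal is easiest to see on a toy example. The sketch below uses plain Python with no graph library. It stores a tiny undirected network as an adjacency dictionary, with placeholder names A, B, C and D, and uses breadth-first search to find the shortest chain between two people. A and C never appear together in any row, yet they are connected through B. A table of direct links does not show that kind of fact.

```python
from collections import deque


def shortest_chain(adjacency, start, goal):
    """Return the shortest list of nodes from start to goal, or None.

    Plain breadth-first search on an unweighted, undirected graph.
    """
    previous = {start: None}  # node -> node we first reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical direct ties: A-B and B-C appear in the data, A-C does not.
adjacency = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
    "D": set(),
}

print(shortest_chain(adjacency, "A", "C"))  # ['A', 'B', 'C']
print(shortest_chain(adjacency, "A", "D"))  # None
```

Graph libraries such as networkx provide shortest paths as a built-in function. This hand-rolled version only shows that nothing mysterious happens underneath.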

When is a network the wrong choice?

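Resist the graph when your question looks like one of the first four rows. The last row shows the kind of question a network does answer: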

Your question	Better tool
How many letters per year?	Time-series chart
Where were people born?	Map
Which occupation is most common?	Bar chart / frequency list
Did X increase after Y?	Table or regression
Who connects two factions?	Network

A graph drawn to answer a counting question just hides the answer inside a hairball.

What does my table need to become a graph?

The minimum is two columns naming entities that relate, plus stable identifiers. Consider a witnesses table:

```csv
deed_id,witness,parish
D-1701-04,John Aldous,Bungay
D-1701-04,Mary Pratt,Bungay
D-1701-09,John Aldous,Beccles
```

Two witnesses who share a deed_id co-occur, and each such pair becomes an undirected edge. The result is a one-mode projection of the bipartite deed-witness structure onto the witnesses. In pandas:

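```python
import itertools
import pandas as pd

df = pd.read_csv("witnesses.csv")

# Each pair of distinct witnesses on the same deed becomes one undirected edge.
edges = []
for _, group in df.groupby("deed_id"):
    # unique() avoids self-loops from a repeated name; sorting fixes pair order.
    witnesses = sorted(group["witness"].unique())
    edges.extend(itertools.combinations(witnesses, 2))

# "Source" and "Target" are the column names Gephi's spreadsheet import expects.
pd.DataFrame(edges, columns=["Source", "Target"]).to_csv("edges.csv", index=False)
```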

Beware: co-occurrence projections can become very dense. A deed with n witnesses contributes n(n-1)/2 edges, so a single deed with ten witnesses creates 45.
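Before importing anything, you can check how dense a projection will get by computing n(n-1)/2 for each deed. You can also collapse repeated pairs into one weighted edge instead of one row per shared deed. The sketch below uses only the standard library. Its rows mirror the sample witnesses table above, plus one hypothetical deed (D-1702-01, with a hypothetical witness Thomas Kett) added so that a pair repeats.

```python
import itertools
from collections import Counter, defaultdict

# (deed_id, witness) rows; D-1702-01 and Thomas Kett are hypothetical.
rows = [
    ("D-1701-04", "John Aldous"),
    ("D-1701-04", "Mary Pratt"),
    ("D-1701-09", "John Aldous"),
    ("D-1702-01", "John Aldous"),
    ("D-1702-01", "Mary Pratt"),
    ("D-1702-01", "Thomas Kett"),
]

# Group distinct witnesses by deed.
witnesses_by_deed = defaultdict(set)
for deed_id, witness in rows:
    witnesses_by_deed[deed_id].add(witness)

# Edges each deed contributes to the projection: n choose 2 = n(n-1)/2.
edges_per_deed = {
    deed: len(names) * (len(names) - 1) // 2
    for deed, names in witnesses_by_deed.items()
}

# Count how many deeds each unordered pair shares; that count is the edge weight.
weights = Counter()
for names in witnesses_by_deed.values():
    weights.update(itertools.combinations(sorted(names), 2))

print(edges_per_deed)  # {'D-1701-04': 1, 'D-1701-09': 0, 'D-1702-01': 3}
print(weights[("John Aldous", "Mary Pratt")])  # 2
```

If you export these counts as a Weight column next to Source and Target, the edge list stays at one row per pair.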

Why does entity identity matter so much?

The most damaging mistake is building edges before reconciling entities. If "John Aldous", "Jno. Aldous" and "J. Aldous" stay distinct, your network invents three people and splits real connections across them. Always deduplicate first — with OpenRefine clustering or fuzzy matching — and assign one ID per real entity. Network structure built on dirty identity is an artefact of data entry, not history.
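Here is a minimal sketch of the "one ID per real entity" rule. It assumes reconciliation, whether in OpenRefine or by hand, has already produced an alias table. The mapping, the IDs P001 and P002, and the name "Wm. Kett" are all hypothetical. Each name is normalised for case and spacing and looked up in the alias table. Any name with no mapping is reported rather than silently becoming a new node.

```python
import re

# Hypothetical alias table produced by reconciliation: variant -> canonical ID.
ALIASES = {
    "john aldous": "P001",
    "jno. aldous": "P001",
    "j. aldous": "P001",
    "mary pratt": "P002",
}


def normalise(name):
    """Lower-case and collapse runs of whitespace so trivial variants match."""
    return re.sub(r"\s+", " ", name.strip()).lower()


def to_entity_ids(names):
    """Map raw names to canonical IDs; collect names that have no mapping."""
    ids, unmatched = [], []
    for name in names:
        entity_id = ALIASES.get(normalise(name))
        if entity_id is None:
            unmatched.append(name)
        else:
            ids.append(entity_id)
    return ids, unmatched


ids, unmatched = to_entity_ids(["John Aldous", "Jno.  Aldous", "J. Aldous", "Wm. Kett"])
print(ids)        # ['P001', 'P001', 'P001']
print(unmatched)  # ['Wm. Kett']
```

When you build edges from the IDs rather than the raw strings, the three spellings collapse into one node. The unmatched list shows what still needs reconciling before you build any edges.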

Directed or undirected — how do I choose?

Read what the columns mean:

Directed when order matters: sender → recipient, citing → cited, patron → client.
Undirected when the tie is symmetric: co-authorship, co-witnessing, co-membership.

Choosing wrong either invents asymmetry that is not there or erases asymmetry that is. A correspondence table forced into undirected edges loses the entire distinction between prolific writers and popular recipients.
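You can measure what symmetrising throws away. The sketch below takes a hypothetical correspondence table of (sender, recipient) rows and counts the letters each person sent and received. It then collapses the same rows into undirected pairs. The names are placeholders.

```python
from collections import Counter

# Hypothetical correspondence rows: (sender, recipient).
letters = [
    ("Aldous", "Pratt"),
    ("Aldous", "Pratt"),
    ("Aldous", "Kett"),
    ("Kett", "Pratt"),
]

# Directed view: how many letters each person sent and received.
sent = Counter(sender for sender, _ in letters)
received = Counter(recipient for _, recipient in letters)

# Undirected view: each unordered pair with its total letter count.
ties = Counter(frozenset(pair) for pair in letters)

print(sent)      # Counter({'Aldous': 3, 'Kett': 1})
print(received)  # Counter({'Pratt': 3, 'Kett': 1})
print(ties[frozenset({"Aldous", "Pratt"})])  # 2
```

In the undirected counts, Aldous and Pratt simply share a tie of weight 2. The fact that Aldous did all the writing survives only in the directed view.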

What are the hidden costs?

Building a network is not free. Budget for:

Reconciliation — usually the largest time sink.
Modelling decisions — directed vs undirected, weighting, projection.
Interpretation risk — graphs invite over-reading; structure can be a survival artefact.
Maintenance — re-running the pipeline when the source table updates.

If the payoff is one figure that a table would convey better, skip it.

What does a good decision look like?

A historian with a deeds collection asks: "Did the same circle of men witness each other's deeds across parishes?" That is a relational, structural question about brokerage between places — a network fits. Another asks: "How many deeds were witnessed per decade?" That is a count — a bar chart wins. Same data, different tools, decided by the question.
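You can put both questions to the same witnesses table, which shows why the question, not the data, decides the tool. Purely for illustration, the sketch below assumes that the four digits after "D-" in each deed_id are the year. Check that against your own catalogue before relying on it. The rows extend the sample table above with one hypothetical deed, D-1712-02, and a hypothetical witness, Thomas Kett.

```python
import csv
import io
from collections import Counter, defaultdict

# Sample rows in the witnesses.csv layout; D-1712-02 and Thomas Kett are hypothetical.
DATA = """deed_id,witness,parish
D-1701-04,John Aldous,Bungay
D-1701-04,Mary Pratt,Bungay
D-1701-09,John Aldous,Beccles
D-1712-02,Mary Pratt,Beccles
D-1712-02,Thomas Kett,Beccles
"""

rows = list(csv.DictReader(io.StringIO(DATA)))

# Counting question: deeds per decade (ASSUMPTION: deed_id is D-YYYY-NN).
deeds = {row["deed_id"] for row in rows}
per_decade = Counter(int(deed.split("-")[1]) // 10 * 10 for deed in deeds)

# Relational question: which witnesses appear in more than one parish?
parishes = defaultdict(set)
for row in rows:
    parishes[row["witness"]].add(row["parish"])
cross_parish = sorted(w for w, ps in parishes.items() if len(ps) > 1)

print(sorted(per_decade.items()))  # [(1700, 2), (1710, 1)]
print(cross_parish)                # ['John Aldous', 'Mary Pratt']
```

The first answer is a pair of numbers that belongs in a bar chart. The second is only a starting point. It names people who cross parish boundaries, but whether they form a circle that witnesses each other's deeds is the structural question the projected network is for.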

Key Takeaways

Build a network only when the question is about relationships and structure.
Use a chart or map for counts, rates, locations and trends.
Any two-column relational table can become a graph; co-occurrence yields a projection.
Reconcile and deduplicate entities before building, or structure becomes an artefact.
Choose directed edges for asymmetric ties, undirected for symmetric ones.
Budget the real cost: reconciliation, modelling, interpretation risk and maintenance.

Frequently Asked Questions

When is a network the wrong model for tabular data?

If your question is about counts, rates or trends over time rather than relationships between entities, a network adds complexity without insight. A table, chart or map answers 'how many' and 'where' far more clearly than a graph.

What columns must my table have to become a network?

At minimum you need two columns that name entities which relate to each other, plus a way to identify each entity uniquely. Anything beyond that, like dates or weights, becomes edge or node attributes.

Can I build a network from a single co-occurrence table?

Yes. If two entities appear in the same row, such as two witnesses on one deed, that co-occurrence is an edge. This produces a one-mode projection, but be aware it can create dense, hard-to-read graphs.

How big does my data need to be to justify a network?

There is no minimum, but networks earn their keep when relationships are too many or too tangled to follow in a table, typically dozens of entities upward. For a handful of links, a sentence or sketch is clearer.

What is the most common mistake when building networks from tables?

Treating a spreadsheet's rows as edges without checking entity identity, so spelling variants become separate nodes. Reconcile and deduplicate entities before building, or your network's structure will be an artefact of inconsistent data entry.

Should I build a directed or undirected network from my table?

Use directed edges when the columns encode an asymmetric relationship, like sender and recipient or citing and cited. Use undirected edges for symmetric ties such as co-authorship or co-presence, where order carries no meaning.
