Beginner's Guide to Communities in Historical Networks

Community detection automatically finds groups of nodes that connect to each other more densely than to the rest of the network — in historical terms, the circles, factions or households that the records never label. You run an algorithm such as Louvain, it assigns every node a community number, and you colour the graph by that number to reveal structure. The crucial caveat for beginners: the algorithm finds dense connection patterns, not real historical groups — proving the two coincide is your job.

This is a gentle introduction with a worked example you can follow start to finish.

What exactly is a community?

A community is a set of nodes with many ties inside the set and few ties leaving it. Think of three letter-writing circles that mostly correspond within themselves and only occasionally across. Community detection partitions the network so each node belongs to one such cluster, exposing structure that is invisible in a list of letters.
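This is a minimal sketch on invented data. The circles A, B and C and their members are hypothetical. Each circle has five correspondents who all write to one another, and a single letter crosses between each pair of circles. It assumes NetworkX and uses its Louvain function only as a convenient detector.

```python
import itertools

import networkx as nx
from networkx.algorithms.community import louvain_communities

# Hypothetical toy data: three circles of five correspondents each,
# where everyone writes to everyone else in their own circle...
G = nx.Graph()
for circle in "ABC":
    members = [f"{circle}{i}" for i in range(1, 6)]
    G.add_edges_from(itertools.combinations(members, 2))

# ...and only one letter crosses between each pair of circles
G.add_edges_from([("A1", "B1"), ("B2", "C1"), ("C2", "A2")])

communities = louvain_communities(G, seed=42)
for community in communities:
    print(sorted(community))
```

The three dense circles should come back as the three communities. The algorithm never sees the A, B, C labels; it recovers the groups from the pattern of ties alone.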

How does the algorithm decide? (modularity in plain terms)

Most methods optimise modularity — a single number measuring how much denser the within-group links are than you'd expect by chance. Roughly:

Modularity near 0 — no more clustering than random.
Modularity around 0.3–0.7 — clear, meaningful community structure.
Modularity near 1 — extremely clean separation (rare in real data).
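For readers who want the number behind these ranges, here is the usual community-by-community form of modularity (the form given in NetworkX's `modularity` documentation). In it, m is the total number of edges, L_c is the number of edges inside community c, d_c is the summed degree of c's nodes, and γ is the resolution (1 by default):

```latex
Q = \sum_{c} \left[ \frac{L_c}{m} - \gamma \left( \frac{d_c}{2m} \right)^{2} \right]

Q_{\text{two triangles}} = 2 \left[ \frac{3}{7} - \left( \frac{7}{14} \right)^{2} \right] = 2 \left( \frac{3}{7} - \frac{1}{4} \right) = \frac{5}{14} \approx 0.36
```

The second line is a worked case: two triangles joined by a single edge, split into the two triangles with γ = 1. There, m = 7, and each triangle has L_c = 3 and d_c = 2 + 2 + 3 = 7. The result lands inside the "clear structure" range.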

A high score is encouraging but never sufficient proof; random graphs can score moderately, and a tidy partition can still be historically meaningless.
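One rough way to act on that caveat is to compare your network's score with the score of a random graph of the same size. This sketch makes several assumptions:

- It uses NetworkX, with Zachary's karate club standing in for historical data.
- It picks `gnm_random_graph` as the comparison: same number of nodes and edges, but no planted groups.
- It ignores edge weights (`weight=None`) so both graphs are scored the same way.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

# Real network: Zachary's karate club (34 nodes, 78 edges); weights ignored
G = nx.karate_club_graph()
q_real = modularity(G, louvain_communities(G, weight=None, seed=42), weight=None)

# Random graph with the same numbers of nodes and edges but no built-in groups
R = nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges(), seed=42)
q_random = modularity(R, louvain_communities(R, weight=None, seed=42), weight=None)

print(f"karate club  Q = {q_real:.2f}")
print(f"random graph Q = {q_random:.2f}")
```

If the two scores are close, treat the partition with suspicion: a graph with no groups by construction can reach a similar Q.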

Which algorithm should I start with?

Begin with Louvain — it is fast, parameter-free by default, and built into Gephi's Statistics panel as "Modularity". For Python users, Leiden is a refined successor that avoids a known Louvain flaw (occasionally producing internally disconnected communities).

Algorithm	Where	Note
Louvain	Gephi, Python	Fast, the default starting point
Leiden	Python (`leidenalg`)	Fixes Louvain's connectivity quirk
Girvan-Newman	Python (NetworkX); small graphs only	Intuitive but slow; removes high edge-betweenness links
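To see why Girvan-Newman is called intuitive, here is a small sketch using NetworkX's `girvan_newman` on the same karate club graph used below. The method repeatedly deletes the most "between" edge, and each level it yields splits one more group off. Recomputing betweenness after every removal is what makes it slow on large graphs.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()

# girvan_newman yields successively finer partitions: it keeps removing the
# edge with the highest betweenness until the graph breaks into more pieces
levels = girvan_newman(G)
first_split = next(levels)   # the first division into two groups
second_split = next(levels)  # the next level splits off a third group

for i, community in enumerate(first_split):
    print(i, sorted(community))
```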

A small worked example

Suppose you have an undirected friendship graph. In Python with NetworkX:

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

G = nx.karate_club_graph()           # a classic 34-node test network
communities = louvain_communities(G, seed=42)
print(len(communities), "communities")
for i, c in enumerate(communities):
    print(i, sorted(c))
```

The seed=42 fixes the randomness so your result is reproducible. To colour the graph by community for a figure, assign each node its group index and map it to colour in Gephi or matplotlib.
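Here is a minimal sketch of that colouring step for Gephi, assuming NetworkX. The attribute name `community` and the file name `karate_communities.gexf` are arbitrary choices, not anything Gephi requires.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

G = nx.karate_club_graph()
communities = louvain_communities(G, seed=42)

# Map every node to the index of the community that contains it
membership = {node: i for i, community in enumerate(communities) for node in community}

# Store the index as a node attribute so it travels with the graph
nx.set_node_attributes(G, membership, "community")

# GEXF keeps node attributes, so Gephi can read the "community" column
nx.write_gexf(G, "karate_communities.gexf")
```

After opening the file in Gephi, a partition colouring on the `community` attribute (in the Appearance panel) should give one colour per group. For matplotlib, the same `membership` dictionary can supply the `node_color` values.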

Why do my results change each run?

Louvain is stochastic: it processes nodes in a partly random order, so two runs can differ slightly. Two habits keep you honest:

Fix a seed for a reproducible figure.
Run it several times without a seed to see which groupings persist — the stable core is what you trust; nodes that flip between communities are genuinely on the boundary.
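Here is a minimal sketch of the second habit, assuming NetworkX. It counts, for every pair of nodes, how many runs put them in the same community. Each run uses a different seed (0 to 19) rather than no seed, so the check itself is repeatable. The number of runs is an arbitrary choice.

```python
import itertools
from collections import Counter

import networkx as nx
from networkx.algorithms.community import louvain_communities

G = nx.karate_club_graph()
n_runs = 20
together = Counter()  # (u, v) -> number of runs in which u and v share a community

for run in range(n_runs):
    # a different seed per run varies the node order, like an unseeded run would
    for community in louvain_communities(G, seed=run):
        for u, v in itertools.combinations(sorted(community), 2):
            together[(u, v)] += 1

# Pairs grouped together in every run form the stable core
always = [pair for pair, count in together.items() if count == n_runs]

# Pairs grouped together in only some runs point to boundary nodes
sometimes = [pair for pair, count in together.items() if count < n_runs]
boundary_nodes = sorted({node for pair in sometimes for node in pair})

print(len(always), "pairs always grouped together")
print("nodes with unstable membership:", boundary_nodes)
```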

How do I get more or fewer groups?

Use the resolution parameter. Higher resolution splits the network into more, smaller communities; lower resolution merges them into fewer, larger ones. Match it to your question's scale: studying households needs higher resolution than studying rival regional factions.

```python
# default resolution is 1; 1.5 asks for more, smaller communities
communities = louvain_communities(G, resolution=1.5, seed=42)
```

There is no single "correct" resolution — there is only the scale that answers your question.
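A quick way to find that scale is to sweep a few values and look at how many groups each one produces, and how large they are. This sketch assumes NetworkX. The particular values tried are arbitrary, and the seed is fixed so the differences come from resolution, not randomness.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

G = nx.karate_club_graph()

results = {}
for resolution in (0.5, 1.0, 1.5, 2.0):
    communities = louvain_communities(G, resolution=resolution, seed=42)
    results[resolution] = communities
    sizes = sorted((len(c) for c in communities), reverse=True)
    print(f"resolution {resolution}: {len(communities)} communities, sizes {sizes}")
```

The count will typically rise with resolution. Choose the level whose group sizes resemble the units your question is about, for example households rather than regional factions.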

Does a community mean a real historical group?

No, and this is where beginners stumble. The algorithm guarantees only that the cluster is densely connected. To claim a community is a real faction, household or institution, you must test it against the sources: do its members share a place, a cause, a kinship line? If a cluster has no external corroboration, report it as a structural pattern, not a historical fact.
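The karate club makes a handy rehearsal for this test. In NetworkX, each node carries a `club` attribute ("Mr. Hi" or "Officer") recording which club the member belonged to after the real-life split. That is independent evidence the algorithm never used, standing in here for archival sources. This minimal sketch measures how far each detected community is dominated by one recorded club. In your own data, `club` would be replaced by whatever your sources record, such as a place, cause or kinship line.

```python
from collections import Counter

import networkx as nx
from networkx.algorithms.community import louvain_communities

G = nx.karate_club_graph()
communities = louvain_communities(G, seed=42)

# Cross-check each detected community against the recorded "club" attribute
purities = []
for i, community in enumerate(communities):
    clubs = Counter(G.nodes[node]["club"] for node in community)
    label, count = clubs.most_common(1)[0]
    purity = count / len(community)  # share of members from the majority club
    purities.append(purity)
    print(f"community {i}: {dict(clubs)} -> mostly {label!r} ({purity:.0%})")
```

A community drawn almost entirely from one recorded group has some corroboration. A mixed one is better reported as a structural pattern.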

Key Takeaways

Communities are groups with dense internal ties and sparse external ones.
Modularity scores the quality of a partition but never proves historical reality.
Start with Louvain; use Leiden in Python to avoid its connectivity quirk.
Louvain is stochastic — fix a seed for figures and re-run to check stability.
The resolution parameter controls how many communities you get; match it to your question.
A detected community must be corroborated against sources before you call it a real group.

Frequently Asked Questions

What is community detection in a network?

Community detection is the automatic grouping of nodes that are more densely connected to each other than to the rest of the network. In historical terms, it can surface circles, factions or institutions that the records do not name explicitly.

What is modularity and why does it matter?

Modularity is a score between roughly -0.5 and 1 measuring how well a partition separates dense groups from sparse links between them. Higher modularity means cleaner communities, but a high score alone does not prove the groups are historically real.

Which algorithm should a beginner use?

The Louvain method is the standard starting point because it is fast, built into Gephi, and gives good results with no tuning. The Leiden algorithm is a refined successor that avoids some of Louvain's quirks if you work in Python.

Why do I get different communities each time I run it?

Louvain is stochastic, so node order and randomness shift the result slightly between runs. Set a fixed random seed for reproducibility and run it several times to check that the major groupings are stable.

Does a detected community correspond to a real historical group?

Not automatically. The algorithm only finds dense connection patterns; whether a cluster maps to a real faction, family or institution is a question for the historian to test against the sources, not assume.

Can I control how many communities I get?

Yes, indirectly, through a resolution parameter. Raising resolution yields more, smaller communities; lowering it yields fewer, larger ones. Adjust it to the scale of grouping your historical question is about.
