Appearance
Handle nested historical entities only when the inner span carries information your project actually consumes — chiefly a place you need to geocode or a person you need to attach to a role — and skip it when you are merely counting flat mentions. Nesting roughly doubles annotation effort and is unsupported by most off-the-shelf pipelines, so it is a deliberate trade-off, not a default. This article gives you the signals that say yes, the signals that say no, and a cheap two-layer approximation when you fall in between.
What counts as a nested historical entity?
A nested entity is a tagged span sitting inside another tagged span. Three patterns dominate historical sources:
- Place inside institution:
University of Padua,Bank of England - Place inside role:
the Bishop of Durham,Governor of Bombay - Person inside organisation:
Baring Brothers,the House of Medici
In each case a flat tagger forces a single label and loses one layer. The question is whether that lost layer matters for what you are building.
When is it genuinely worth the cost?
Two situations justify nested handling outright:
- Geocoding from titles. If you map dioceses, universities or colonial offices, the place inside the title is your coordinate source. Collapsing
Bishop of Durhamto a role loses Durham; collapsing it toDurhamloses the office. - Role-based prosopography. If you reconstruct who held which office when, you need both the person and the place-or-body bound into the role.
If neither applies — say you only want a frequency chart of how often any organisation is mentioned per decade — nesting is wasted effort.
Why not just take the deepest spans?
Because the outer span is often the entity that matters. Consider how differently these resolve:
| Surface form | Outer entity | Inner entity | If you keep only inner |
|---|---|---|---|
| University of Padua | ORG (institution) | Padua (PLACE) | event misattributed to a city |
| Bishop of Durham | ROLE/ORG | Durham (PLACE) | the office vanishes |
| Bank of England | ORG | England (PLACE) | a nation, not a bank |
Deepest-only is a silent data-quality bug: your network or map looks complete but points at the wrong nodes.
What does true nested NER cost?
Honest accounting before you commit:
- Annotation: layered or span-based schemes are slower; budget about double the per-document time of flat IOB tagging.
- Tooling: most spaCy and transformers NER heads emit flat spans. True nesting means a span-based architecture or a layered model you maintain yourself.
- Evaluation: a single flat F1 is meaningless; you need per-layer scoring.
How do I approximate nesting cheaply?
Most projects do not need a research-grade nested model. A two-layer pipeline recovers the practical value:
python
def two_layer(text, ner, place_gaz):
outer = ner(text) # flat NER: gets ORG / ROLE spans
enriched = []
for span in outer:
# second pass: find a known place inside the outer span
inner = place_gaz.match_inside(span)
enriched.append({**span, "inner_place": inner})
return enrichedRun flat NER for the outer entity, then a targeted second pass — a place gazetteer or a regex over ... of <Place> title patterns — to recover the inner span. This captures the geocoding and prosopography cases at a fraction of the annotation cost.
How should you evaluate it?
Never report one number. Score the outer layer and the inner layer separately, so a model that recognises institutions but misses the place inside cannot hide behind an average. Report, for example, outer-span F1 of 0.88 and inner-place recall of 0.71 as two distinct figures.
So when should you say no?
Say no when: your output is aggregate counts, your sources rarely embed places in titles, your team cannot fund the extra annotation, or your deadline rules out a custom evaluation. In those cases flat NER plus clear documentation of the limitation is the professional choice.
Key Takeaways
- Nesting is a deliberate trade-off; handle it only when the inner span feeds geocoding or role-based prosopography.
- Deepest-only extraction is a silent bug: it misattributes institutions to bare places.
- True nested NER roughly doubles annotation effort and needs span-based tooling most pipelines lack.
- A two-layer pipeline (flat NER plus a gazetteer or title regex) captures most practical value cheaply.
- Evaluate each layer separately; a single flat F1 hides inner-span failure.
- Say no when you only need flat counts or cannot fund the extra annotation, and document the limitation.
Frequently Asked Questions
What exactly is a nested historical entity?
It is an entity span that contains another tagged entity inside it, such as 'the Bishop of [Durham]' where Durham is a PLACE inside a wider role-or-organisation reference, or '[University of [Padua]]' where a place sits inside an institution name.
When is handling nested entities actually worth it?
When the inner entity carries information you need downstream, typically place-in-title for geocoding or person-in-role for prosopography. If you only count flat mentions for a frequency chart, nesting adds cost without payoff.
Why not just always extract the deepest spans?
Deepest-only loses the outer entity's meaning. 'University of Padua' geocodes and links very differently from bare 'Padua', and collapsing to the inner place silently misattributes events to a city rather than an institution.
What does nested NER cost compared with flat NER?
Nested models are harder to train, need span-based or layered annotation that is slower to produce, and most off-the-shelf pipelines output flat spans only. Budget roughly double the annotation effort and a custom evaluation.
Can I approximate nesting without a true nested model?
Yes. Run flat NER for the outer span, then a second targeted pass (a place gazetteer or a regex over titles) to recover the inner entity. This two-layer pipeline captures most practical value at a fraction of the cost.
How do I evaluate nested extraction fairly?
Score each layer separately rather than with a single flat F1. Report outer-span and inner-span metrics so a model that nails institutions but misses the place inside is not flattered by an average.