TEI & XML Encoding

To mark up names and places in TEI, wrap people in persName, places in placeName, and organisations in orgName, then link each mention to a canonical record with the @ref attribute. Declare those records once — people in a listPerson, places in a listPlace — each with an xml:id. Every mention then resolves to one entity, so spelling variants collapse and you can index, count, and network-analyse reliably.

The single most valuable habit is pointing names at authority records rather than just tagging the text. Tagging tells you "this is a person"; the @ref tells you which person.

Which elements do I use?

TEI offers a small, clear set. Pick by what the token refers to:

| Element | Use for | Example |
| --- | --- | --- |
| persName | A person | <persName>Lady Macbeth</persName> |
| placeName | A geographic place | <placeName>Edinburgh</placeName> |
| orgName | An organisation | <orgName>Royal Society</orgName> |
| name | An entity of unspecified type | <name>the Crown</name> |

Use the specific element whenever you can; reserve generic name for cases where the type is genuinely unclear.
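As an illustration, one invented sentence can use all four elements. "the Crown" gets the generic name element because it could mean the monarch (a person) or the state (an organisation), which is exactly the unclear case:

```xml
<!-- Invented example sentence: each element is chosen by what the words refer to -->
<p>A letter from <persName>Jane Welsh Carlyle</persName> in
   <placeName>Edinburgh</placeName> mentions the
   <orgName>Royal Society</orgName> and a dispute with
   <name>the Crown</name>.</p>
```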

How do I link names to authority records?

Linking is the step that turns markup into data. First, declare each entity once with an xml:id:

```xml
<listPerson>
  <person xml:id="JW01">
    <persName>Jane Welsh Carlyle</persName>
    <birth when="1801"/><death when="1866"/>
  </person>
</listPerson>
<listPlace>
  <place xml:id="EDIN">
    <placeName>Edinburgh</placeName>
    <location><geo>55.95 -3.19</geo></location>
  </place>
</listPlace>
```
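The snippet above does not say where the lists live. One common home, sketched here on the assumption that you use the standOff element available in recent TEI P5 releases, is a standOff section next to text; sourceDesc in the header or the back matter are also used. Organisations follow the same pattern with listOrg and org; the id RSOC is a hypothetical example. The teiHeader is reduced to a comment, so this file will not validate until fileDesc is filled in:

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- fileDesc (title, publication, source) omitted for brevity -->
  </teiHeader>
  <!-- standOff holds the contextual records that the text points into -->
  <standOff>
    <listPerson>
      <person xml:id="JW01">
        <persName>Jane Welsh Carlyle</persName>
      </person>
    </listPerson>
    <listPlace>
      <place xml:id="EDIN">
        <placeName>Edinburgh</placeName>
      </place>
    </listPlace>
    <!-- Organisations get their own list, parallel to people and places -->
    <listOrg>
      <org xml:id="RSOC">
        <orgName>Royal Society</orgName>
      </org>
    </listOrg>
  </standOff>
  <text>
    <body>
      <p><persName ref="#JW01">Jane</persName> wrote from
         <placeName ref="#EDIN">Auld Reekie</placeName> in haste.</p>
    </body>
  </text>
</TEI>
```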

Then point every mention at the record, however it is spelled in the source:

```xml
<p><persName ref="#JW01">Jane</persName> wrote from
   <placeName ref="#EDIN">Auld Reekie</placeName> in haste.</p>
```

Both "Jane" and a later "Mrs Carlyle" resolve to #JW01; "Auld Reekie" and "Edinburgh" both resolve to #EDIN. The source keeps its original wording while the data stays unified.
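To make the collapsing concrete, here are two sentences (the second one invented) that use each spelling mentioned above. Any tool that groups mentions by @ref sees two mentions of #JW01 and two of #EDIN, whatever the surface form:

```xml
<!-- Different surface forms, one record each -->
<p><persName ref="#JW01">Jane</persName> wrote from
   <placeName ref="#EDIN">Auld Reekie</placeName> in haste.</p>
<p>A week later <persName ref="#JW01">Mrs Carlyle</persName> was back in
   <placeName ref="#EDIN">Edinburgh</placeName>.</p>
```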

How do I connect to external authorities?

For interoperability, point @ref at a stable external URI instead of (or alongside) a local id. This makes your entities Linked Open Data:

```xml
<persName ref="http://viaf.org/viaf/95207071">Carlyle</persName>
<placeName ref="http://sws.geonames.org/2650225/">Edinburgh</placeName>
```

Common choices are VIAF for people, GeoNames or the Getty TGN for places, and Wikidata for either. Reconciling a name to a Wikidata QID is especially worth the effort: Wikidata items record many other identifiers (VIAF and GeoNames IDs among them), so one QID can be cross-walked to the rest automatically.
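Local and external identifiers can also be combined. A minimal sketch, reusing the Edinburgh record and the GeoNames URI from above: the record keeps the authority URI in an idno, and a mention can carry both pointers in a single @ref, which accepts one or more whitespace-separated URIs. The type value "GeoNames" on idno is a project convention assumed here, not a fixed vocabulary:

```xml
<!-- In the record: keep the authority identifier with the entity -->
<place xml:id="EDIN">
  <placeName>Edinburgh</placeName>
  <location><geo>55.95 -3.19</geo></location>
  <idno type="GeoNames">http://sws.geonames.org/2650225/</idno>
</place>

<!-- In the text: local id and external URI in one @ref -->
<placeName ref="#EDIN http://sws.geonames.org/2650225/">Edinburgh</placeName>
```

Keeping the external URI on the record means it is stated once; putting it on every mention as well is redundant but can help tools that read mentions in isolation.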

Should I tag every occurrence?

Yes, if you care about analysis. The trade-off is real:

  • Tag every mention → accurate frequency counts, co-occurrence networks, complete indexes. More work.
  • Tag first mention only → lighter, fine for a reading edition, but it silently breaks any counting or network extraction because most occurrences are invisible to your tools.

For a research edition, tag exhaustively. A regex-assisted first pass in your editor, followed by manual disambiguation, makes this tractable for a long text.
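If you would rather script the first pass than run find-and-replace by hand, XSLT 2.0's xsl:analyze-string can wrap known name forms in persName. This is a swapped-in technique, not the editor workflow described above, and the list of forms and the #JW01 target are assumptions for illustration. Every match still needs manual checking, because a form such as "Mrs Carlyle" is not guaranteed to mean the same person each time:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.tei-c.org/ns/1.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- Identity transform: copy every node and attribute unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text in the body that is not already inside a name element -->
  <xsl:template match="text()[ancestor::body]
                             [not(ancestor::persName or ancestor::placeName
                                  or ancestor::orgName or ancestor::name)]">
    <!-- Hypothetical list of forms, longest first; XPath regex has no \b -->
    <xsl:analyze-string select="." regex="Jane Welsh Carlyle|Mrs Carlyle">
      <xsl:matching-substring>
        <persName ref="#JW01"><xsl:value-of select="."/></persName>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Because the second template skips text already inside a name element, running it twice does not double-wrap. Names broken across an element boundary (for instance by lb) are not matched and still need tagging by hand.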

What are the common pitfalls?

  • Conflating distinct people. Two "John Smith"s need two records. Resolve before you assign @ref, not after.
  • Tagging by spelling instead of reference. "Washington" can be a person or a city. Tag what it refers to in that sentence, not the string.
  • Over-nesting name parts. You can break a persName into forename, surname, roleName — but only do so where the granularity is used. Needless nesting slows encoding with no payoff.
  • Forgetting to declare the record. A ref="#X01" that resolves to nothing is a dangling pointer; add a Schematron rule to catch unresolved references.
  • Inconsistent id schemes. Decide a convention (#per_carlyle_jane) early and stick to it across the whole project.
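The first two pitfalls are easiest to see in markup. In this sketch every record is hypothetical: the two John Smiths are distinguished only by invented occupations, and the ids extend the #per_carlyle_jane style above with an assumed pla_ prefix for places. Name parts appear in the records, where an index might sort by surname, while mentions in the text stay flat:

```xml
<!-- Hypothetical records: one per real-world entity -->
<listPerson>
  <person xml:id="per_smith_john_1">
    <persName><forename>John</forename> <surname>Smith</surname></persName>
    <occupation>printer</occupation>
  </person>
  <person xml:id="per_smith_john_2">
    <persName><forename>John</forename> <surname>Smith</surname></persName>
    <occupation>surgeon</occupation>
  </person>
  <person xml:id="per_washington_george">
    <persName><forename>George</forename> <surname>Washington</surname></persName>
  </person>
</listPerson>
<listPlace>
  <place xml:id="pla_washington_dc">
    <placeName>Washington</placeName>
  </place>
</listPlace>

<!-- Same string, two referents: tag each mention by what it means here -->
<p><persName ref="#per_washington_george">Washington</persName> never lived in
   <placeName ref="#pla_washington_dc">Washington</placeName>.</p>
```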

How do I check my entity markup is consistent?

Run a Schematron rule asserting that every @ref starting with # resolves to a declared xml:id. Then periodically extract all distinct persName and placeName values to spot variants you missed. Selecting //tei:persName[@ref], grouping the results by @ref, and listing the name forms under each id (grouping needs XQuery or XSLT 2.0; XPath 1.0 alone cannot do it) quickly reveals whether one person is fragmented across several ids, the most common data-quality flaw in entity encoding.
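A minimal Schematron sketch of both checks, assuming an XSLT 2.0 query binding and documents in the TEI namespace. The first pattern is an error for any #id in @ref that has no matching xml:id. The second only warns when the same persName text points at more than one id, because two different people can legitimately share a form like "Jane". Pattern ids are arbitrary:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

  <!-- Error: every local pointer (#id) in @ref must match a declared xml:id -->
  <pattern id="local-refs-resolve">
    <rule context="tei:*[@ref]">
      <let name="ids" value="//@xml:id"/>
      <!-- @ref may hold several whitespace-separated pointers; keep the local ones -->
      <let name="localRefs"
           value="tokenize(normalize-space(@ref), ' ')[starts-with(., '#')]"/>
      <let name="dangling" value="$localRefs[not(substring(., 2) = $ids)]"/>
      <assert test="empty($dangling)">
        Unresolved pointer(s) in @ref: <value-of select="string-join($dangling, ' ')"/>
      </assert>
    </rule>
  </pattern>

  <!-- Warning: one name form pointing at several ids may mean a split entity -->
  <pattern id="possible-fragmentation">
    <rule context="tei:persName[@ref]">
      <let name="form" value="normalize-space(.)"/>
      <let name="thisRef" value="normalize-space(@ref)"/>
      <report role="warning"
              test="exists(//tei:persName[@ref][normalize-space(.) = $form]
                                               [normalize-space(@ref) != $thisRef])">
        "<value-of select="$form"/>" also points to an id other than <value-of select="$thisRef"/>.
      </report>
    </rule>
  </pattern>
</schema>
```

The two checks sit in separate patterns because Schematron fires only the first matching rule per pattern, and a persName with @ref matches both contexts. The warning fires once per affected mention, so on a large edition it may be better run occasionally as a report than on every save.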

Key Takeaways

  • Use persName, placeName, and orgName; fall back to generic name only when unsure.
  • Declare each entity once in listPerson/listPlace with an xml:id.
  • Link every mention with @ref so spelling variants collapse onto one record.
  • Point @ref at VIAF, GeoNames, the Getty TGN, or Wikidata for Linked Open Data.
  • Tag by what the token refers to in context, not by its spelling.
  • Tag every occurrence for analysis; first-mention-only breaks counts and networks.
  • Validate references with Schematron to catch dangling and fragmented entities.

Frequently Asked Questions

Which TEI elements mark up people and places?

Use persName for people, placeName for places, orgName for organisations, and the generic name element when you do not want to specify a type. Each can carry a ref attribute pointing to an authority record.

How do I link a name to an authority record?

Give each person an entry in a listPerson with an xml:id, then point every mention to it with ref="#id". This collapses all spelling variants of one person onto a single record you can resolve and count.

Should I tag every occurrence of a name or just the first?

Tag every occurrence if you want accurate analysis, indexing, or network extraction. Tagging only the first mention is acceptable for a light reading edition but breaks frequency counts and co-occurrence analysis.

How do I handle a name that is also a place, like a surname from a town?

Tag by function in context: if the token refers to the person, use persName; if it refers to the location, use placeName. The same string can be different entities in different sentences, and the markup should reflect the reference, not the spelling.

Can I point names to external authorities like VIAF or GeoNames?

Yes. Put the authority URI in the ref attribute, for example ref="http://viaf.org/viaf/12345" for a person or a GeoNames URI for a place. This makes your entities interoperable and linkable as Linked Open Data.