Beginner's Guide to Wikidata lexemes for historical words

A Wikidata lexeme is a structured entry for a single word in one language — its spelling, its grammatical forms and its meanings — and it is the natural place to model historical words because it can hold archaic spellings, obsolete senses and dated attestations side by side. Lexemes use the prefix L (for example L1347) instead of the Q you know from items, and they live in a parallel namespace built specifically for language data. This guide explains the three parts of a lexeme and walks through a small worked example.

What are the three parts of a lexeme?

Every lexeme has the same anatomy:

The lexeme itself: the lemma (headword), its language, and its lexical category (noun, verb, etc.). Example: the lemma ey ("egg"), language Middle English, category noun; the historical plural eyren is then recorded as one of its forms.
Forms — the inflected or variant written shapes, each with grammatical features. A noun might have singular and plural forms; a historical word might have several attested spellings.
Senses — the meanings, each able to link out to the Q-item it denotes via P5137 (item for this sense).

So a lexeme separates the word from the thing the word names — exactly the distinction historical lexicography needs.
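The anatomy above can be sketched as a small data structure. This is only an illustration of the three parts, not the actual Wikibase data model: the class and field names (`Lexeme`, `Form`, `Sense`, `item_for_sense`) are made up for this example, and the spellings are sample data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Form:
    representation: str                      # the written shape of the word
    grammatical_features: List[str] = field(default_factory=list)


@dataclass
class Sense:
    gloss: str                               # short human-readable meaning
    item_for_sense: Optional[str] = None     # QID linked via P5137, if any


@dataclass
class Lexeme:
    lemma: str
    language: str
    lexical_category: str
    forms: List[Form] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)

    def spellings(self) -> List[str]:
        """Return every recorded written form, in insertion order."""
        return [f.representation for f in self.forms]


colour = Lexeme(
    lemma="colour",
    language="English",
    lexical_category="noun",
    forms=[Form("colour"), Form("colur"), Form("coloure")],
    # The sense is left unlinked here rather than guessing a QID.
    senses=[Sense("visual property of light")],
)
print(colour.spellings())  # → ['colour', 'colur', 'coloure']
```

Keeping the sense's link optional mirrors the point made above: the word exists as data in its own right, whether or not a matching Q-item has been attached yet.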

Why are lexemes a good fit for historical words?

Historical vocabulary is messy: spelling was unstable, meanings shifted, words died. Lexemes absorb that messiness cleanly:

Multiple forms capture spelling variation (e.g. colour / colur / coloure).
Qualifiers on statements can record the period or source of a usage.
A sense can point to a modern concept even when the word is obsolete.
Everything is queryable with SPARQL, so you can study variation at scale.

How do I create a lexeme? (a small worked example)

Say we want a lexeme for the archaic word thou. In the lexeme creation form you supply:

```text
Lemma:            thou
Language:         English (Q1860)   ← or a historical variety if modelled
Lexical category: pronoun (Q36224)
```

Then add forms — for example the objective form thee and the possessive thy — each tagged with grammatical features. Finally add a sense and link it where it denotes a concept. The created entity gets an L-number you can cite and query.
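If you would rather prepare the same entry programmatically, the sketch below builds the JSON that could be passed as the `data` parameter of the `wbeditentity` API action with `new=lexeme`. The payload shape reflects one reading of that format and should be checked against the API documentation before use. The helper name `build_lexeme_payload` is hypothetical, and the grammatical features are deliberately left empty because their QIDs are not given in this guide. Nothing is sent over the network.

```python
import json


def build_lexeme_payload(lemma, lang_code, language_qid, category_qid, forms):
    """Build a lexeme-creation payload as a plain dict.

    `forms` is a list of (representation, [grammatical feature QIDs]) pairs.
    """
    return {
        "lemmas": {lang_code: {"language": lang_code, "value": lemma}},
        "language": language_qid,
        "lexicalCategory": category_qid,
        "forms": [
            {
                "add": "",  # assumed marker for a form to be created
                "representations": {
                    lang_code: {"language": lang_code, "value": text}
                },
                "grammaticalFeatures": list(features),
                "claims": [],
            }
            for text, features in forms
        ],
    }


payload = build_lexeme_payload(
    lemma="thou",
    lang_code="en",
    language_qid="Q1860",    # English, as in the creation form above
    category_qid="Q36224",   # pronoun, as in the creation form above
    # Feature QIDs are left empty; look them up before submitting.
    forms=[("thou", []), ("thee", []), ("thy", [])],
)
print(json.dumps(payload, indent=2, ensure_ascii=False))
```

Building the payload separately from sending it makes it easy to review a batch of historical entries, and their sources, before anything is written to Wikidata.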

How do I query historical lexemes with SPARQL?

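Lexemes are first-class in the Wikidata Query Service. This query finds noun lexemes in a given language with more than one recorded form. Treat the count as a rough proxy for spelling variation, since ordinary inflected forms such as singular and plural are counted too: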

```sparql
SELECT ?lexeme ?lemma (COUNT(?form) AS ?forms) WHERE {
  ?lexeme dct:language wd:Q1860 ;              # English
          wikibase:lexicalCategory wd:Q1084 ;  # noun
          wikibase:lemma ?lemma ;
          ontolex:lexicalForm ?form .
}
GROUP BY ?lexeme ?lemma
HAVING (COUNT(?form) > 1)   # repeat the aggregate; portable across SPARQL engines
ORDER BY DESC(?forms)
LIMIT 20
```

Swap the language and category QIDs to retarget the query at your corpus.
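To make that swap repeatable, the query can be generated from parameters. The following is a minimal sketch: `build_forms_query` and `run_query` are names invented for this example, the User-Agent string is a placeholder you should replace with your own contact details, and `run_query` assumes the public endpoint at `https://query.wikidata.org/sparql` returns standard SPARQL JSON results.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_forms_query(language_qid="Q1860", category_qid="Q1084",
                      min_forms=2, limit=20):
    """Return the query text for lexemes with at least `min_forms` forms."""
    return f"""
SELECT ?lexeme ?lemma (COUNT(?form) AS ?forms) WHERE {{
  ?lexeme dct:language wd:{language_qid} ;
          wikibase:lexicalCategory wd:{category_qid} ;
          wikibase:lemma ?lemma ;
          ontolex:lexicalForm ?form .
}}
GROUP BY ?lexeme ?lemma
HAVING (COUNT(?form) >= {int(min_forms)})
ORDER BY DESC(?forms)
LIMIT {int(limit)}
""".strip()


def run_query(query, user_agent="lexeme-guide-example/0.1 (placeholder)"):
    """Send the query to the endpoint and return the result bindings."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,
    })
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


# Retarget the query at verbs (Q24905) without touching the query text.
print(build_forms_query(category_qid="Q24905", limit=5))
```

Calling `run_query(build_forms_query())` would then return one dictionary per result row, with the lemma under `row["lemma"]["value"]`.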

How do lexemes compare to plain items for words?

| Aspect | Wikidata item (Q) | Wikidata lexeme (L) |
| --- | --- | --- |
| Represents | a concept or thing | a word in a language |
| Spelling variants | awkward (extra statements) | native (multiple forms) |
| Grammar | not modelled | forms carry features |
| Links to meaning | is the meaning | sense → item via P5137 |
| Best for | "the egg" the object | "egg / eyren" the words |
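The sense → item row is where the two entity types meet, and it can be queried in the reverse direction: start from a Q-item and ask which words name it. The sketch below only builds the query text. `build_sense_query` is a hypothetical helper, and the example uses Q146 (house cat) purely as an illustration; how many lexemes come back depends on how many senses have been linked.

```python
def build_sense_query(item_qid, limit=50):
    """Return a query for lexemes with a sense linked to `item_qid` via P5137."""
    return f"""
SELECT ?lexeme ?lemma ?languageLabel WHERE {{
  ?lexeme ontolex:sense ?sense ;
          wikibase:lemma ?lemma ;
          dct:language ?language .
  ?sense wdt:P5137 wd:{item_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


# Which words, in any language, have a sense pointing at the house cat?
print(build_sense_query("Q146"))
```

Paste the printed query into the Wikidata Query Service to run it.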

What are good beginner habits?

Always set the right language and lexical category first; they are hard to change later.
Record sources for historical claims — attestation matters more here than anywhere.
Prefer separate forms for genuinely different spellings rather than cramming them into one lemma.
Link senses to Q-items so your lexical data joins the wider knowledge graph.

Key Takeaways

A lexeme (L-prefix) models a word in one language: lemma, forms, senses.
Lexemes are purpose-built for historical words — variant spellings and obsolete senses fit naturally.
The sense → item link (P5137) connects words to the concepts they name.
Lexemes are fully queryable in SPARQL via wikibase:lemma and ontolex:lexicalForm.
Set language and lexical category correctly up front; they are awkward to change.
Source historical claims; attestation is the whole point.
For shareable, linkable lexical data lexemes excel; a full critical dictionary may still need more.

Frequently Asked Questions

What is a Wikidata lexeme?

A lexeme is a Wikidata entity (prefixed L instead of Q) that represents a word or phrase in a specific language, together with its grammatical forms and senses. It models language itself, separate from the items that represent things in the world.

How is a lexeme different from a normal Wikidata item?

A Q-item represents a concept or thing (a person, a place, the idea of a cat). A lexeme represents a word in a language. The two link: each sense of a lexeme can point to the Q-item it denotes, while the lexeme itself records the word's spelling, grammar and meanings.

Can lexemes record obsolete or archaic spellings?

Yes. Each lexeme can hold multiple forms, and you can record historical or variant spellings as separate forms or with qualifiers. This makes lexemes well suited to capturing how a word was actually written across periods.

Do I need permission or special rights to create lexemes?

No special rights are needed beyond a normal Wikidata account, though lexeme editing has its own interface. As with all Wikidata, statements should be sourced, especially for historical claims about usage or attestation.

What can I do with historical-word lexemes once created?

You can query them with SPARQL to study spelling variation, link senses to concepts for semantic search, support normalisation of historical text, and connect dictionaries or glossaries to a shared, linkable language graph.
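As one illustration of the normalisation use case, the sketch below maps variant spellings back to a lemma. It assumes the forms have already been fetched; the `sample` dictionary is hand-written stand-in data, and the function names are invented for this example. Real historical text would need more care, for instance where two lexemes share a spelling.

```python
def build_variant_index(lexemes):
    """Map each lower-cased spelling to its lemma.

    `lexemes` maps a lemma to the list of its attested form spellings.
    """
    index = {}
    for lemma, spellings in lexemes.items():
        index[lemma.lower()] = lemma
        for spelling in spellings:
            # setdefault keeps the first lemma if two lexemes share a spelling
            index.setdefault(spelling.lower(), lemma)
    return index


def normalise(tokens, index):
    """Replace known variant spellings with their lemma; leave others alone."""
    return [index.get(token.lower(), token) for token in tokens]


# Hand-written sample data standing in for forms fetched from Wikidata.
sample = {"colour": ["colour", "colur", "coloure"]}
index = build_variant_index(sample)
print(normalise(["a", "bright", "coloure"], index))
# → ['a', 'bright', 'colour']
```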

Are lexemes the right tool for a full historical dictionary?

They can underpin one, but for a critical scholarly dictionary you may still need a dedicated editorial platform. Lexemes shine for shareable, linkable, machine-queryable lexical data rather than long discursive entries.
