Cultural Analytics

To analyse colour in art collections, first confirm the images share a consistent colour standard, convert each from RGB to perceptually uniform CIELAB, then extract dominant palettes with k-means and aggregate across the collection. The hard part is not the clustering — it is making sure you are measuring the artists' choices and not your scanners' inconsistencies. Crop away frames and mounts before you measure, or they will dominate every result.

Colour analysis quantifies the palettes of paintings, prints and photographs so you can compare them at scale: tracking how an artist's hues shifted, grouping works by colour mood, or charting a movement's palette over decades. The pipeline is short, but its validity rests entirely on the colour fidelity of your digitisation.

Why does colour space matter before anything else?

RGB is how pixels are stored, not how colour is perceived. Two colours equally far apart in RGB can look very different or nearly identical to the eye. For any measurement — distance, clustering, similarity — convert to CIELAB, which is designed so that equal numeric distances roughly match equal perceived differences.

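python
import numpy as np
from skimage import color, io

# "painting.jpg" is a hypothetical path; rgb2lab assumes sRGB input
image_array = io.imread("painting.jpg")   # uint8 array, shape (height, width, 3 or 4)
rgb = image_array[..., :3] / 255.0        # scale to [0, 1], dropping any alpha channel
lab = color.rgb2lab(rgb)   # L*: lightness, a*: green-red, b*: blue-yellow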

CIELAB also separates lightness (L*) from the two chromatic axes (a*, b*), which lets you ask hue questions without brightness confounding them.
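To make the mismatch between RGB and perceived difference concrete, the sketch below takes two pairs of colours that are equally far apart in RGB and compares their CIELAB distance (the CIE76 colour difference, which is plain Euclidean distance in L*a*b*). The sRGB-to-CIELAB conversion is written out by hand so the example needs only NumPy; the helper names `srgb_to_lab`, `delta_e76` and `rgb_distance` and the chosen test colours are illustrative assumptions, and for sRGB input `color.rgb2lab` should give closely matching values.

```python
import numpy as np

# Linear sRGB -> XYZ matrix and the D65 reference white.
M = np.array([[0.4124564, 0.3575761, 0.1804375],
              [0.2126729, 0.7151522, 0.0721750],
              [0.0193339, 0.1191920, 0.9503041]])
WHITE = np.array([0.95047, 1.0, 1.08883])

def srgb_to_lab(rgb8):
    """Convert 8-bit sRGB values of shape (..., 3) to CIELAB (illustrative helper)."""
    c = np.asarray(rgb8, dtype=float) / 255.0
    # Undo the sRGB transfer curve to get linear light.
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    t = (lin @ M.T) / WHITE
    d = 6 / 29
    f = np.where(t > d ** 3, np.cbrt(t), t / (3 * d ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def rgb_distance(rgb_a, rgb_b):
    """Euclidean distance between two 8-bit RGB triples."""
    return float(np.linalg.norm(np.subtract(rgb_a, rgb_b)))

def delta_e76(rgb_a, rgb_b):
    """CIE76 colour difference: Euclidean distance in CIELAB."""
    return float(np.linalg.norm(srgb_to_lab(rgb_a) - srgb_to_lab(rgb_b)))

dark_pair = ([0, 0, 0], [100, 0, 0])        # black vs dark red
green_pair = ([0, 255, 0], [100, 255, 0])   # green vs slightly yellower green

print("RGB distance:", rgb_distance(*dark_pair), rgb_distance(*green_pair))
print("CIE76 delta E:", delta_e76(*dark_pair), delta_e76(*green_pair))
```

Both pairs are exactly 100 units apart in RGB, yet the dark pair comes out several times further apart in CIELAB than the green pair, which is the kind of distortion that makes RGB distances unsafe for clustering or similarity.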

How do I extract a painting's dominant palette?

Cluster the pixels. Resize first (analysis does not need full resolution), then run k-means:

python
import numpy as np
from sklearn.cluster import KMeans

pixels = lab.reshape(-1, 3)                        # one row per pixel: (L*, a*, b*)
km = KMeans(n_clusters=5, n_init=10, random_state=42).fit(pixels)
centres = km.cluster_centers_                      # dominant colours in LAB
proportions = np.bincount(km.labels_, minlength=5) / len(pixels)   # share of pixels per cluster

The five cluster centres are the dominant colours; their proportions tell you how much of the canvas each occupies. Set a fixed random_state so the palette is reproducible.
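The clustering step above runs on every pixel it is given, so reducing the pixel count first matters for speed. As a rough sketch, and as a swapped-in alternative to resizing rather than the resizing the text describes, the helper below draws a reproducible random sample of pixels; a second helper sorts a palette from largest to smallest share. The names `sample_pixels` and `order_palette`, the 20,000-pixel cap and the random stand-in array are assumptions for illustration.

```python
import numpy as np

def sample_pixels(lab, max_pixels=20_000, seed=42):
    """Return at most max_pixels rows of (L*, a*, b*), drawn without replacement."""
    pixels = np.asarray(lab, dtype=float).reshape(-1, 3)
    if len(pixels) <= max_pixels:
        return pixels
    rng = np.random.default_rng(seed)          # fixed seed keeps the sample reproducible
    idx = rng.choice(len(pixels), size=max_pixels, replace=False)
    return pixels[idx]

def order_palette(centres, proportions):
    """Sort cluster centres from the largest to the smallest share of the image."""
    order = np.argsort(proportions)[::-1]
    return np.asarray(centres)[order], np.asarray(proportions)[order]

# Random stand-in for a converted image; a real array would come from color.rgb2lab.
lab_demo = np.random.default_rng(0).uniform(0, 100, size=(300, 400, 3))
pixels_demo = sample_pixels(lab_demo)
print(pixels_demo.shape)
```

The sampled array can be passed to `KMeans(...).fit(...)` in place of the full pixel list, and `order_palette(km.cluster_centers_, proportions)` then lists the dominant colour first. Random sampling keeps the pixel proportions unbiased on average, whereas resizing also blends neighbouring pixels, so the two approaches can give slightly different palettes.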

Why is digitisation consistency the real bottleneck?

This is where collection-scale colour analysis succeeds or fails. The same red pigment can read crimson on one scanner and brick on another depending on lighting, sensor and calibration. Before comparing colour across images, verify the capture standard:

| Capture condition | Cross-image comparison |
|---|---|
| ICC profile + colour target per image | Reliable |
| Consistent rig, no profile | Usable with caution |
| Mixed sources, no calibration | Measures scanners, not art |
| Web-harvested JPEGs | Not comparable |

If the collection lacks embedded ICC profiles or capture targets, restrict yourself to within-image statements (this painting's palette) and avoid between-image claims (this artist got warmer over time).
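One quick first check is whether each file carries an embedded ICC profile at all. The sketch below uses Pillow, which exposes an embedded profile as the `icc_profile` entry of `Image.info` for formats such as JPEG, PNG and TIFF; the function name `has_icc_profile` is an assumption for illustration. A profile being present does not prove the capture was calibrated against a colour target, so treat a positive result as a necessary condition only.

```python
from io import BytesIO

from PIL import Image

def has_icc_profile(source):
    """Return True if the image file carries an embedded ICC profile.

    `source` is a file path or a binary file object, as accepted by Image.open.
    """
    with Image.open(source) as img:
        return bool(img.info.get("icc_profile"))

# Demo on an in-memory PNG that was saved without any profile.
buf = BytesIO()
Image.new("RGB", (8, 8), (200, 30, 30)).save(buf, format="PNG")
buf.seek(0)
no_profile_result = has_icc_profile(buf)
print(no_profile_result)
```

Running this over a whole collection and counting the files that return False gives a first estimate of how much of it can support between-image claims.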

How do I prepare images so I measure the art?

Frames, mounts and backgrounds will hijack your statistics — a white mount alone can become the "dominant colour".

  • Crop to the artwork, removing frame and mount.
  • Mask flat backgrounds if the work is an object photographed on a sheet.
  • Exclude near-pure white and black pixels if they are clearly support, not paint.

python
# Drop near-white mount pixels before clustering (L* near 100)
pixels = lab.reshape(-1, 3)
mask = pixels[:, 0] < 95        # keep only pixels darker than the mount
pixels = pixels[mask]
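A lightness threshold on its own also discards pale paint, such as a light yellow highlight, and it leaves near-black frame pixels in place. A slightly more selective sketch, assuming that support material is both extreme in lightness and nearly neutral, adds a chroma test. The function name `drop_support_pixels` and all three thresholds are illustrative assumptions that would need tuning per collection.

```python
import numpy as np

def drop_support_pixels(pixels, l_white=95.0, l_black=5.0, max_chroma=5.0):
    """Remove near-neutral pixels that are almost pure white or pure black.

    pixels: array of shape (N, 3) holding L*, a*, b*.
    The thresholds are illustrative assumptions, not calibrated values.
    """
    pixels = np.asarray(pixels, dtype=float)
    lightness = pixels[:, 0]
    chroma = np.hypot(pixels[:, 1], pixels[:, 2])   # distance from the grey axis
    extreme = (lightness > l_white) | (lightness < l_black)
    support = extreme & (chroma < max_chroma)
    return pixels[~support]

demo = np.array([
    [98.0,  0.5,  1.0],   # white mount: dropped
    [ 2.0,  0.0,  0.0],   # black frame shadow: dropped
    [96.0, -5.0, 40.0],   # pale yellow paint: kept, chroma is high
    [50.0, 60.0, 30.0],   # mid-tone red: kept
])
kept = drop_support_pixels(demo)
print(len(kept))
```

Works that genuinely use bare white ground or pure black as part of the composition will lose those areas under this rule, so it is worth checking a few masked images by eye before applying it to a whole collection.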

What can I actually conclude, and how do I aggregate?

Aggregate per-work palettes into collection-level evidence: mean hue per decade (hue is an angle, so average it circularly rather than arithmetically), share of warm versus cool pixels, or clustering whole works by their palette vectors to find colour "schools". Plot the trend, then return to the paintings driving it. Colour statistics are descriptive: they show what the palette did, and it remains the historian's job to explain why, drawing on materials, patrons and pigment availability.
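As one possible way to compute a mean hue per decade, the sketch below takes the hue angle of the area-weighted mean (a*, b*) vector, which is equivalent to a circular mean weighted by both area share and chroma, so near-grey colours contribute little. The function name `circular_mean_hue` and the three `works` records are hypothetical values invented for illustration, not data from any collection.

```python
from collections import defaultdict

import numpy as np

def circular_mean_hue(centres, weights):
    """Hue angle in degrees [0, 360) of the weighted mean (a*, b*) vector."""
    centres = np.asarray(centres, dtype=float)
    weights = np.asarray(weights, dtype=float)
    a_mean = np.sum(weights * centres[:, 1])
    b_mean = np.sum(weights * centres[:, 2])
    return float(np.degrees(np.arctan2(b_mean, a_mean)) % 360)

# Hypothetical records: (year, palette centres in LAB, proportions per work).
works = [
    (1871, [[60, 40, 30], [40, 10, -20]], [0.7, 0.3]),
    (1878, [[55, 30, 45], [70, 5, 60]], [0.5, 0.5]),
    (1884, [[50, -10, -35], [65, -20, -10]], [0.6, 0.4]),
]

by_decade = defaultdict(list)
for year, centres, props in works:
    by_decade[year // 10 * 10].append((centres, props))

decade_hue = {}
for decade, items in sorted(by_decade.items()):
    # Proportions sum to 1 per work, so each work counts equally in its decade.
    centres = np.vstack([c for c, _ in items])
    weights = np.concatenate([p for _, p in items])
    decade_hue[decade] = circular_mean_hue(centres, weights)
    print(decade, round(decade_hue[decade], 1))
```

The circular form matters near the wrap-around point: two equally weighted colours at hue 350° and 10° average to roughly 0° (red) here, whereas an arithmetic mean of the angles would report 180° (green).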

Key Takeaways

  • Convert RGB to perceptually uniform CIELAB before any colour measurement.
  • Extract dominant palettes with k-means on the pixels; cluster sizes give proportions.
  • Colour fidelity of digitisation, not the algorithm, determines whether cross-image comparison is valid.
  • Only compare colour across works captured to a consistent, profiled standard.
  • Crop frames, mounts and backgrounds first or they dominate the statistics.
  • Colour analysis is descriptive evidence to interpret, not an explanation by itself.

Frequently Asked Questions

Which colour space should I use for analysis?

Convert from RGB to a perceptually uniform space like CIELAB before measuring distances or clustering. Euclidean distance in RGB does not match how humans perceive colour difference, whereas CIELAB approximates it far better.

How do I extract a painting's dominant colours?

Resize the image, reshape its pixels into a list, and run k-means clustering (e.g. k=5) on them. The cluster centres are the dominant colours and the cluster sizes give each colour's proportion of the canvas.

Why do digitised artworks have inconsistent colour?

Differences in scanners and lighting, together with a lack of colour calibration, mean the same pigment can read differently across a collection. Without an embedded ICC profile or a colour target in the capture, cross-image colour comparison is unreliable.

Can I compare colour across a whole collection meaningfully?

Only if the images were captured to a consistent colour standard, ideally with ICC profiles and colour targets. Otherwise you risk measuring scanner differences rather than artistic choices, so always check the provenance of the digitisation first.

Should I analyse the full image or just the artwork?

Crop out frames, mounts and background first, or they dominate the colour statistics. A white mount or dark frame will swamp the actual palette and produce misleading dominant colours.

What can colour analysis actually tell art historians?

It can surface palette trends over a period, group works by colour similarity, track an artist's shifting use of hue, or quantify warm-versus-cool balance. It is descriptive evidence to interpret, not an explanation in itself.