When to Apply PCA to Spectral Images

Apply PCA to a spectral image stack when a faint feature (erased text, a corroded pigment, a watermark) is spread across several bands and you want one image that concentrates it. PCA works well as a fast, deterministic first pass that separates structured signal from dominant page brightness. Skip it when a single band ratio already gives the contrast, when you need physical material identification (PCA components are unitless mixtures of bands), or when the target signal is statistically independent of the rest rather than a major source of variance, in which case ICA is the better tool. Deciding which situation you are in before you start saves hours of work with the wrong method.

What is PCA actually doing to my bands?

PCA treats each pixel as a point in B-dimensional space (one axis per band) and finds new, mutually orthogonal axes, the principal components, ordered so the first captures the most variance, the second the most of what remains, and so on. For manuscripts, overall page brightness is usually the biggest source of variance, so it lands in PC1, while spatially structured faint text contributes less variance and tends to surface in PC2 or PC3. You are not adding information; you are re-projecting it so the interesting part is easier to see.
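To make the re-projection concrete, here is a minimal NumPy sketch that runs PCA by hand on a synthetic stack. Everything in it is an assumption chosen for illustration: the 32x32 size, the five bands, the brightness gradient, and the weak "undertext" square with its made-up spectral signature. It is not taken from any real manuscript.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is repeatable
H, W, B = 32, 32, 5                      # hypothetical tiny stack: 32x32 pixels, 5 bands

# Strong brightness gradient shared equally by every band.
brightness = np.linspace(0.2, 1.0, W)[None, :, None] * np.ones((H, 1, B))

# Weak "undertext" square whose contribution differs from band to band.
text = np.zeros((H, W, 1))
text[12:20, 12:20, 0] = 1.0
signature = np.array([0.05, 0.03, 0.0, -0.03, -0.05])   # invented spectral signature

cube = brightness + text * signature + rng.normal(0, 0.002, (H, W, B))

# PCA by hand: centre the bands, then eigendecompose the band covariance.
X = cube.reshape(-1, B)
X = X - X.mean(axis=0)
cov = X.T @ X / (X.shape[0] - 1)         # B x B covariance between bands
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # reorder so PC1 has the largest variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = (X @ eigvecs).reshape(H, W, B)  # each pixel's coordinates on the new axes
explained = eigvals / eigvals.sum()      # share of total variance per component
print(np.round(explained, 4))
```

In this toy case PC1 weights all bands about equally (overall brightness) and takes nearly all the variance, while the square shows up in `scores[..., 1]`, i.e. PC2. Real stacks are messier, so treat this as an intuition aid only.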

When is PCA the right call?

Reach for PCA when:

The feature you want appears weakly in several bands rather than strongly in one.
You need a quick, repeatable scouting pass over many folios.
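You want a deterministic result: for a given stack PCA returns the same components every time (up to an arbitrary sign flip), whereas ICA depends on random initialisation and gives its components no inherent order.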
Noise is moderate and you mainly need to separate signal from brightness.

Look elsewhere when one of those assumptions breaks.

When should I reach for something else instead?

Situation	Better tool	Reason
One band pair has all the contrast	Band ratio	Transparent, easy to justify, no rotation
Need to identify a pigment	Reflectance spectra	PCA components carry no physical meaning
Want independent layers separated	ICA	Maximises independence, not variance
Very noisy stack	MNF	Orders components by signal-to-noise
Few bands (3–4)	Direct inspection	Too little to rotate usefully
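For the first row of the table, a band ratio can be sketched in a few lines. The example below is illustrative only: the band names `uv` and `ir`, the reflectance values, and the helper `band_ratio` are assumptions, not part of any library. It shows why a ratio is easy to justify: an illumination gradient shared by both bands cancels out.

```python
import numpy as np

def band_ratio(band_a, band_b, eps=1e-6):
    """Pixel-wise ratio of two registered bands; eps guards against division by zero."""
    a = band_a.astype(np.float64)
    b = band_b.astype(np.float64)
    return a / (b + eps)

# Synthetic example: uneven lighting multiplies both bands by the same field.
illum = np.linspace(0.5, 1.0, 16)[None, :] * np.ones((16, 1))
ink = np.zeros((16, 16), dtype=bool)
ink[4:8, 4:12] = True                       # hypothetical faded stroke

uv = illum * np.where(ink, 0.4, 0.8)        # ink absorbs in this band...
ir = illum * 0.8                            # ...but is invisible in this one

ratio = band_ratio(uv, ir)                  # lighting cancels, the ink remains
```

Here the ratio is about 0.5 on the stroke and about 1.0 everywhere else, whatever the lighting does. When one band pair behaves like this, there is little reason to rotate the whole stack.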

How do I run PCA correctly on a manuscript stack?

Standardise the bands first so a longer-exposed band does not dominate purely on scale, then keep and inspect the leading components:

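```python
import glob

import numpy as np
from skimage import io, exposure
from sklearn.decomposition import PCA

# Assumption: one single-channel image per band, all registered and the same size.
band_files = sorted(glob.glob("bands/*.tif"))     # hypothetical location; adjust to your data

# Stack the bands into an (H, W, B) cube, then flatten to one row per pixel.
cube = np.stack([io.imread(f).astype(np.float32) for f in band_files], -1)
H, W, B = cube.shape
X = cube.reshape(-1, B)
X = (X - X.mean(0)) / (X.std(0) + 1e-6)          # correlation PCA

# Project every pixel onto all B components and restore the image shape.
scores = PCA(n_components=B).fit_transform(X).reshape(H, W, B)
for i in range(min(B, 6)):                        # eyeball the first six
    comp = exposure.rescale_intensity(scores[..., i], out_range=(0, 1))
    io.imsave(f"pc{i+1}.png", (comp * 255).astype("uint8"))
```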

Standardising turns this into correlation-based PCA, which is the right default when bands have very different brightness.
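The effect of standardising can be checked on synthetic data. In the sketch below, which is an illustration under stated assumptions rather than a measurement, three bands carry the same underlying signal with the same relative noise, but the third is recorded on a scale 100 times larger, standing in for a much longer exposure. The helper `first_loading` is hypothetical.

```python
import numpy as np

def first_loading(X):
    """Unit-length PC1 loading vector for a pixels-by-bands matrix."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(1)
n = 5000
base = rng.normal(size=n)                        # signal shared by all bands
scale = np.array([1.0, 1.0, 100.0])              # assumed: band 3 exposed far longer
X = (base[:, None] + 0.5 * rng.normal(size=(n, 3))) * scale

cov_loading = first_loading(X)                   # covariance PCA (centred only)
Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-6)
corr_loading = first_loading(Z)                  # correlation PCA (standardised)

print(np.round(np.abs(cov_loading), 3))
print(np.round(np.abs(corr_loading), 3))
```

Without standardising, PC1 is almost entirely the large-scale band. After standardising, the three bands receive nearly equal weight, which matches the fact that they carry the same information. The sign of a loading vector is arbitrary, hence the absolute values.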

What are the trade-offs and hidden costs?

PCA is cheap to run but easy to over-trust. Its components are abstract linear combinations with no physical units, so you cannot read a pigment identity off them. Aggressive stretching of a noisy late component can dress up noise as text, so every reveal must be checked against the raw bands. And because the rotation is data-driven, two folios produce two different component meanings — PC2 here is not PC2 there — which complicates batch interpretation.
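The stretching risk is easy to demonstrate on pure noise. The following sketch assumes a simple percentile clip (the helper `percentile_stretch` and its thresholds are illustrative choices, not a standard) and counts how many pixels end up fully black or fully white.

```python
import numpy as np

def percentile_stretch(img, low=1.0, high=99.0):
    """Linearly map the [low, high] percentile range to [0, 1], clipping the rest."""
    lo, hi = np.percentile(img, [low, high])
    if hi <= lo:                                  # flat image: nothing to stretch
        return np.zeros_like(img, dtype=np.float64)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)

def saturated_fraction(img):
    """Share of pixels pushed to pure black or pure white."""
    return float(np.mean((img == 0.0) | (img == 1.0)))

rng = np.random.default_rng(3)
noise = rng.normal(size=(128, 128))               # a component containing no signal

modest = percentile_stretch(noise, 1, 99)
aggressive = percentile_stretch(noise, 40, 60)

print(saturated_fraction(modest), saturated_fraction(aggressive))
```

With the 40 to 60 percentile stretch, roughly four pixels in five become pure black or white, producing high-contrast blobs from data that holds nothing. That is the kind of image that invites over-reading.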

How do I avoid fooling myself with a PCA result?

Treat a striking PCA image as a hypothesis, not a finding. Trace each apparent stroke back to the raw bands and ask which bands carry it, using the component's loadings as a guide; a genuine feature should leave at least a faint trace in the most heavily weighted bands, so if none of them shows it, be suspicious. Keep contrast stretches modest, record the exact PCA parameters as paradata, and have a second reader confirm the text without seeing your enhanced version first. Reproducibility is what separates a recovered reading from a Rorschach blot.
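One way to support both the tracing and the record-keeping is to keep the loadings and preprocessing values alongside the scores. The sketch below is a possible shape for that, not an established schema: the function name `pca_with_paradata` and every key in the `paradata` dictionary are assumptions, and the random cube stands in for a real stack.

```python
import json
import numpy as np

def pca_with_paradata(cube, standardise=True, eps=1e-6):
    """Run PCA via SVD and return the scores plus a JSON-serialisable record."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(np.float64)
    mean, std = X.mean(axis=0), X.std(axis=0)
    X = X - mean
    if standardise:
        X = X / (std + eps)
    _, s, vt = np.linalg.svd(X, full_matrices=False)   # rows of vt are the loadings
    scores = (X @ vt.T).reshape(H, W, B)
    paradata = {                                        # hypothetical field names
        "method": "PCA via SVD",
        "standardised": standardise,
        "eps": eps,
        "band_mean": mean.tolist(),
        "band_std": std.tolist(),
        "loadings": vt.tolist(),
        "explained_variance_ratio": (s**2 / np.sum(s**2)).tolist(),
    }
    return scores, paradata

rng = np.random.default_rng(4)
cube = rng.random((8, 8, 4))                            # stand-in for a real stack
scores, paradata = pca_with_paradata(cube)

# Bands ranked by how heavily they weigh in PC2: check these raw bands first.
pc2_bands = np.argsort(-np.abs(np.array(paradata["loadings"][1])))
record = json.dumps(paradata, indent=2)                 # store next to the images
```

With the means, standard deviations, and loadings saved, anyone can recompute the component images from the raw bands and confirm that the enhanced view is what the record says it is.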

Key Takeaways

Use PCA when faint signal spans several bands and you want a fast, deterministic scout.
Standardise bands first (correlation PCA) so exposure differences don't dominate.
Faint text usually appears in PC2/PC3; PC1 is overall brightness — inspect all early components.
Prefer band ratios for transparency, ICA for independent layers, MNF for noisy stacks.
PCA components are unitless and cannot identify materials.
Validate every reveal against the raw bands and document parameters to stay defensible.

Frequently Asked Questions

What does PCA do to a spectral image stack?

PCA rotates the band axes so the first components capture the most variance in the data. Faint features that span several bands often concentrate into a single low-numbered component, separating them from dominant brightness and noise.

When should I NOT use PCA?

Avoid PCA when a simple band ratio already gives the contrast, when you need to identify materials (PCA components have no physical units), or when the signal you want is independent rather than maximally variant — then ICA fits better.

Why is the faint text usually in PC2 or PC3, not PC1?

PC1 captures overall page brightness, which dominates the variance. The faint, spatially structured undertext contributes less total variance, so it surfaces in the second or third component.

Should I standardise the bands before PCA?

Usually yes for manuscript work: standardising each band to zero mean and unit variance stops bright bands from dominating purely because they were exposed longer. This is correlation-based PCA.

Does PCA invent text that is not there?

PCA only recombines existing band data, but aggressive contrast stretching of a noisy component can make noise look text-like. Always validate a PCA reveal against the underlying raw bands.

How many components should I keep?

PCA yields at most as many components as there are bands, so compute them all, but expect the useful signal to lie almost always in the first four to six. Later components are typically noise, though occasionally a faint layer hides there, so give them at least a quick look.
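When there are many late components to skim, a crude structure score can help decide which deserve a closer look. The heuristic below is an assumption of this sketch, not part of PCA: it measures how strongly each pixel correlates with its right-hand neighbour, on the reasoning that pure noise has almost no spatial correlation while writing does. The helper `lag1_autocorr` and both test images are invented for illustration.

```python
import numpy as np

def lag1_autocorr(img):
    """Correlation between each pixel and its right-hand neighbour."""
    a = img[:, :-1].ravel()
    b = img[:, 1:].ravel()
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(2)

# A component that is only noise.
noise_img = rng.normal(size=(64, 64))

# A component with a faint block of "text" buried in noise.
text_img = np.zeros((64, 64))
text_img[20:40, 10:50] = 1.0
text_img += rng.normal(scale=0.3, size=(64, 64))

print(lag1_autocorr(noise_img), lag1_autocorr(text_img))
```

A score near zero suggests noise, and a clearly positive score suggests spatial structure worth viewing. It is a triage aid only; smooth artefacts such as residual lighting gradients also score high, so the raw-band check still applies.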
