Exploring Camera Color Space and Color Correction

Introduction

Do cameras see the same colors as we do? Can cameras always accurately reproduce colors that our eyes see? This interactive tutorial explores these questions and many more interesting aspects of camera raw color space. In particular, we will walk you through an important concept in both color science and camera signal processing: color correction, the process of correcting the color perception of a camera such that it is as close to ours as possible. In the end, you will get to appreciate why you should never trust the color produced by your camera and how you might build your own camera that, in theory, outperforms existing cameras in color reproduction.

Caveats. 1) This tutorial demonstrates the principle of color correction with many important, but subtle, engineering details omitted; we will mention them when appropriate. 2) Color correction is one of the two components in camera color reproduction, the other being white balance (or rather, the camera's emulation of the chromatic adaptation of the human visual system). We have a post that discusses the principles of chromatic adaptation and its application in white balance. The relationship between color correction and white balance is quite tricky, but Andrew Rowlands has a fascinating article that demystifies it for you.

Step 1: Exploring Camera Color Space

In principle, cameras work just like our eyes. Our retina has three types of cone cells (L, M, and S), each with a unique spectral sensitivity, translating light into three numbers (the L, M, S cone responses) that give us color perception. Similar to the three cone types, (most) cameras use three color filters (commonly referred to as the R, G, and B filters), each with a unique spectral sensitivity and, thus, also translate light into three numbers. In this sense, you can really think of a camera as a "weird" kind of human being with unconventional LMS cone fundamentals. Not all cameras use three filters though. Telescope imaging cameras use five filters, just like butterflies!

The left chart below shows the measured camera sensitivity functions of 48 cameras from two recent studies. The 48 cameras are classified into four categories: DSLR, point-and-shoot, industrial, and smartphone cameras. The sensitivities are normalized such that the most sensitive filter (usually the green filter) peaks at 1. Note that the sensitivity functions measured here are not just the spectral transmittances of the color filters; rather, they are measured by treating the camera as a black box, and thus reflect the combined effect of everything in the camera that has a spectral response to light, such as the anti-aliasing filter, IR filter, micro-lenses, the photosites, etc.

The default view shows the average sensitivities across the 48 cameras, but you can also select a particular camera from the drop-down list. As a comparison, we also plot the LMS cone fundamentals in the same chart as dashed lines. As is customary, the LMS cone fundamentals are each normalized to peak at 1. As is usually the case in color science, these normalizations merely introduce scaling factors that will be canceled out later if we care about just the chromaticity of a color (i.e., the relative ratios of the primaries).

You can see that the shapes of the camera sensitivity functions more or less resemble those of the cone fundamentals, which is perhaps not all that surprising. After all, cameras are built to reproduce colors that our eyes see. The shapes, however, are not an exact match. Most notably, you will see that the L and M cone responses overlap much more than the R and G filters do. As you can imagine, sensitivities that are overly close to each other won't provide a great ability to distinguish colors. In the extreme case where two filters' sensitivities exactly match, the camera is effectively dichromatic.

Although disabled by default, the chart also contains the CIE 1931 XYZ Color Matching Functions (CMFs). You can enable them by clicking on the legend to the left of the chart; in fact, clicking on a legend label toggles the visibility of the corresponding curve in the chart. The XYZ CMFs are just one linear transformation away from the LMS cone fundamentals and are used as the "common language" in colorimetry when comparing different color spaces, so we will use XYZ as the connection color space in the rest of the tutorial.
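To make the "one linear transformation away" point concrete, here is a minimal NumPy sketch that maps XYZ values to LMS with a single $3\times 3$ matrix. The Hunt-Pointer-Estevez matrix used here is just one commonly cited choice; the exact matrix depends on which cone fundamentals you adopt, and it is not necessarily the one behind the interactive chart.

```python
import numpy as np

# One commonly cited XYZ -> LMS matrix (Hunt-Pointer-Estevez, D65-normalized).
# Treat this as an illustrative choice, not the definitive transformation.
M_XYZ_TO_LMS = np.array([
    [ 0.4002,  0.7076, -0.0808],
    [-0.2263,  1.1653,  0.0457],
    [ 0.0000,  0.0000,  0.9182],
])

def xyz_to_lms(xyz):
    """Map XYZ tristimulus values (..., 3) to LMS with a single 3x3 matrix."""
    return np.asarray(xyz) @ M_XYZ_TO_LMS.T

# Because the mapping is linear, it works for any light, not just spectral lights.
print(xyz_to_lms([0.9504, 1.0000, 1.0888]))  # XYZ of D65 white -> LMS
```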

In the same way we can plot the spectral locus in the XYZ color space, we can also plot the spectral locus in a camera's native RGB color space. The 3D plot on the right shows you the spectral locus in the XYZ, LMS, and the camera's RGB color spaces. The LMS locus is disabled by default, but you can enable it by clicking its legend label. Drag your mouse to spin around the 3D plot. You can see that the spectral locus in XYZ and in the camera's RGB space do not match.
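If you'd like to compute that camera-space locus yourself, the key observation is that a monochromatic (spectral) light at wavelength $\lambda$ excites each filter by exactly its sensitivity at $\lambda$. A minimal sketch follows, with placeholder Gaussian curves standing in for a measured camera's sensitivities:

```python
import numpy as np

# Placeholder Gaussian sensitivities standing in for a measured camera,
# sampled at 400-720 nm and normalized to peak at 1.
wavelengths = np.arange(400, 721, 5)
r_sens = np.exp(-0.5 * ((wavelengths - 600) / 30.0) ** 2)
g_sens = np.exp(-0.5 * ((wavelengths - 540) / 35.0) ** 2)
b_sens = np.exp(-0.5 * ((wavelengths - 460) / 25.0) ** 2)

# A spectral light at wavelength lambda produces raw responses equal to the three
# sensitivity values at lambda, so the spectral locus in camera RGB space is simply
# the sensitivity curves stacked wavelength by wavelength.
camera_rgb_locus = np.stack([r_sens, g_sens, b_sens], axis=1)  # shape (65, 3)
print(camera_rgb_locus.shape)
```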

You can also draw your own camera sensitivity functions. Select "Custom" from the drop-down list, and start dragging your mouse. As you draw, the spectral locus in the camera RGB space will be dynamically updated on the 3D plot on the right. You can hit the Draw Reset button to clear the drawing.

Step 2: Problem Formulation

The fact that the spectral locus in the XYZ space and in the camera RGB space do not match means that each spectral light has different XYZ and RGB tristimulus values. That in itself is not a problem: the LMS and the XYZ tristimulus values of spectral lights don't match either. Critically, however, the LMS and XYZ color spaces are just one linear transformation away from each other. Any light, not just spectral light, can be converted between the XYZ and the LMS space using one single $3\times 3$ matrix multiplication. So it really doesn't matter in which space you express the color of a light; whether in LMS or XYZ (or any other colorimetric space), it's all the same underlying color.

Here comes the central question of this tutorial: is a camera's RGB color space also just a linear transformation away from the XYZ/LMS space? If so, then the raw camera RGB values of any light can be translated to the correct XYZ values of that light, essentially recovering the color of the light from camera captures. In other words, the camera sees the same colors as us, and metamers to our eyes are also metamers to the camera. In general, if the camera raw color space is precisely a linear transformation away from the XYZ color space, the camera is said to satisfy the Luther Condition, and the camera color space is said to be colorimetric (in that we can use the camera to measure color). What would happen if a camera doesn't satisfy the Luther Condition? Well, lights that are different to us (i.e., have different XYZ values) might be the same to the camera (i.e., have the same raw RGB values), and vice versa. Metamers to the camera would not be metamers to our eyes.

Stated mathematically, we want to find a $3\times 3$ transformation matrix $T$ that satisfies the following equation:

$ \begin{bmatrix} X_0 & X_1 & X_2 & \dots \\ Y_0 & Y_1 & Y_2 & \dots \\ Z_0 & Z_1 & Z_2 & \dots \end{bmatrix} = \begin{bmatrix} T_{00} & T_{01} & T_{02} \\ T_{10} & T_{11} & T_{12} \\ T_{20} & T_{21} & T_{22} \end{bmatrix} \times \begin{bmatrix} R_0 & R_1 & R_2 & \dots \\ G_0 & G_1 & G_2 & \dots \\ B_0 & B_1 & B_2 & \dots \end{bmatrix}, $

where $[X_i, Y_i, Z_i]^T$ are the XYZ values of a light and $[R_i, G_i, B_i]^T$ are the camera raw RGB values of the same light. The transformation matrix $T$ has 9 unknowns, but it should ideally work for any arbitrary light, i.e., satisfy infinitely many equations, so the system of equations is over-determined. Instead of looking for an exact solution, we settle for a best-fit $T$ that works well for a set of lights whose color reproduction we care about.

Step 3: The Correction Targets

Which lights' colors do we care about reproducing? We could certainly just use the spectral lights in the chart above, or even just pick some random lights. But spectral lights are not common in the real world; we ideally want to correct for colors that we normally see in everyday scenes. Here we provide two options, and you can switch between the two from the drop-down list below.

The first is human skin. We use the human skin reflectance dataset measured and published by Cooksey, Allen, and Tsai at the National Institute of Standards and Technology. The original dataset has 100 participants, and we sampled 24 participants to obtain a more or less even distribution of skin tones from light to dark. The spectral reflectances of the 24 participants are shown in the chart below. Each sampled participant is denoted by "PN", where N is the participant ID in the original dataset. P41 has the lightest skin tone and P43 has the darkest. The legend on the left shows the closest sRGB color of the corresponding skin under the D65 Standard Illuminant. One could argue that the measurement wasn't diverse enough; we definitely see darker skin tones in the real world.

Another option, which is perhaps more commonly used in practical camera color calibration, is what's called the ColorChecker color rendition chart, which contains 24 patches whose "spectral reflectances mimic those of natural objects such as human skin, foliage, and flowers." These are perfect target lights for us to calibrate the transformation matrix against. While the original manufacturer does not publish the spectral data, people have measured and published the spectral reflectances of these patches. The ones we plot above are from BabelColor, but you might see slightly different versions.

Step 4: Pick Illuminant

You can select an illuminant in the chart above, which shows the SPDs of a few CIE Standard Illuminants normalized to peak at unity. You can also draw your own illuminant by selecting "Custom" from the drop-down list.

How do we obtain the XYZ and RGB values of a ColorChecker patch? These patches do not emit lights themselves; they reflect lights. Therefore, we must pick an illuminant for color correction. This inherently suggests that the color correction matrix will be dependent on the illuminant.

Mathematically, given an illuminant $\Phi(\lambda)$, the XYZ and the camera raw RGB values of a patch $i$ (with a spectral reflectance of $S_i(\lambda)$) are calculated as:

$X_i = \sum_{\lambda=400}^{720}\Phi(\lambda)S_i(\lambda)X(\lambda)$

$Y_i = \sum_{\lambda=400}^{720}\Phi(\lambda)S_i(\lambda)Y(\lambda)$

$Z_i = \sum_{\lambda=400}^{720}\Phi(\lambda)S_i(\lambda)Z(\lambda)$

$R_i = \sum_{\lambda=400}^{720}\Phi(\lambda)S_i(\lambda)R(\lambda)$

$G_i = \sum_{\lambda=400}^{720}\Phi(\lambda)S_i(\lambda)G(\lambda)$

$B_i = \sum_{\lambda=400}^{720}\Phi(\lambda)S_i(\lambda)B(\lambda)$
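In code, these sums are just an element-wise product of the illuminant, reflectance, and sensitivity samples, followed by a sum over wavelength. A minimal NumPy sketch (the array names and shapes here are our own assumptions, not part of the datasets used by the charts):

```python
import numpy as np

def tristimulus(illuminant, reflectance, sensitivities):
    """Discrete version of the sums above.

    illuminant:    (N,) SPD of the chosen illuminant, sampled at 400-720 nm
    reflectance:   (N,) spectral reflectance of one patch
    sensitivities: (N, 3) XYZ CMFs or camera RGB sensitivities, one column per channel
    """
    stimulus = illuminant * reflectance   # light reflected off the patch
    return stimulus @ sensitivities       # (3,): [X, Y, Z] or [R, G, B]

# Hypothetical usage (spd, patch, xyz_cmfs, and cam_sens are assumed arrays):
# xyz_i = tristimulus(spd, patch, xyz_cmfs)
# rgb_i = tristimulus(spd, patch, cam_sens)
```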

In practical camera color calibration, instead of applying these equations to calculate the theoretical RGB values, what people usually do is actually take a photo of the ColorChecker patches and read off the raw RGB values. This is necessary because we mostly won't be able to get the exact SPD of the capturing illuminant, and sometimes we don't want to trust the spectral reflectance data. After all, the equations above simulate how RGB values are generated by cameras anyway.

But if we don't have the illuminant SPD, how do we get the XYZ values of the patches? Fortunately, people have calculated the XYZ values of the patches under different CIE Standard Illuminants. We would then have to estimate the illuminant of the capturing scene, and find the closest illuminant for which the XYZ values of the patches are available. Estimating the scene illuminant is a difficult topic on its own, and has implications in white balance as well, which we leave for another tutorial.

Step 5: Calculate Color Correction Matrix

With the illuminant selected, we now have everything we need to concretize the linear system:

$\begin{bmatrix} \boxed{??} \\ \boxed{??} \\ \boxed{??} \end{bmatrix}$ $\small{ = \begin{bmatrix} T_{00} & T_{01} & T_{02} \\ T_{10} & T_{11} & T_{12} \\ T_{20} & T_{21} & T_{22} \end{bmatrix} \times}$ $\begin{bmatrix} \boxed{??} \\ \boxed{??} \\ \boxed{??} \end{bmatrix}$

The first matrix has 24 columns, each representing the XYZ values of a ColorChecker patch under your chosen illuminant ${\boxed{??}}$. The last matrix also has 24 columns, each representing the RGB values of a patch in the camera RGB space under the same illuminant. We get an over-determined system with 72 equations and 9 unknowns.

Before solving the system, let's look at the patches in both the XYZ and the camera RGB spaces, which is shown in the left chart below. Evidently, the XYZ and the RGB values of the patches do not overlap, but if you spin around the chart you'll likely see that the relative distributions of the patches within the XYZ and the RGB space are roughly similar, suggesting that a linear transformation might exist.

Solving the Optimization Problem

How do we find the best-fit $T$? The common approach is to formulate this as a linear least squares problem, which essentially minimizes the sum of the squared differences between the reconstructed XYZ values and the "ground truth" XYZ values. Critically, squared differences represent Euclidean distances in the XYZ space, and Euclidean distance in the XYZ space is not proportional to the perceptual color difference. Therefore, it is usually better to convert the colors from the XYZ space to a perceptually uniform color space, such as CIELAB, before formulating the least squares, or to use other perceptually driven color-difference metrics. For simplicity, however, we will simply use the Euclidean distance in the XYZ space here to show you the idea.

The optimal solution to the linear least squares problem in the form of $A=TB$ is: $T=AB^T (BB^T)^{-1}$, where $A$ is our XYZ matrix and $B$ is the RGB matrix. Now click to find $T$! The best-fit matrix is calculated as:

${\begin{bmatrix} \boxed{??} & \boxed{??} & \boxed{??} \\ \boxed{??} & \boxed{??} & \boxed{??} \\ \boxed{??} & \boxed{??} & \boxed{??} \end{bmatrix}}$ , for camera ${\boxed{??}}$ and illuminant ${\boxed{??}}$.
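If you want to reproduce the fit offline, here is a minimal NumPy sketch of that least squares solve (the function name and the $3\times 24$ array layout are our own assumptions):

```python
import numpy as np

def fit_correction_matrix(xyz, rgb):
    """Best-fit 3x3 T minimizing the squared Euclidean error ||xyz - T @ rgb||^2.

    xyz: (3, 24) ground-truth XYZ values of the patches
    rgb: (3, 24) camera raw RGB values of the same patches
    """
    # Equivalent to the closed form T = A B^T (B B^T)^{-1}; np.linalg.lstsq
    # solves the transposed system rgb^T @ T^T = xyz^T in a numerically robust way.
    T_transposed, *_ = np.linalg.lstsq(rgb.T, xyz.T, rcond=None)
    return T_transposed.T

# T = fit_correction_matrix(xyz, rgb)
# corrected_xyz = T @ rgb   # should sit close to, but not exactly on, xyz
```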

The transformed/corrected XYZ values of the patches will be shown in the chart above; look for the label "XYZ (Corrected)". You can see that the corrected XYZ values pretty much align with the "ground truth" XYZ values. But they don't exactly overlap, because the transformation matrix is just the best fit, not a perfect solution. To see the color correction error, the heatmap on the right shows the difference between the true color and the corrected color for the 24 patches using the CIE $\Delta E^*$ metric (in reality, one might want to use the $\Delta E^*$ metric as the loss function for optimization).
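A $\Delta E^*$ error of this kind can be computed with the simple CIE 1976 $\Delta E^*_{ab}$, i.e., the Euclidean distance in CIELAB; the sketch below is one such version (not necessarily the exact variant used by the heatmap), and the D65 white point default is an assumption that should be replaced with the XYZ of your chosen illuminant:

```python
import numpy as np

def xyz_to_lab(xyz, white=(0.9504, 1.0000, 1.0888)):
    """CIE XYZ -> CIELAB; the default white point is D65 (swap in your illuminant's XYZ)."""
    t = np.asarray(xyz, dtype=float) / np.asarray(white)
    f = np.where(t > (6 / 29) ** 3, np.cbrt(t), t / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def delta_e_76(xyz_true, xyz_corrected, white=(0.9504, 1.0000, 1.0888)):
    """CIE 1976 Delta E*: Euclidean distance between the two colors in CIELAB."""
    return np.linalg.norm(
        xyz_to_lab(xyz_true, white) - xyz_to_lab(xyz_corrected, white), axis=-1
    )

# errors = delta_e_76(xyz.T, (T @ rgb).T)   # one Delta E per patch, as in the heatmap
```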

The color correction process described so far is pretty much what's done in real cameras, which just have a few additional customary normalizations. For instance, when the RGB values are directly read from the raw captures, they are normalized to the range of $[0, 1]$ using the maximum raw value of the sensor. Also, the transformation matrix $T$ is normalized such that the capturing illuminant (${\boxed{??}}$ here), when normalized to have a Y value of 1, saturates one of the RGB channels (usually the G channel is the first to saturate, since the green filter has the highest sensitivity, as is evident in the first chart). See Section 2.5 of Andrew Rowlands' article for details.

Customizing the Camera

If you change the camera and the illuminant choice, simply click the "Generate Equations" and "Calculate Matrix" buttons again (in that order). As you change the camera and illuminant, you will see that the correction matrix changes too, confirming that the correction matrix is inherently tied to a particular camera and calibrating illuminant. Here are a few cool things you are invited to try:

  • Draw camera sensitivity functions that mimic the LMS cone fundamentals or the XYZ CMFs. You will see that the color reproduction error will be relatively small.
  • Make each camera sensitivity function narrow and have little overlap across the spectrum (see the sketch after this list for one way to generate such synthetic curves).
  • Make each camera sensitivity function narrow and have significant overlap.
  • Make each camera sensitivity function wide and overlap across the spectrum.
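As referenced in the list above, here is one way to generate such synthetic sensitivity curves programmatically; this is a sketch of our own (the interactive drawing tool does not use this code), with the peak positions controlling overlap and the width controlling how narrow the filters are:

```python
import numpy as np

def gaussian_sensitivities(centers, width, wavelengths=np.arange(400, 721, 5)):
    """Synthetic R/G/B sensitivities modeled as Gaussians that peak at 1.

    centers: peak wavelengths in nm, e.g. (610, 540, 460); moving them closer
             together increases the overlap between the three curves.
    width:   standard deviation in nm; small values give narrow filters.
    """
    return np.stack(
        [np.exp(-0.5 * ((wavelengths - c) / width) ** 2) for c in centers], axis=1
    )  # shape (N, 3)

# Narrow and well separated vs. wide and heavily overlapping:
# sens_narrow = gaussian_sensitivities((610, 540, 460), width=10)
# sens_wide   = gaussian_sensitivities((580, 550, 520), width=60)
```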

Step 6: Interpreting Camera Color Space

With the transformation matrix $T$ calculated, we can use it to get more insights into a particular camera's color space. The TL;DR is that a typical camera's color space contains imaginary colors. While the observations we are about to make are generally true, let's take one specific camera just so we can be concrete. Click the button to choose the iPhone 12 Pro Max as the target camera and D65 as the illuminant. Everything on this page will be updated accordingly.

Differences Between Skin Tones

If we use the human skin tones as the correction target, we'll see that darker skins generally have worse color correction results compared to lighter skins, as is evident in the heatmap above, where the bottom rows represent darker skin tones and have higher $\Delta E^*$ errors. A seemingly neutral correction algorithm, like the one implemented in this tutorial, thus results in a racial bias that manifests as different color reproduction accuracies across skin tones. There is a great talk by Theodore Kim on how graphics and camera imaging research could, independent of any individual intent, contain racial biases.

Note that this is not to say that the color correction algorithm used in a particular camera will always produce a bias exactly as severe as the one discussed above. Our intention is to provide a concrete example where camera imaging could inadvertently introduce racial bias.

Are Spectral Lights Corrected?

The first question you can ask is: does $T$ correct the colors of other lights that are not used for color correction? In particular, we are interested in how the spectral lights are corrected. Go back to the spectral locus plot in Step 1 and click the "Draw Corrected Locus" button, which will plot the transformed spectral locus (the "XYZ (Corrected)" curve). Ideally, you want that locus to precisely match the actual XYZ spectral locus, but most likely you will see that the two are way off. This is perhaps not all that surprising, because the spectral lights are not used for calculating the best-fit matrix.

We can gain a better understanding of the correction results in the chromaticity plot. The chart below plots the "ground truth" spectral locus and the 24 patches in the xy-chromaticity plane, and then overlays the reconstructed locus and patches transformed from the camera RGB space using matrix $T$. Hovering over any point will also highlight its counterpart. Notice how the original 24 patches and the reconstructed ones almost overlap (because they are used to estimate $T$), but the true and corrected spectral loci are miles apart. Those reconstructed spectral colors, when converted to an output-referred color space, will give you incorrect colors.
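The xy-chromaticity projection used by this plot is straightforward to compute; a minimal sketch (the array names in the usage comments are assumptions):

```python
import numpy as np

def xy_chromaticity(xyz):
    """Project XYZ tristimulus values (..., 3) onto the xy-chromaticity plane."""
    xyz = np.asarray(xyz, dtype=float)
    return xyz[..., :2] / xyz.sum(axis=-1, keepdims=True)  # x = X/(X+Y+Z), y = Y/(X+Y+Z)

# Hypothetical usage, assuming xyz_locus and rgb_locus are (N, 3) arrays:
# xy_true      = xy_chromaticity(xyz_locus)
# xy_corrected = xy_chromaticity(rgb_locus @ T.T)   # camera locus mapped through T
```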

What's a Camera's Gamut Like?

Recall two things here. First, a gamut refers to all the colors that can be produced by a color space by mixing (non-negative amounts of) the primaries of that color space. Second, the area enclosed by the spectral locus represents all the colors that the human visual system can see.

If you click the "sRGB gamut" legend label in the plot above, you will see the gamut of the sRGB color space. It's a triangle inside the spectral locus. The three vertices of the triangle represent the three primaries of the sRGB color space. Clearly, the sRGB priamries are real colors and all the colors produced by the sRGB color space are real colors, too, but there are some real colors that can't produced by sRGB. An interesting observation is that all but one ColorChecker patches lie inside the sRGB gamut. That is intentional. sRGB is the most commonly used color space, and you would want to be able to reproduce its gamut well. By choosing colors within the sRGB gamut as the correction target, it's likely that other colors in the gamut can be reasonably reconstructed as well.

Now click the "Camera RGB gamut" legend label in the plot above. You will see the gamut of ${\boxed{??}}$'s color space, which is still a triangle, because we are still mixing the RGB primaries through a linear combination. This triangle is obtained simply by multiplying $T$ with $[[0, 0, 1], [0, 1, 0], [1, 0, 0]]$ (assuming the raw values are normalized to the range of $[0, 1]$) and converting the results to chromaticities. What this means is that if you manually set the RAW pixel values (e.g., through a raw image editor) and go through the color correction pipeline you will end up at a point within that triangle.

You might be wondering: where do real colors reside, then? Are they in the area enclosed by the corrected/reconstructed spectral locus? Not really. The reason is that the corrected spectral locus no longer has our familiar horseshoe shape and, more importantly, is no longer convex. So, for instance, if you connect the spectral lights $520~nm$ and $550~nm$ on the corrected locus, any point on that line represents the color of a real light mixed from $520~nm$ and $550~nm$. If you use an iPhone 12 Pro Max to take a photo of a light mixed from $520~nm$ and $550~nm$ and transform the raw RGB values using matrix $T$, you will end up on that line in the xy-chromaticity plot. But that line lies outside the locus. So we can no longer say that real colors lie inside the locus.

In fact, it is the convex hull of the corrected spectral locus that contains the colors of all the real lights. Click the "Convex Hull of Spectral Locus in Camera" legend label to reveal the convex hull. Any point that's outside the convex hull but inside the camera gamut will have valid (i.e., non-negative) raw RGB values, but it will never show up in a real ${\boxed{??}}$ capture because it represents an imaginary color.
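If you want to compute that hull yourself, SciPy's ConvexHull does the job; below is a minimal sketch with placeholder points standing in for the corrected-locus chromaticities:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Placeholder points standing in for the corrected spectral locus chromaticities;
# in practice, pass in the (N, 2) xy values of the "XYZ (Corrected)" locus.
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
xy_corrected = 0.35 + 0.25 * np.stack([np.cos(theta), np.sin(theta)], axis=1)

hull = ConvexHull(xy_corrected)
hull_vertices = xy_corrected[hull.vertices]   # hull vertices in counter-clockwise order

# Real lights always map inside this hull; chromaticities outside it (even with
# valid non-negative raw RGB values) correspond to imaginary colors.
print(hull_vertices.shape)
```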