So basically, there are three aspects to each color space when it comes to video: the primaries/white point (defining how deep/saturated your colors can go), the transfer function, and the YUV transform coefficients. I'm not sure I'm using 100% the right terminology here, but the concepts are roughly correct.
1. Primaries/white point
This basically defines how saturated the R, G and B components of the color space can be. In the case of a display, for example, it's determined by the dyes/phosphors used in the pixels: your image cannot get any more saturated than the individual pixel allows. I don't fully understand all the math behind this myself yet, but in a simplified way you could say it comes down to the spectral output of the pixel, i.e. where its peak is and how much "contamination" it gets from other colors. So say your pixel is green, but it also has some output in the blue/red areas of the spectrum. That pushes it a step closer towards white, meaning the saturation is decreased.
All of this is expressed relative to the CIE XYZ color space. Each primary (red, green or blue) is given as a pair of chromaticity coordinates derived from XYZ, called x and y (lowercase): x = X/(X+Y+Z) and y = Y/(X+Y+Z). Together with the white point, this is how each color space's gamut is defined. With these parameters you can even set up your own color space in Photoshop by going into the Custom RGB... menu.
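To make that concrete, here's a minimal numpy sketch (my own illustration, not anything from a spec) of how the primaries plus the white point fully pin down a color space: they determine the matrix that maps RGB into XYZ. The sRGB/D65 numbers are the standard published coordinates.

```python
import numpy as np

# sRGB primaries and D65 white point as CIE xy chromaticity coordinates,
# where x = X/(X+Y+Z) and y = Y/(X+Y+Z).
PRIMARIES = {"r": (0.64, 0.33), "g": (0.30, 0.60), "b": (0.15, 0.06)}
WHITE = (0.3127, 0.3290)  # D65

def xy_to_xyz(x, y):
    """Lift an xy chromaticity to an XYZ vector with luminance Y = 1."""
    return np.array([x / y, 1.0, (1.0 - x - y) / y])

def rgb_to_xyz_matrix(primaries, white):
    """Scale each primary so that R = G = B = 1 lands exactly on the white point."""
    m = np.stack([xy_to_xyz(*primaries[c]) for c in "rgb"], axis=1)
    scale = np.linalg.solve(m, xy_to_xyz(*white))
    return m * scale  # multiplies each column (primary) by its scale factor

print(rgb_to_xyz_matrix(PRIMARIES, WHITE))
# First row comes out around [0.4124, 0.3576, 0.1805], the familiar sRGB matrix.
```

The white point is what nails down the relative intensities of the three primaries, which is why it's part of the definition.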
This image you've probably seen many times:
![CIExy1931 sRGB gamut](https://upload.wikimedia.org/wikipedia/commons/d/d3/CIExy1931_srgb_gamut.png)
You can see the x-axis is called x and the y-axis, fittingly, y; those are the chromaticity coordinates from above. Basically, the bigger the triangle between the individual primaries (which are points in this diagram), the more colors you can reproduce and the more saturation you can achieve. Though theoretically you could have a gamut with a very saturated green but an undersaturated red, etc. The horseshoe-shaped area is the full range of chromaticities visible to humans.
Edit: Forgot to mention what this has to do with HDR. Basically, Rec2020 defines three primaries on this diagram (x,y coordinates) that span a rather big triangle, allowing for highly saturated colors and a wide range of colors. When playing back on a TV, the content of course gets mapped to/limited by whatever the TV's pixels can actually display.
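To put a rough number on "rather big", here's a quick sketch comparing the triangle areas of Rec709 and Rec2020 in xy space using the shoelace formula (the primary coordinates are the published values from the two specs; triangle area isn't a perceptual measure, just an illustration):

```python
# Primaries as (x, y) chromaticity coordinates.
REC709 = [(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)]
REC2020 = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]

def gamut_area(primaries):
    """Area of the triangle spanned by the primaries (shoelace formula)."""
    (xr, yr), (xg, yg), (xb, yb) = primaries
    return abs(xr * (yg - yb) + xg * (yb - yr) + xb * (yr - yg)) / 2

print(gamut_area(REC709))   # ~0.112
print(gamut_area(REC2020))  # ~0.212, almost twice the area
```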
2. Transfer function
Raw light is of course always linear. But storing linear values in a file would need a lot of bit depth, so before saving we apply a transfer function. For SDR, that is usually a slightly modified version of a gamma curve; sRGB, for example, is approximately gamma 2.2 (technically a piecewise curve with a small linear segment near black). Because human vision is much more sensitive to differences in the shadows than in the highlights, encoding through such a curve spends the available code values where we can actually see them, so we can cover more dynamic range at a lower bit depth without visible banding.
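Here's a toy sketch of that idea with a plain power-law gamma (a simplification; real sRGB adds the small linear segment near black). Note how many more 8-bit codes a dark value gets after encoding:

```python
import numpy as np

def gamma_encode(linear, gamma=2.2):
    """Compress linear light before quantization; shadows get more code values."""
    return np.clip(linear, 0.0, 1.0) ** (1.0 / gamma)

def gamma_decode(encoded, gamma=2.2):
    """Reverse the transfer function to restore linear light."""
    return np.clip(encoded, 0.0, 1.0) ** gamma

# Quantizing a dark linear value of 0.01 to 8 bits:
print(round(0.01 * 255))                # stored linearly: code 3, heavy banding risk
print(round(gamma_encode(0.01) * 255))  # gamma-encoded: code ~31, much finer steps
```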
For HDR, you basically have three options afaik: something gamma-like (that would really just be SDR, but with the wide gamut), HLG, or PQ. PQ is pretty much the standard, used on all 4K HDR Blu-rays and most streaming.
The PQ curve basically assigns a fixed brightness in nits to each code value (the actual RGB value), all the way up to 10,000 nits. The PQ transfer function is more complicated than a gamma, but it's the same concept: a curve that compresses the dynamic range.
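For the curious, here's a sketch of the PQ curve itself; the constants are from SMPTE ST 2084, and the function maps a normalized code value to an absolute brightness in nits:

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_eotf(signal):
    """Normalized PQ code value [0, 1] -> absolute luminance in nits."""
    e = np.asarray(signal, dtype=np.float64) ** (1.0 / M2)
    return 10000.0 * (np.maximum(e - C1, 0.0) / (C2 - C3 * e)) ** (1.0 / M1)

def pq_inverse_eotf(nits):
    """Absolute luminance in nits -> normalized PQ code value."""
    y = (np.asarray(nits, dtype=np.float64) / 10000.0) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2

print(pq_eotf(0.5))            # ~92 nits: half signal is still fairly dark
print(pq_inverse_eotf(100.0))  # ~0.51: SDR-ish white sits around mid-signal
```

You can see how aggressively the curve reserves code values for the dark end and stretches the top half of the signal over thousands of nits.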
Whenever you import something into a color-managed application like Photoshop or After Effects and that software knows the color space (through an ICC profile), it will basically reverse the transfer function and thus restore the linear data. So for the most part, you don't have to worry about any of this yourself. Whether After Effects unfolds a gamma curve or a PQ curve is all the same to you, except that after unfolding the PQ curve you will have superbright values (above 1.0). You can then use something like the Exposure effect to bring the highlights back down.
3. YUV transform coefficients
All of this color space stuff basically happens in some form of RGB. YUV is a special encoding that we use for video (but also JPEGs and such). The YUV transform coefficients define how exactly the RGB gets transformed to YUV, i.e. how much each of R, G and B contributes to the luma. This is what you're setting with the matrix="Rec2020", matrix="Rec709" or matrix="Rec601" setting. I originally remembered the coefficients being identical for Rec2020 and Rec709, but that's actually wrong: each of the three standards uses its own set (see the sketch after the next paragraph).
In other words, this ONLY matters when you are going from RGB to YUV or back. When you, for example, convert one kind of YUV (like YV24) to another (like YV12), it doesn't matter, because the material is already in YUV and stays there. Similarly, it doesn't matter when going from one kind of RGB to another.
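Here's a small sketch of what the matrix setting actually computes, using the published Kr/Kb luma coefficients from the three standards (this is the basic full-range form; real video adds TV-range scaling and offsets on top):

```python
# Luma coefficients (Kr, Kb) per standard; note Rec709 and Rec2020 really do differ.
KR_KB = {
    "Rec601": (0.299, 0.114),
    "Rec709": (0.2126, 0.0722),
    "Rec2020": (0.2627, 0.0593),
}

def rgb_to_ycbcr(r, g, b, matrix="Rec709"):
    """Convert normalized [0, 1] R'G'B' to Y'CbCr with the chosen coefficients."""
    kr, kb = KR_KB[matrix]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr

# The same pure green gets a different luma under each matrix, which is
# exactly why tagging the wrong one shifts colors on decode:
for m in KR_KB:
    print(m, rgb_to_ycbcr(0.0, 1.0, 0.0, m))
```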
About the HDR metadata: it's as you say, it helps the TV know what to expect from the material so it can correctly tonemap it to its own range. It doesn't actually change the HDR material itself, so the metadata wouldn't matter for a fanres, for example. However, if you decide to make the final result of your fanres HDR too, you'd need to run an analysis pass on your rendered HDR content to measure MaxCLL (maximum content light level, the brightest single pixel anywhere in the stream) and MaxFALL (maximum frame average light level, the brightest frame on average), so that it can be played back in the best way possible.
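If you're curious what such an analysis pass boils down to, here's a rough sketch of just the measurement. It assumes you've already decoded and PQ-linearized the video into per-frame HxWx3 arrays of nits; the frames_nits input format and the my_decoded_clip name are made up for the example:

```python
import numpy as np

def analyze_light_levels(frames_nits):
    """Scan frames for MaxCLL and MaxFALL.

    MaxCLL  = brightest single pixel in the whole stream (max of per-pixel maxRGB).
    MaxFALL = highest per-frame average of per-pixel maxRGB.
    """
    max_cll = 0.0
    max_fall = 0.0
    for frame in frames_nits:
        max_rgb = frame.max(axis=2)  # per-pixel max of R, G, B in nits
        max_cll = max(max_cll, float(max_rgb.max()))
        max_fall = max(max_fall, float(max_rgb.mean()))
    return max_cll, max_fall

# e.g.: print(analyze_light_levels(frame for frame in my_decoded_clip))
```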
Dolby Vision is a whole other beast from what I've seen, and apparently has a lot more aspects to it, but people are already working on deciphering it.
Hope that helps. Let me know if anything is unclear.