This kind of problem is very common in computer graphics. What I will explain only works if the distance you want to measure and your reference object lie in the same plane (for example, a sheet of paper lying on the table you want to measure).
If you knew how this plane lies in the image, that is, how the points of the plane are mapped to points in the image, the task would be easy. Fortunately, this problem already has a good answer here! Let's apply it to measuring the distance between two points in the image. Take this screenshot from your video as an example:
![the table](https://cdn.statically.io/img/i.sstatic.net/3P3is.png)
The corners of the sheet of paper are $p_1 = (496, 255)$, $p_2 = (607, 224)$, $p_3 = (508, 171)$, and $p_4 = (405, 194)$, and the corners of the table are $q_1 = (7,244)$, $q_2 = (654, 389)$, $q_3 = (860, 47)$, and $q_4 = (511, 23)$ (all in pixels). The corners of a DIN A3 sheet are $e_1 = (0, 0)$, $e_2 = (0.3, 0)$, $e_3 = (0.3, 0.42)$, and $e_4 = (0, 0.42)$ (all in m, with the 297 mm short side rounded to 0.3 m).
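Now construct the transformation matrices according to the reference. For us, the "destination image" is the sheet of paper and the "source image" is the given picture of the table. (If you want to reproduce the numbers: the entries below correspond to pixel coordinates with $y$ measured upward from the bottom edge of the image, i.e. $y \mapsto 422 - y$, and points 1, 2, and 4 form the columns, scaled so that they sum to point 3.) The transformation matrices are: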
$$A=\left(
\begin{array}{ccc}
-400.027 & 530.049 & 377.977 \\
-134.686 & 172.899 & 212.787 \\
-0.806505 & 0.873228 & 0.933278 \\
\end{array}
\right),$$
$$B=\left(
\begin{array}{ccc}
0. & 0.3 & 0. \\
0. & 0. & 0.42 \\
-1. & 1. & 1. \\
\end{array}
\right),$$
$$\text{and } C=B A^{-1}=\left(
\begin{array}{ccc}
0.00218481 & 0.00325931 & -1.62797 \\
-0.00145442 & 0.00520777 & -0.148304 \\
-0.0000581739 & -0.00284785 & 1.74436 \\
\end{array}
\right).$$
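As a rough sketch of how these matrices can be reproduced with NumPy: each matrix takes homogeneous points 1, 2, and 4 as columns and scales them so that they add up to homogeneous point 3. The helper name `basis_matrix` and the constant `IMAGE_HEIGHT = 422` are my own assumptions; the value 422 is inferred from the printed entries of $A$, which only match the pixel coordinates above after flipping $y$.

```python
import numpy as np

# Assumption: the printed A matches the points only after y -> 422 - y,
# i.e. y measured from the bottom of a 422-pixel-high image.
IMAGE_HEIGHT = 422


def basis_matrix(points):
    """Columns are homogeneous points 1, 2, 4, scaled so they sum to point 3."""
    pts = np.asarray(points, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))]).T  # 3x4, one point per column
    basis = homog[:, [0, 1, 3]]                          # points 1, 2, 4
    coeffs = np.linalg.solve(basis, homog[:, 2])         # express point 3 in that basis
    return basis * coeffs                                # scale column j by coeffs[j]


paper_px = [(496, 255), (607, 224), (508, 171), (405, 194)]  # p1..p4, pixels
paper_m = [(0, 0), (0.3, 0), (0.3, 0.42), (0, 0.42)]         # e1..e4, metres

A = basis_matrix([(x, IMAGE_HEIGHT - y) for x, y in paper_px])
B = basis_matrix(paper_m)
C = B @ np.linalg.inv(A)  # maps (flipped) pixel coordinates to sheet coordinates

np.set_printoptions(precision=6, suppress=True)
print(A, B, C, sep="\n\n")
```

Flipping $y$ only changes the intermediate matrices. Since the flip is itself a projective map, the sheet coordinates you end up with are the same whichever convention you use, as long as you use it consistently for all points.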
The matrix $C$ transforms pixel coordinates into "sheet of paper" coordinates. Note that we are working in homogeneous coordinates. To transform the table corners to sheet coordinates, we first apply $C$: $C (q_1,1)=(-1.03252, 0.768499, 1.23704)$, $C (q_2,1)=(-0.0915473, -0.927641, 1.61234)$, $C (q_3,1)=(1.47321, 0.553807, 0.62639)$, and $C (q_4,1)=(0.788933, 1.18639, 0.578344)$. We then dehomogenize (divide by the third component) to obtain $q_1''=(-0.834672, 0.621241)$, $q_2''=(-0.0567792, -0.575339)$, $q_3''=(2.35191, 0.884125)$, and $q_4''=(1.36412, 2.05135)$.
From these we can directly calculate the distances (in m) and find $d(q_1'',q_2'')=1.43$ (should be $1.53$), $d(q_2'',q_3'')=2.82$ (should be $2.73$), $d(q_3'',q_4'')=1.53$ (should be $1.53$), and $d(q_4'',q_1'')=2.62$ (should be $2.73$).
These values are not perfect. A likely reason is that I didn't pick the points in the image accurately enough. As was pointed out in the video, it is crucial to choose the reference points as accurately as possible: playing around with those values shows that they have a huge effect on the final result. This method also does not take lens distortion into account.
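The last step (apply $C$, dehomogenize, measure) can be sketched in a few lines of NumPy. This is only an illustration: the function name `to_sheet` and the constant `IMAGE_HEIGHT = 422` are my own assumptions. The printed $C$ reproduces the products above only when $y$ is measured from the bottom of the image, so the sketch flips $y$ before applying it.

```python
import numpy as np

# Assumption: the printed C expects y measured from the bottom of the image (y -> 422 - y).
IMAGE_HEIGHT = 422

# C as printed above (pixel coordinates -> sheet coordinates, homogeneous).
C = np.array([
    [0.00218481, 0.00325931, -1.62797],
    [-0.00145442, 0.00520777, -0.148304],
    [-0.0000581739, -0.00284785, 1.74436],
])

table_px = [(7, 244), (654, 389), (860, 47), (511, 23)]  # q1..q4, pixels


def to_sheet(points_px):
    """Map pixel points to sheet coordinates in metres."""
    homog = np.array([(x, IMAGE_HEIGHT - y, 1.0) for x, y in points_px])
    mapped = homog @ C.T                   # apply C to every point
    return mapped[:, :2] / mapped[:, 2:3]  # dehomogenize: divide by third component


corners_m = to_sheet(table_px)

# Side lengths q1-q2, q2-q3, q3-q4, q4-q1.
side_lengths = np.linalg.norm(corners_m - np.roll(corners_m, -1, axis=0), axis=1)

print(np.round(corners_m, 4))
print(np.round(side_lengths, 2))
```

Shifting one of the reference corners by a pixel or two before recomputing $C$ is a quick way to see how sensitive the result is to the point selection.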