
Note: this question is about how Blender is implemented, not how it's used.

IIUC, Blender uses eight (or more) tracked points at two keyframes to calculate the camera's position in space. What algorithm is used to do that (the camera-position-solving part)?

There's a previous question, What algorithm does Blender's motion tracking use?, which seems similar, but its answer points to the feature-detection algorithm, not the position-solving part (it's also six years old, and a lot could have changed since then...)

Thanks.


2 Answers


I also came across this thread after reading Blender's source code, and I found the thesis by Keir Mierle, which explains the algorithms and the 3D-reconstruction process used in libmv:

https://tspace.library.utoronto.ca/bitstream/1807/10437/1/Mierle_Keir_B_200801_MASc_thesis.pdf


I came across your question while trying to find the answer myself, and was disappointed to see it hadn't been answered. After further digging, I don't think this question has a simple answer: as far as I can tell, there's no single white paper that describes the algorithm in its entirety.

If you happen to be really good with C++ and linear algebra, I'd recommend taking a look at the Libmv source code; that's where Blender's camera-solving logic lives. Perhaps unfortunately, Blender seems to be the primary consumer of the library (though OpenCV also uses it), and work on it appears to have stopped once it reached a "good enough" state (as of July 2021, the last commit was from May 2017). Probably the closest thing you'll find to a non-code description of the algorithm is the comment on the "EuclideanReconstructionFromVideo" function declaration in "libmv/reconstruction/reconstruction.h":

// Computes the trajectory of a camera using matches as input.
//  - First keyframes are detected according a minimum number of shared tracks
//  - Next the first two keyframes are used to estimate an initial structure
//  - Then every other keyframe are reconstructed based on an euclidean resection
//    algorithm with the previously recontructed structures and new structures
//    are estimated (points triangulation). A bundle adjustment is performed on
//    all the reconstruction each time a keyframe is localized.
// TODO(julien) a local bundle adjustment would be sufficient?
// TODO(julien) +a global bundle adjusment on all data at the end?
//  - In a final step, non-keyframes are localized using the resection method.
//    A bundle adjusment is periodically performed on all the data.
// In the case that the tracking is lost, a new reconstruction is created.
// TODO(julien) Add the calibration matrix K as input?
// TODO(julien) remove outliers from matches or output inliers matches.
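The "initial structure" and "points triangulation" steps mentioned in that comment are standard multi-view-geometry operations. As an illustration only (this is a generic sketch, not libmv's actual code), here is the linear DLT triangulation method in NumPy, which recovers a 3D point from its observations in two views with known projection matrices:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Triangulate one 3D point from two views via the linear DLT method.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: 2D image observations (x, y) of the same point in each view.
    Returns the 3D point in Euclidean coordinates.
    """
    # Each observation x ~ P X yields two linear constraints on the
    # homogeneous point X; stack them into a 4x4 system A X = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution is the right singular vector
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize

# Toy example: two cameras one unit apart observing a known point.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])               # camera at origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # shifted along x
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
print(np.allclose(triangulate_dlt(P1, P2, x1, x2), X_true))  # True
```

With noise-free observations the DLT recovers the point exactly; with real tracking data the result is only a linear estimate, which is why a nonlinear bundle adjustment pass follows it in pipelines like the one described above.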

To find out exactly what terms like "resection" and "bundle adjustment" mean in this context, as well as whether those TODOs were ever done, you would have to dive deeper into the code than I am willing to go at this time.
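For what it's worth, in the structure-from-motion literature "resection" generally means estimating one camera's pose from already-reconstructed 3D points and their 2D observations (a PnP problem), and "bundle adjustment" means jointly refining all camera poses and 3D points to minimize total reprojection error. A generic sketch of the residual that bundle adjustment minimizes (again, an illustration, not libmv's implementation):

```python
import numpy as np

def reprojection_residuals(P, points3d, observations):
    """Residuals bundle adjustment minimizes: predicted minus observed 2D points.

    P: 3x4 projection matrix of one camera.
    points3d: (N, 3) reconstructed 3D points.
    observations: (N, 2) measured image coordinates of those points.
    """
    X_h = np.hstack([points3d, np.ones((len(points3d), 1))])  # homogeneous
    proj = (P @ X_h.T).T
    predicted = proj[:, :2] / proj[:, 2:3]  # perspective divide
    return (predicted - observations).ravel()

# A nonlinear solver (typically Levenberg-Marquardt) stacks these residuals
# over all cameras and minimizes their squared sum over poses and points.
P = np.hstack([np.eye(3), np.zeros((3, 1))])
pts = np.array([[0.5, 0.2, 4.0], [-0.3, 0.1, 2.0]])
obs = np.array([[0.125, 0.05], [-0.15, 0.05]])
print(np.abs(reprojection_residuals(P, pts, obs)).max())  # 0.0 (perfect fit)
```

Whether libmv's resection and bundle adjustment match these textbook definitions exactly is precisely the kind of detail you'd have to confirm in the source.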

