tmk-query false positives #247

Open
jcohenho opened this issue Jun 12, 2020 · 8 comments

@jcohenho

jcohenho commented Jun 12, 2020

Hello again!

After extensive testing, I'm running into a lot of false-positive results across many different videos using the tmk-query tool.

Here is a link to a public GitHub repo I created with one such example. I've included two different video files and their respective TMK hashes.

After hashing these two separate videos, I compare my needles set against my haystack set with:
./tmk-query --c1 0.7 --c2 0.7 needles.txt haystack.txt | sort -n

The result is:
0.828915 0.790789 clip_65_418_430.tmk video.tmk

Could you provide any insight into why these videos are matching? Am I missing something when creating hashes or comparing with the tmk-query tool? I can provide more examples if needed.

@johnkerl
Contributor

Thank you!!

@jcohenho
Author

jcohenho commented Jun 21, 2020

@johnkerl, do you have any idea why I'm getting false-positive matches between these two completely different videos?

@johnkerl
Contributor

johnkerl commented Jun 23, 2020

@jcohenho thank you! My first thought is that the ~0.8 tolerance zone may be too loose.

I'm excited to hear about the extensive testing! Can you share some information about the level-1 and level-2 scores for more of your false-positive cases?

@jcohenho
Author

Hi @johnkerl,

Just to clarify: when I run tmk-query, the first two columns are the level-1 and level-2 scores that correspond to the --c1 and --c2 arguments, correct? Do you just want to see more results from running tmk-query against the library of videos I'm testing with?

Here is a longer list:

0.701424 0.724250 ./needles/clip_63_394_409.tmk ../../haystack/4304762620.tmk
0.701482 0.729434 ./needles/clip_65_418_430.tmk ../../haystack/4304762620.tmk
0.701586 0.727049 ./needles/clip_62_378_394.tmk ../../haystack/8545625950.tmk
0.708319 0.709165 ./needles/clip_59_357_362.tmk ../../haystack/9876092082.tmk
0.708781 0.703152 ./needles/clip_65_418_430.tmk ../../haystack/1705689899.tmk
0.709351 0.707681 ./needles/clip_90_456_466.tmk ../../haystack/3132932351.tmk
0.710304 0.706610 ./needles/clip_90_456_466.tmk ../../haystack/7488560177.tmk
0.710640 0.706276 ./needles/clip_65_418_430.tmk ../../haystack/7403319953.tmk
0.711533 0.709757 ./needles/clip_63_394_409.tmk ../../haystack/6824241941.tmk
0.712311 0.736267 ./needles/clip_65_418_430.tmk ../../haystack/9637086807.tmk
0.712575 0.729686 ./needles/clip_65_418_430.tmk ../../haystack/9537219712.tmk
0.713084 0.703157 ./needles/clip_65_418_430.tmk ../../haystack/5288489068.tmk
0.713765 0.705212 ./needles/clip_90_456_466.tmk ../../haystack/7201179539.tmk
0.718981 0.723688 ./needles/clip_16_140_153.tmk ../../haystack/5540279788.tmk
0.719635 0.702121 ./needles/clip_65_418_430.tmk ../../haystack/8794779068.tmk
0.721018 0.714752 ./needles/clip_59_357_362.tmk ../../haystack/1689354702.tmk
0.722227 0.729621 ./needles/clip_59_357_362.tmk ../../haystack/8953057686.tmk
0.722591 0.713023 ./needles/clip_65_418_430.tmk ../../haystack/3883036589.tmk
0.722637 0.713299 ./needles/clip_37_237_246.tmk ../../haystack/1639206967.tmk
0.723007 0.735340 ./needles/clip_90_456_466.tmk ../../haystack/9637086807.tmk
0.725125 0.715759 ./needles/clip_106_664_684.tmk ../../haystack/2661278471.tmk
0.725164 0.736653 ./needles/clip_65_418_430.tmk ../../haystack/0481629324.tmk
0.729034 0.700245 ./needles/clip_90_456_466.tmk ../../haystack/1735605575.tmk
0.729100 0.745117 ./needles/clip_90_456_466.tmk ../../haystack/5347345302.tmk
0.729102 0.727861 ./needles/clip_65_418_430.tmk ../../haystack/8748537131.tmk
0.729626 0.720933 ./needles/clip_59_357_362.tmk ../../haystack/7148697053.tmk
0.732095 0.736921 ./needles/clip_62_378_394.tmk ../../haystack/5940883728.tmk
0.733803 0.731598 ./needles/clip_65_418_430.tmk ../../haystack/0970735518.tmk
0.734619 0.725880 ./needles/clip_65_418_430.tmk ../../haystack/2271872409.tmk
0.734865 0.725444 ./needles/clip_106_664_684.tmk ../../haystack/3113745105.tmk
0.735034 0.724343 ./needles/clip_38_246_263.tmk ../../haystack/9696198039.tmk
0.736024 0.713466 ./needles/clip_59_357_362.tmk ../../haystack/6367414579.tmk
0.736624 0.720755 ./needles/clip_38_246_263.tmk ../../haystack/0728610054.tmk
0.739576 0.757686 ./needles/clip_59_357_362.tmk ../../haystack/8712928242.tmk
0.741155 0.721292 ./needles/clip_38_246_263.tmk ../../haystack/4304762620.tmk
0.741983 0.707826 ./needles/clip_4_27_44.tmk ../../haystack/1639206967.tmk
0.742138 0.705701 ./needles/clip_65_418_430.tmk ../../haystack/5209228963.tmk
0.742232 0.734402 ./needles/clip_65_418_430.tmk ../../haystack/5347345302.tmk
0.746027 0.724908 ./needles/clip_90_456_466.tmk ../../haystack/4702449382.tmk
0.746755 0.761313 ./needles/clip_65_418_430.tmk ../../haystack/6779475805.tmk
0.747831 0.725664 ./needles/clip_65_418_430.tmk ../../haystack/7550398664.tmk
0.748039 0.749356 ./needles/clip_59_357_362.tmk ../../haystack/1138692140.tmk
0.748788 0.726332 ./needles/clip_65_418_430.tmk ../../haystack/7195434310.tmk
0.750477 0.746034 ./needles/clip_65_418_430.tmk ../../haystack/3039349834.tmk
0.756347 0.766993 ./needles/clip_65_418_430.tmk ../../haystack/9966117635.tmk
0.757652 0.728766 ./needles/clip_65_418_430.tmk ../../haystack/7073305196.tmk
0.759797 0.749851 ./needles/clip_65_418_430.tmk ../../haystack/4702449382.tmk
0.760223 0.769756 ./needles/clip_59_357_362.tmk ../../haystack/3206890901.tmk
0.760658 0.759999 ./needles/clip_65_418_430.tmk ../../haystack/4736217465.tmk
0.761135 0.754722 ./needles/clip_59_357_362.tmk ../../haystack/4993711965.tmk
0.764817 0.787628 ./needles/clip_106_664_684.tmk ../../haystack/2610938319.tmk
0.765291 0.729330 ./needles/clip_59_357_362.tmk ../../haystack/3224352717.tmk
0.768169 0.787347 ./needles/clip_65_418_430.tmk ../../haystack/1639206967.tmk
0.769148 0.752169 ./needles/clip_65_418_430.tmk ../../haystack/5561811386.tmk
0.769168 0.734276 ./needles/clip_4_27_44.tmk ../../haystack/4304762620.tmk
0.770742 0.765770 ./needles/clip_65_418_430.tmk ../../haystack/8545625950_k.tmk
0.772002 0.747713 ./needles/clip_65_418_430.tmk ../../haystack/6868044816.tmk
0.773601 0.757714 ./needles/clip_65_418_430.tmk ../../haystack/3071884561.tmk
0.774219 0.768757 ./needles/clip_65_418_430.tmk ../../haystack/4733220414.tmk
0.779221 0.742252 ./needles/clip_65_418_430.tmk ../../haystack/2735792862.tmk
0.784012 0.737720 ./needles/clip_37_237_246.tmk ../../haystack/8794779068.tmk
0.788096 0.753398 ./needles/clip_65_418_430.tmk ../../haystack/5723468857.tmk
0.788848 0.783313 ./needles/clip_59_357_362.tmk ../../haystack/8079327790.tmk
0.790015 0.735624 ./needles/clip_90_456_466.tmk ../../haystack/4736217465.tmk
0.792202 0.767012 ./needles/clip_65_418_430.tmk ../../haystack/7522532846.tmk
0.793449 0.721210 ./needles/clip_37_237_246.tmk ../../haystack/0338388076.tmk
0.793711 0.749790 ./needles/clip_65_418_430.tmk ../../haystack/0484671290.tmk
0.798029 0.709030 ./needles/clip_37_237_246.tmk ../../haystack/9966117635.tmk
0.804429 0.799744 ./needles/clip_65_418_430.tmk ../../haystack/5940883728.tmk
0.805439 0.773944 ./needles/clip_65_418_430.tmk ../../haystack/1925738538.tmk
0.806802 0.791950 ./needles/clip_59_357_362.tmk ../../haystack/0134115277.tmk
0.807566 0.775979 ./needles/clip_65_418_430.tmk ../../haystack/4470743183.tmk
0.811182 0.744237 ./needles/clip_4_27_44.tmk ../../haystack/0481629324.tmk
0.817829 0.790668 ./needles/clip_65_418_430.tmk ../../haystack/2124707866.tmk
0.825436 0.761574 ./needles/clip_90_456_466.tmk ../../haystack/3071884561.tmk
0.826490 0.826427 ./needles/clip_59_357_362.tmk ../../haystack/6626160849.tmk
0.828915 0.790789 ./needles/clip_65_418_430.tmk ../../haystack/6325635370.tmk
0.828915 0.790789 ./needles/clip_65_418_430.tmk ../../haystack/8310293810.tmk
0.854163 0.790387 ./needles/clip_5_44_51.tmk ../../haystack/2341324345.tmk
0.854188 0.790400 ./needles/clip_5_44_51.tmk ../../haystack/8512702394.tmk

Only the last two rows are correct matches. Maybe I need to set my threshold higher? I'm working with a library of over 100,000 videos, so I have a lot of content to work with :)
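Stricter cutoffs can be tried offline on saved output, without re-running the query over the whole library. The sketch below is a minimal illustration, assuming each tmk-query output line has the four whitespace-separated fields shown above (level-1 score, level-2 score, needle path, haystack path) with no spaces inside the paths. The names `Match`, `parse_line`, and `filter_matches` are hypothetical, and the 0.85 cutoff is used only because it happens to separate the five sample rows; it is not a recommended value.

```python
from typing import Iterable, List, NamedTuple


class Match(NamedTuple):
    """One line of tmk-query output (column meaning assumed, see above)."""
    level1: float
    level2: float
    needle: str
    haystack: str


def parse_line(line: str) -> Match:
    # Assumed layout: "<level-1> <level-2> <needle.tmk> <haystack.tmk>",
    # with no spaces inside the file paths.
    s1, s2, needle, haystack = line.split()
    return Match(float(s1), float(s2), needle, haystack)


def filter_matches(lines: Iterable[str], c1: float, c2: float) -> List[Match]:
    """Keep rows whose level-1 AND level-2 scores both meet the cutoffs."""
    rows = (parse_line(line) for line in lines if line.strip())
    return [m for m in rows if m.level1 >= c1 and m.level2 >= c2]


# Five rows copied from the output above.
SAMPLE = """\
0.701424 0.724250 ./needles/clip_63_394_409.tmk ../../haystack/4304762620.tmk
0.826490 0.826427 ./needles/clip_59_357_362.tmk ../../haystack/6626160849.tmk
0.828915 0.790789 ./needles/clip_65_418_430.tmk ../../haystack/6325635370.tmk
0.854163 0.790387 ./needles/clip_5_44_51.tmk ../../haystack/2341324345.tmk
0.854188 0.790400 ./needles/clip_5_44_51.tmk ../../haystack/8512702394.tmk
""".splitlines()


if __name__ == "__main__":
    for c1 in (0.70, 0.80, 0.85):
        kept = filter_matches(SAMPLE, c1=c1, c2=0.70)
        # Prints 5, 4, and then 2 rows kept for these three cutoffs.
        print(f"c1={c1:.2f}: {len(kept)} rows kept")
```

To use it on real data, save the output once (for example `./tmk-query --c1 0.7 --c2 0.7 needles.txt haystack.txt > results.txt`) and pass `open("results.txt")` as `lines`. The filter requires both scores to pass, mirroring the pair of `--c1`/`--c2` thresholds in the command above.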

@johnkerl
Contributor

@jcohenho yes, I would set the threshold higher. I think your evaluation set is larger than the ones we used for this project. Internally we evaluated on (a) a smaller, public, general-content dataset and (b) a larger, domain-specific dataset. This is really great info to have, since it adds another (large) dataset! :)

@github-actions
Copy link
Contributor

This issue is being marked as stale because it has no recent activity. It will be closed automatically in 14 days unless it becomes active before then. To prevent closing, please comment on the issue before that time. If the issue is no longer relevant, please feel free to close it prior to that time.

Cleaning up stale issues helps redirect focus to the issues that are top of mind for the community. Thank you for your help with this.

@github-actions github-actions bot added the Stale label Nov 13, 2020
@github-actions
Contributor

This issue has been closed due to no recent activity. If you need this issue reopened, please let us know.
Thanks!

@Dcallies Dcallies reopened this Feb 14, 2023
@Dcallies
Contributor

This issue was referenced in https://www.hackerfactor.com/blog/index.php?/archives/971-FB-TMK-PDQ-WTF.html, a longer write-up with practical results.

I came on after the team evaluated TMK, but we may want to update our threshold guidance, or put more emphasis on tuning the thresholds for the desired precision/recall.
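Tuning for a target precision/recall can be sketched as a threshold sweep over hand-labeled pairs. This is a minimal illustration, not project guidance: the function `precision_recall` and the `(level-1, level-2, is_true_match)` tuple layout are hypothetical, and the sample reuses five rows from the output earlier in this thread, labeled according to the reporter's note that only the last two rows (both `clip_5_44_51`) were correct matches. Because pairs scoring below the original `--c1 0.7 --c2 0.7` cutoffs never appear in the output, recall here is relative to the labeled rows only, not to every true match in the library.

```python
from typing import Iterable, Optional, Tuple

# (level-1 score, level-2 score, is_true_match); the layout is hypothetical.
Labeled = Tuple[float, float, bool]


def precision_recall(
    pairs: Iterable[Labeled], c1: float, c2: float
) -> Tuple[Optional[float], Optional[float]]:
    """Precision and recall when a pair counts as a match iff both scores pass."""
    tp = fp = fn = 0
    for s1, s2, is_match in pairs:
        predicted = s1 >= c1 and s2 >= c2
        if predicted and is_match:
            tp += 1
        elif predicted:
            fp += 1
        elif is_match:
            fn += 1
    # Each ratio is undefined (None) when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


LABELED = [
    (0.701424, 0.724250, False),
    (0.826490, 0.826427, False),
    (0.828915, 0.790789, False),
    (0.854163, 0.790387, True),
    (0.854188, 0.790400, True),
]


if __name__ == "__main__":
    # Sweep the level-1 cutoff while holding the level-2 cutoff at 0.70.
    for c1 in (0.70, 0.80, 0.85, 0.86):
        p, r = precision_recall(LABELED, c1=c1, c2=0.70)
        print(f"c1={c1:.2f} precision={p} recall={r}")
```

On this tiny sample, precision rises from 0.4 at `c1=0.70` to 1.0 at `c1=0.85`, and at `c1=0.86` nothing passes, so recall drops to 0.0. A real sweep would need far more labeled pairs, and where to stop depends on whether false positives or missed matches are more costly for the use case.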
