[py-tx] Default state storage and merge is prohibitively memory intensive #1181

Open
Dcallies opened this issue Sep 1, 2022 · 0 comments
Labels
python-threatexchange Items related to the threatexchange python tool / library

Comments

Contributor

Dcallies commented Sep 1, 2022

state.pickle is 5.77 GB.
Observed memory usage increased between commit bff909b and 1.0.2, from 13 GB to 30 GB.
The command is fetch.

The current implementation loads the entire stored state into memory and then merges.

Partitioning on key and splitting the state across multiple files is one option; switching to a real storage layer like SQLite is another.
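
As a rough illustration of the storage-layer option, here is a minimal sketch using Python's standard sqlite3 module. It is not threatexchange's actual code: the table layout and the names open_store and merge_updates are hypothetical, and it assumes each fetched record can be reduced to a string key and a serialized value (for example, pickled bytes), with None standing for a deletion. The point is that updates are applied in bounded batches, so memory use scales with the batch size and not with the size of the stored state.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

# Hypothetical record shape: (key, serialized value); None means "delete this key".
Update = Tuple[str, Optional[bytes]]


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the on-disk store with a single key/value table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS state ("
        "key TEXT PRIMARY KEY, value BLOB NOT NULL)"
    )
    return conn


def _apply_batch(conn: sqlite3.Connection, batch: List[Update]) -> None:
    """Apply one batch in order, inside a single transaction."""
    with conn:  # commits on success, rolls back on error
        for key, value in batch:
            if value is None:
                conn.execute("DELETE FROM state WHERE key = ?", (key,))
            else:
                conn.execute(
                    "INSERT OR REPLACE INTO state (key, value) VALUES (?, ?)",
                    (key, value),
                )


def merge_updates(
    conn: sqlite3.Connection,
    updates: Iterable[Update],
    batch_size: int = 10_000,
) -> None:
    """Merge a stream of updates without loading the existing state."""
    batch: List[Update] = []
    for update in updates:
        batch.append(update)
        if len(batch) >= batch_size:
            _apply_batch(conn, batch)
            batch = []
    if batch:
        _apply_batch(conn, batch)
```

Because merge_updates accepts any iterable, the fetched records could be passed in as a generator, so neither the old state nor the full set of new records has to be held in memory at once. The same upsert-in-batches shape would also apply to the partitioning option, with one file per key range in place of one table.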

Repro'd with:

  1. threatexchange fetch
@Dcallies Dcallies added the python-threatexchange Items related to the threatexchange python tool / library label Sep 1, 2022