Optimal command-line search inside a sorted text file

Question

Let's say I have a text file with billions of text lines sorted alphabetically, like

Bar=10
Foo=6
Naz=42

How can I search for the line starting with Foo as efficiently as possible (the file contains billions of variables like this), given that the lines are sorted alphabetically and that the line I want must start with (or "contain", if that's easier to search for) a specific text?

Edit:

This question can be considered a duplicate of https://askubuntu.com/q/423886/10473. The answer is to use look, which is fast enough for this kind of search.

What do you want out of the search? A "yes" or "no" or the actual line that matches, or just the number after =? Will you only be searching with a single string or with many separate strings (expecting many answers)? Do you care for substring matches (so that Foo matches not only Foo but also AhFoo and Foobiz, or Hoo=Foo etc.)? Are these variables that would be valid in a shell? Are there duplicated lines, or duplicated variable names? — Kusalananda, Commented Jan 8, 2021 at 23:19
@Kusalananda I want the line (since I also want the variable value). I search for only one string at a time (say Foo or Bar or Naz). I won't search for "Naz=" nor "42" nor "Naz=21" nor "Naz=42". I actually want a "full match" from the start of the line (Foo matches Foo but not AhFoo nor Hoo=Foo); I don't care if it matches Foobiz: I'm not looking for it, but if it makes the command easier, it's fine — Xenos, Commented Jan 8, 2021 at 23:24
see askubuntu.com/q/423886/10473 and unix.stackexchange.com/q/499306/4778 — ctrl-alt-delor, Commented Jan 8, 2021 at 23:41
@ctrl-alt-delor Thanks, I didn't know look was actually what I looked for. I made it using ... | xargs -I "{}" look -f "{}" "sorted.txt" which returns the result within a second. You may make an answer if you want me to accept it and get the reputation from it ;) Thanks again! — Xenos, Commented Jan 11, 2021 at 16:05
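
The look approach mentioned in these comments can be sketched roughly as follows. This is a minimal illustration, not the asker's exact command: the sample data, the find_var helper and the awk fallback are assumptions added here. It assumes the file is sorted in byte (C locale) order, which is what look expects when neither -d nor -f is given, and it appends "=" to the name so that Foo does not also match Foobiz.

```shell
#!/bin/sh
# Build a tiny sample file sorted in byte (C locale) order.
# (Hypothetical data; the real file would already exist and be sorted.)
sorted_file=$(mktemp)
printf '%s\n' 'Bar=10' 'Foo=6' 'Foobiz=7' 'Naz=42' | LC_ALL=C sort > "$sorted_file"

# find_var NAME FILE: hypothetical helper that prints the lines whose
# variable name is exactly NAME.
find_var() {
    if command -v look >/dev/null 2>&1; then
        # look prints every line beginning with the given string;
        # the trailing "=" restricts the match to the full name.
        look "$1=" "$2"
    else
        # Assumed fallback when look is not installed: a linear scan.
        awk -F= -v name="$1" '$1 == name' "$2"
    fi
}

result=$(find_var Foo "$sorted_file")
echo "$result"
rm -f "$sorted_file"
```

If the file was sorted case-insensitively, look would presumably need -f (as in the xargs command above) so that its comparison matches the sort order.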

bxm · Accepted Answer · 2021-01-09 11:29:27Z

I don't know how this will scale to the volumes you're talking about, but it seems to work with a file containing this:

Foo=123
Foobar=646
Foobar=85489
Noo=8654
Noobar=8262

awk -F= '{if ($1 > "Foobar") { exit } ; if ($1 == "Foobar") { print $0 } }' sorted.txt

This is just a proof of concept. It would be a simple matter to adapt so the term you are matching against is passed in.
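
One possible way to pass the search term in is awk's -v option. The sketch below wraps the same early-exit logic in a shell function; the name search_sorted and the temporary sample file are illustrative assumptions. It is still a linear scan up to the first non-matching name, so it shares the scaling limits of the one-liner above.

```shell
#!/bin/sh
# search_sorted TERM FILE: hypothetical wrapper around the one-liner above.
search_sorted() {
    # -v hands the shell argument to awk as the variable "term".
    awk -F= -v term="$1" '
        $1 > term  { exit }   # input is sorted, so no later line can match
        $1 == term { print }  # exact match on the name before "="
    ' "$2"
}

# Sample data taken from the answer, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'Foo=123' 'Foobar=646' 'Foobar=85489' 'Noo=8654' 'Noobar=8262' > "$sample"

matches=$(search_sorted Foobar "$sample")
echo "$matches"
rm -f "$sample"
```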

It didn't scale well, as it takes minutes to run. I ended up using look, which I didn't know about, thanks to the comments on the question. Thanks anyway!
– Xenos
Commented Jan 11, 2021 at 16:03
Cool, glad you got there.
– bxm
Commented Jan 12, 2021 at 22:16
