
I'm trying to write a very simple Nushell script. I want to skip the first 46 lines and pipe the rest to an external command. In Bash:

zcat filename.sql.gz |
    tail -n +46 |
    mysql dbname

Since Nushell's open doesn't seem to support compressed formats, I extracted the .gz and then, in Nushell, tried:

> open --raw filename.sql | lines | skip 46 | describe
list<string>

...so I have a list, therefore I should be able to convert it to one string, but...

> open --raw filename.sql | lines | skip 46 | to text | mysql dbname

It appears as though that tries to load the whole file into memory, but it's much too large. Is there a way to avoid that?

  • (A) In zcat filename.sql.gz and in open --raw filename.sql the file names don't match. Is that intentional? (B) Could you create filename-46.sql once, up front, and use that directly with mysql?
    – Prem
    Commented Sep 16, 2022 at 14:40
  • @Prem (A) I edited the question to clarify the name-mismatch, since it's explained in the OP's Github issue on the same topic. (B) would run into the same issue (would try to process the entire file in memory before writing to filename-46.sql) as long as to text is being used. Commented Sep 16, 2022 at 21:39
  • What I intended to say was: when the OP converted sql.gz to sql, they could also have removed the 46 lines at that point, making sql-46; mysql could then use that new file directly without getting into Nushell issues. Your solution in Nushell works too, but this is one alternative without Nushell!
    – Prem
    Commented Sep 17, 2022 at 8:10
  • @Prem Ah, got it. I get the feeling the OP was just trying to replicate the Bash version in Nushell. Commented Sep 17, 2022 at 16:30

1 Answer


Important: As @Prem correctly points out in the comments, my answer below only works for a narrow use-case, and can cause issues in most SQL-based workloads like the one in the question. That it seems to have worked for the OP (based on the current acceptance of the answer) is a happy coincidence.

See the bottom of this answer for the reasons why it is wrong, situations in which it might cause issues, and how we might (eventually) get it to work.


Short answer (addressing several of your issues): try

zcat filename.sql.gz |
    lines |
    skip 46 |
    each {
        mysql dbname
    }

Explanation:

As fdncred mentioned in your Github issue, the problem is the to text.

That's attempting to convert the entire list<string> to text. That, of course, can't happen until the entire file is read.

What you really seem to want is to simply process each line as it is read.

You can see this demonstrated without feeding it to MySQL with something like:

open filename.mysql | 
    lines |
    skip 46 |
    each {
        bash -c "cat; echo"
        sleep 1sec
    }

^^^ Quick-and-dirty use of Bash to print what it receives on stdin (the cat) + a newline (the echo).

Note that there's some shorthand in there, primarily around implicit output, and the each block handling. A more conventional, explicit-output, explicit-each-variable option:

open filename.mysql | 
    lines |
    skip 46 |
    each {|sql_statement|
        echo $sql_statement | bash -c "cat; echo"
        sleep 1sec
    }
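The bash -c "cat; echo" stand-in can also be exercised on its own, outside Nushell, to see exactly what each spawned process does (cat copies its stdin to stdout, echo appends the newline):

```shell
# Each invocation copies its stdin verbatim, then appends a newline,
# the same thing every per-line process in the each block above does.
printf 'SELECT 1;' | bash -c "cat; echo"
# → SELECT 1;
```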

Also, as you probably noticed in the "short answer", using this form you can return to using zcat to process the compressed file.


The "Oops" section

As mentioned in the front-matter, this doesn't work in all (perhaps most) cases.

The each will spawn a new mysql process for each line in the original file. This often won't work, because SQL statements can span multiple lines. Attempting to pass a single line of a multi-line statement into a single mysql process will fail.
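The difference is easy to demonstrate in plain shell with a hypothetical two-line statement: a single pipe means one reader sees the whole statement, while a per-line loop (the moral equivalent of the each block) hands each reader only a fragment:

```shell
# A hypothetical statement spanning two lines:
stmt='INSERT INTO t
VALUES (1);'

# One process reads the whole statement (what a Bash pipe guarantees):
printf '%s\n' "$stmt" | wc -l    # the single reader sees 2 lines

# One process per line (what each { mysql dbname } actually does):
printf '%s\n' "$stmt" | while IFS= read -r line; do
    printf '%s\n' "$line" | wc -l    # each reader sees only 1 line
done
```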

I can think of a half-dozen different ways to make this work, but so far all of them would require changes to Nushell.

The best scenario, IMHO, would be for to text to stream as you originally expected, which is a recent Nushell feature request (#6178).
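In the meantime, it's worth noting that the original Bash pipeline should work as-is in Nushell too: when every stage is an external command, Nushell passes the byte stream straight through, so mysql runs exactly once. Since the real files aren't available here, this sketch uses stand-ins (seq playing the part of zcat, cat the part of mysql) to show the same shape:

```shell
# In Nushell: zcat filename.sql.gz | tail -n +46 | mysql dbname
# (all external commands, so the bytes stream through a single mysql).
# Stand-in demonstration of the same pipeline shape:
seq 50 | tail -n +46 | cat
# → 46 47 48 49 50, one per line
```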

  • Continuing my comment on the question post, I have a query here: Is it going line-by-line to a single MySQL process? (If yes, then OK.) Or is mysql getting executed once for each line? (If yes, then that is a problem.) I am assuming there is a single MySQL process, otherwise it will not work!
    – Prem
    Commented Sep 17, 2022 at 8:21
  • @Prem I was wondering about that myself. My assumption is that, if the original Bash version was working, then this Nushell equivalent should work as well. Commented Sep 17, 2022 at 16:27
  • 1
    The Original Bash version will always work , because there is a Single MYSQL Process , which is a "guarantee" by the PIPE Syntax. The nushell version may or may not have a Single MYSQL Process. Not knowing much about nushell , I think each \{ COMMAND .... \} will start COMMAND against each line. This will break , when the INPUT SQL Statement is longer than a Single line.
    – Prem
    Commented Sep 17, 2022 at 18:10
  • @Prem That is, of course, a great point. You are correct that each is spawning a new MySQL process on each iteration, which is not the same as the Bash behavior. Let me see if I can come up with an alternative that streams to a single process. Commented Sep 17, 2022 at 20:31
