68

I am trying to print every Nth line out of a file with more than 300,000 records into a new file. This has to happen every Nth record until it reaches the end of the file.

5
  • see also: unix.stackexchange.com/q/214445/117549
    – Jeff Schaller
    Commented Jun 4, 2017 at 19:54
  • Looking at your comments, we can't understand what you need. Please provide sample input and sample output. Do you need a range? From the Nth line up to EOF? Commented Jun 4, 2017 at 20:25
  • Thanks. I have 355,000 records, which are sorted, but I need a sample of the data (1/3, which is about 100,000), so I thought that if I retrieve every 300th line of the sorted file from line 1 to EOF, I should be able to get a fair sample.
    – Terisa
    Commented Jun 4, 2017 at 20:50
  • What does the word "records" mean to you? Do you mean the number of lines in a file, or a number of files? It is better to describe your problem in terms of files and lines and to avoid the word "record". Tell us how many lines your file has or how many files you need to parse. Commented Jun 4, 2017 at 21:00
  • 3
    Please explain your requirements more clearly. Against my answer you wrote, "For example for an input file with 300000 I should get 100000 records in the output." That sentence doesn't make any sense unless you meant n=3 and wanted the 3rd, 6th, 9th line. Or perhaps you wanted the 1st, 4th, 7th line. There are multiple different solutions because the way you're asking the question is not clear. Commented Jun 5, 2017 at 2:27

4 Answers

110
awk 'NR % 5 == 0' input > output

This prints every fifth line.

To use a shell variable:

NUM=5
awk -v NUM="$NUM" 'NR % NUM == 0' input > output
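
The same idea can be wrapped in a small shell function. This is only a sketch, assuming a POSIX shell with awk and seq available; the function name every_nth and its optional offset argument are made up here for illustration and are not part of the answer above.

```shell
# every_nth N [OFFSET]: print each line of stdin whose line number leaves
# remainder OFFSET when divided by N (OFFSET defaults to 0).
# The name and the argument order are hypothetical, chosen for this sketch.
every_nth() {
    awk -v n="$1" -v offset="${2:-0}" 'NR % n == offset'
}

# Count how many lines survive when keeping every 3rd line of 355,000.
seq 355000 | every_nth 3 | wc -l

# Every 5th line starting at line 1 (lines 1, 6, 11 of a 12-line input).
seq 12 | every_nth 5 1
```

Keeping every 3rd line of a 355,000-line input leaves 118,333 lines, while keeping every 300th leaves only about a thousand, which is the mismatch discussed in the comments.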
8
  • I ran this command and got only 1166 in the output. I expected 100,000.
    – Terisa
    Commented Jun 4, 2017 at 21:09
  • 1
    If you want 1/3 the file, you wanted every 3rd line, not 300th.
    – Deathgrip
    Commented Jun 4, 2017 at 21:14
  • 4
    As commented in your "answer" below, please accept this answer as the solution. Thank you.
    – Deathgrip
    Commented Jun 4, 2017 at 21:30
  • 2
    or every 5th line starting at the 1st using NR % 5 == 1 or every 5th line starting at the 4th using NR % 5 == 4 Commented Oct 24, 2018 at 20:23
  • 1
    ffmpeg and other programs expect the data of a file or files piped in. Your solution helped me list all the JPGs in a dir and feed every 5th filename to cat to read the data to pipe to ffmpeg. Couldn't have done it without you! (I looked all over and tried dozens of possible solutions). @northern-bradley your ==1 helps too if there's only 1 file in the directory. cat $(ls *.jpg | awk 'NR % 5 == 1' -) | ffmpeg -r 15 -f image2pipe -vcodec mjpeg -i - -r 30 test.mp4
    – Able Mac
    Commented Jun 15, 2019 at 4:15
53

To print every Nth line, use

sed -n '0~Np'

For example, to copy every 5th line of oldfile to newfile, do

sed -n '0~5p' oldfile > newfile

This uses sed’s first~step address form, which means “match every step’th line starting with line first.” In theory, this would print lines 0, 5, 10, 15, 20, 25, …, up to the end of the file. Of course there is no line 0, so it just prints lines 5, 10, 15, 20, 25, …; 0~5 is just a convenient alternative way of saying 5~5 (which prints every 5th line starting with line 5; i.e., lines 5, 10, 15, 20, 25, …).

For another example of this sed capability (which does not answer the question),

sed -n '2~5p' oldfile

would print lines 2, 7, 12, 17, 22, 27, …, up to the end of the file.

Note: This approach requires GNU sed, as the first~step address form is a non-portable extension.  (Some old versions of GNU sed may require the 5~5 form as opposed to the 0~5 form.)
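
Where GNU sed is not available, one possible workaround is to chain the standard n (next line) command instead of using first~step. The following is a sketch only, for the fixed case of every 5th line; the function name every_5th_portable is invented for illustration, and the behaviour on the last partial group relies on sed being run with -n.

```shell
# every_5th_portable: print lines 5, 10, 15, ... of stdin.
# Each "n" replaces the pattern space with the next input line; after four
# of them the pattern space holds the 5th line of the group, which "p" prints.
# If the input runs out partway through a group, sed quits without reaching
# "p", and -n suppresses the automatic print.
every_5th_portable() {
    sed -n 'n;n;n;n;p'
}

# Prints 5, 10, 15 and 20, one per line.
seq 23 | every_5th_portable
```

For a step other than 5, the number of n commands has to be step minus one, so for large or variable steps the awk form is probably easier to maintain.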

5
  • 4
    I like that this uses sed, which is what I originally searched for, but to my brain @deathgrip's use of awk is clearer. Commented Oct 24, 2018 at 20:17
  • 1
    The sed solution is about 3 times faster to run than the awk solution on my computer. I confirm it is not a standard option though.
    – Totor
    Commented Mar 5, 2021 at 23:05
  • 1
    just a note that the ~ syntax requires GNU sed
    – wisbucky
    Commented Nov 6, 2021 at 5:37
  • Just a note that the answer already says that the ~ syntax requires GNU sed,  and has done so since March 5, 2021 (long before you posted that comment). Commented Dec 1, 2022 at 20:13
  • Outstanding answer. I appreciate the '2~5p' addendum because I wanted to split a file into 5 parts, and I could do "every 5th line" 5 times to create them. Using a different first number each time, of course.
    – Mike S
    Commented Jan 13, 2023 at 20:58
3

Here is the perl version:

perl -ne 'print if $. % 5 == 0;' infile > outfile
1
  • 1
    Yay to the succinct Perl one-liner in the 2020's! Take that, Python!!!
    – Mike S
    Commented Jan 13, 2023 at 21:00
1

As with sed, awk can also do this:

$ seq 1000000000 |awk 'NR==500000{print;exit}'
500000

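NR is awk's current line number; the condition compares it with the number of the line you want to print, and exit stops awk right after printing so that it does not read the rest of the file. In your case

awk 'NR==Nth{print;exit}' inputfile >outputfile

where Nth is a placeholder for the number of the line you need to print; replace it with an actual number.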
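
As a sketch of how the placeholder could be filled in from the shell, the line number can be passed with awk's -v option. The function name nth_line is hypothetical and only used for this example; note that this prints a single line, not every Nth line.

```shell
# nth_line N: print only line N of stdin, then stop reading.
# The name nth_line is made up for this sketch.
nth_line() {
    awk -v n="$1" 'NR == n { print; exit }'
}

# Prints "gamma", the 3rd of the four input lines.
printf '%s\n' alpha beta gamma delta | nth_line 3
```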

1
  • Looks like the question was initially worded badly, and this answers the wrong question.
    – rjmunro
    Commented Oct 9, 2020 at 14:55
