This script will split a text file into a given number of sections, avoiding splitting text lines across sections. It can be used where there is only sufficient space to hold one section at a time. It operates by copying sections of the source file starting at the end, then truncating the source to free up space. So if you have a 1.8GB file and 0.5GB free space, you would need to use 4 sections (or more if you wish to have smaller output files). The last section is just renamed, as there is no need to copy it. After splitting, the source file no longer exists (there would be no room for it anyway).
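The section count in that example is just a ceiling division of file size by free space. A minimal sketch of the arithmetic follows; `min_sections` is a hypothetical helper, not part of the script, and the sizes are given in MB for readability. Because each boundary is moved forward to the next newline, a section can be slightly larger than an equal share, so treat the result as a lower bound and leave some margin.

```bash
# Hypothetical helper: smallest number of sections such that one
# section (about size / n) fits into the available free space.
min_sections() {  # (file_size, free_space), both in the same unit
    local size=$1 free=$2
    echo $(( (size + free - 1) / free ))   # integer ceiling of size / free
}

min_sections 1800 500    # the 1.8GB file with 0.5GB free; prints 4
```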
The main part is an awk script (wrapped in Bash), which only works out the section boundaries (including moving each boundary forward so that it coincides with a newline). It uses the system() function to invoke dd, truncate and mv for all the heavy lifting.
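The two steps described above (move a boundary to a newline, then copy the tail and truncate) can be sketched in plain shell. This is an illustration only: `line_start` and `peel_tail` are hypothetical names, and the sketch assumes GNU dd (for `iflag=skip_bytes`) and GNU truncate, as the script itself does.

```bash
# Hypothetical helpers, not part of splitBig itself.

# Print the offset just past the first newline at or after byte "approx",
# looking at most 8192 bytes ahead (the same window the script uses).
# If the window holds no newline, this returns approx + 8192, which splits
# a line; the real script instead warns and cuts at approx.
line_start() {  # (src, approx)
    local src=$1 approx=$2 n
    # n = number of bytes up to and including the first newline.
    n=$(dd bs=8192 count=1 if="$src" skip="$approx" iflag=skip_bytes status=none |
        head -n 1 | wc -c)
    echo $(( approx + n ))
}

# Copy everything from "offset" to the end of src into dest, then cut src
# back to "offset" so the copied space is released.
peel_tail() {  # (src, offset, dest)
    local src=$1 offset=$2 dest=$3
    dd bs=1M if="$src" skip="$offset" iflag=skip_bytes of="$dest" status=none &&
        truncate -s "$offset" "$src"
}
```

Calling `peel_tail` repeatedly with decreasing offsets reproduces the back-to-front order the script uses, so at most one extra section's worth of space is needed at any time.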
$ bash --version
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
$ awk --version
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)
$ dd --version
dd (coreutils) 8.28
$ truncate --version
truncate (GNU coreutils) 8.28
The script takes between one and four arguments:
./splitBig Source nSect Dest Debug
Source: is the filename of the file to be split into sections.
nSect: is the number of sections required (default 10).
Dest: is a printf() format used to generate the names of the sections.
Default is Source.%.3d, which appends serial numbers (from .001 up) to the source name.
Section numbers correspond to the original order of the source file.
Debug: generates some diagnostics (default is none).
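To preview the names a given Dest format will produce, the shell's printf can be used, since it handles these integer conversions the same way as awk's sprintf(). This is only a sketch; `section_name` is a hypothetical helper and is not used by the script.

```bash
# Hypothetical helper: render a section name from a Dest format and a number.
section_name() {  # (format, section_number)
    printf "$1\n" "$2"
}

section_name 'leipzig1M.txt.%.3d' 1    # prints leipzig1M.txt.001
section_name 'Tuesday.%1d.log' 3       # prints Tuesday.3.log
```

The zero-padded default (`%.3d`) keeps the names in the right order under a plain `ls` or shell glob, which an unpadded format such as `%1d` does not guarantee once there are ten or more sections.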
Test Results:
$ mkdir TestDir
$ cd TestDir
$
$ cp /home/paul/leipzig1M.txt ./
$ ls -s -l
total 126608
126608 -rw-rw-r-- 1 paul paul 129644797 Aug 27 15:54 leipzig1M.txt
$
$ time ../splitBig leipzig1M.txt 5
real 0m0.780s
user 0m0.045s
sys 0m0.727s
$ ls -s -l
total 126620
25324 -rw-rw-r-- 1 paul paul 25928991 Aug 27 15:56 leipzig1M.txt.001
25324 -rw-rw-r-- 1 paul paul 25929019 Aug 27 15:56 leipzig1M.txt.002
25324 -rw-rw-r-- 1 paul paul 25928954 Aug 27 15:56 leipzig1M.txt.003
25324 -rw-rw-r-- 1 paul paul 25928977 Aug 27 15:56 leipzig1M.txt.004
25324 -rw-rw-r-- 1 paul paul 25928856 Aug 27 15:56 leipzig1M.txt.005
$
$ rm lei*
$ cp /home/paul/leipzig1M.txt ./
$ ls -s -l
total 126608
126608 -rw-rw-r-- 1 paul paul 129644797 Aug 27 15:57 leipzig1M.txt
$ time ../splitBig leipzig1M.txt 3 "Tuesday.%1d.log" 1
.... Section 3 ....
#.. findNl: dd bs=8192 count=1 if="leipzig1M.txt" skip=86429864 iflag=skip_bytes status=none
#.. system: dd bs=128M if="leipzig1M.txt" skip=86430023 iflag=skip_bytes of="Tuesday.3.log" status=none
#.. system: truncate -s 86430023 "leipzig1M.txt"
.... Section 2 ....
#.. findNl: dd bs=8192 count=1 if="leipzig1M.txt" skip=43214932 iflag=skip_bytes status=none
#.. system: dd bs=128M if="leipzig1M.txt" skip=43214997 iflag=skip_bytes of="Tuesday.2.log" status=none
#.. system: truncate -s 43214997 "leipzig1M.txt"
.... Section 1 ....
#.. system: mv "leipzig1M.txt" "Tuesday.1.log"
real 0m0.628s
user 0m0.025s
sys 0m0.591s
$ ls -s -l
total 126612
42204 -rw-rw-r-- 1 paul paul 43214997 Aug 27 15:58 Tuesday.1.log
42204 -rw-rw-r-- 1 paul paul 43215026 Aug 27 15:58 Tuesday.2.log
42204 -rw-rw-r-- 1 paul paul 43214774 Aug 27 15:58 Tuesday.3.log
$
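Since the source file is gone after splitting, it may be worth recording a checksum first and comparing it against the concatenated sections afterwards. The following is a sketch of one way to do that, assuming GNU `sha256sum` is available; `parts_match` is a hypothetical helper.

```bash
# Hypothetical helper: succeed if the parts, concatenated in the order
# given, have the expected SHA-256 checksum.
parts_match() {  # (expected_sha256, part...)
    local expected=$1; shift
    [ "$(cat -- "$@" | sha256sum | cut -d' ' -f1)" = "$expected" ]
}

# Usage with the first test above (commented out so nothing is split here):
#   before=$(sha256sum < leipzig1M.txt | cut -d' ' -f1)
#   ../splitBig leipzig1M.txt 5
#   parts_match "$before" leipzig1M.txt.00? && echo "sections are intact"
```

The glob in the usage lines relies on the zero-padded default names sorting in section order.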
Script:
#! /bin/bash --
# Export so that awk also runs in the C locale and length() counts bytes.
export LC_ALL="C"

splitFile () {  #:: (inFile, Pieces, outFmt, Debug)
    local inFile="${1}" Pieces="${2}" outFmt="${3}" Debug="${4}"
    local Awk='
BEGIN {
    SQ = "\042"; szLine = 8192; szFile = "128M";
    fmtLine = "dd bs=%d count=1 if=%s skip=%d iflag=skip_bytes status=none";
    fmtFile = "dd bs=%s if=%s skip=%d iflag=skip_bytes of=%s status=none";
    fmtClip = "truncate -s %d %s";
    fmtName = "mv %s %s";
}
# Return the offset just past the first newline at or after Seek.
# If no newline is found within szLine bytes, warn and return Seek unchanged.
function findNl (fIn, Seek, Local, cmd, lth, txt) {
    cmd = sprintf (fmtLine, szLine, SQ fIn SQ, Seek);
    if (Db) printf ("#.. findNl: %s\n", cmd);
    cmd | getline txt; close (cmd);
    lth = length (txt);
    if (lth == szLine) printf ("#### Line at %d will be split\n", Seek);
    return ((lth == szLine) ? Seek : Seek + lth + 1);
}
# Copy sections off the end of fIn, last section first, truncating as we go.
function Split (fIn, Size, Pieces, fmtOut, Local, n, seek, cmd) {
    for (n = Pieces; n > 1; n--) {
        if (Db) printf (".... Section %3d ....\n", n);
        seek = int (Size * ((n - 1) / Pieces));
        seek = findNl (fIn, seek);
        cmd = sprintf (fmtFile, szFile, SQ fIn SQ, seek,
            SQ sprintf (fmtOut, n) SQ);
        if (Db) printf ("#.. system: %s\n", cmd);
        system (cmd);
        cmd = sprintf (fmtClip, seek, SQ fIn SQ);
        if (Db) printf ("#.. system: %s\n", cmd);
        system (cmd);
    }
    # What remains of the source is section 1: rename it, no copy needed.
    if (Db) printf (".... Section %3d ....\n", n);
    cmd = sprintf (fmtName, SQ fIn SQ, SQ sprintf (fmtOut, n) SQ);
    if (Db) printf ("#.. system: %s\n", cmd);
    system (cmd);
}
# The single input line is the file size in bytes, supplied by stat.
{ Split (inFile, $1, Pieces, outFmt); }
'
    stat -L -c "%s" "${inFile}" | awk -v inFile="${inFile}" \
        -v Pieces="${Pieces}" -v outFmt="${outFmt}" \
        -v Db="${Debug}" -f <( printf '%s' "${Awk}" )
}

#### Script body starts here.
splitFile "${1}" "${2:-10}" "${3:-${1}.%.3d}" "${4}"
logrotate and apply it to the existing file, too. This would then prevent the same scenario in future and allow compressing older logs. However, I am not sure how the initial splitting would be done given your disk size limitation.
dd to write the last n bytes to a file, then use truncate to reduce the size of the logfile by that amount. Loop through that until the file is nicely chopped up. Of course, this does not use newlines as the cutoff position and needs to be executed with great care.
less application.log and search for your string within with / ?