A friend has thousands of files that likely contain nothing but NULLs (ASCII 0).

(If interested, see this Super User Q&A to learn why.)

The files range in size from 650 bytes to ~200MB (with the majority being 4-8MB).

In Windows, what's a quick way to determine which files contain only NULLs, so they can then be deleted?

If possible, using built-in Windows (7) tools is preferred.

I was thinking something like:

findstr /m /s /r ^(\x00)+$ *.*

would work to find the files consisting of only NULLs, but in testing, it doesn't return any results.


Update 1:

I experimented with this more, and found that:

findstr /m /s /r [^\x00] *

may work to find the inverse (files that contain something other than NULLs), which can also be used to meet the goal.

But what's odd is that:

findstr /m /s /r [^\000] *

yields different results.

Because Hex 0 (\x00 in regex) = Octal 0 (\000 in regex), I would expect the same results from both commands.

This leads me to question whether the results of at least one of these commands are incorrect.


Update 2:

Well, it looks like:

findstr /m /s /r [^\x00] *

may work correctly, and the fact that:

findstr /m /s /r [^\000] *

yields different results is likely yet another Microsoft bug (if they are supposed to yield different results, please correct me by explaining why those two commands should differ).

I confirmed this using the excellent cross-platform Swiss File Knife third-party tool.

Initial testing reveals the results from the SFK command:

sfk xfindbin . "/[byte not \x00]/" -names

match those of findstr /m /s /r [^\x00] *, but not findstr /m /s /r [^\000] *. This leads me to believe that I may have discovered yet another bug in Microsoft's findstr command (see SS64 for a summary of other bugs in that Microsoft tool).


Update 3:

Further testing reveals that the SFK command:

sfk xfindbin . "/[byte not \x00]/" -names

correctly finds some files not found by findstr /m /s /r [^\x00] * and findstr /m /s /r [^\000] *.

2 Answers

I was able to find files that have no content or only NULLs using this PowerShell script:

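# Collect every file under the starting folder (-File requires PowerShell 3.0 or later).
$files = Get-ChildItem -Path c:\somepath\tostartfrom -Recurse -File
foreach ($f in $files) {
    # Read only the first 10 lines; remove -TotalCount to evaluate the whole file.
    $content = Get-Content -Path $f.FullName -TotalCount 10
    # Single backslashes, so \x01-\xFF is a character range rather than a literal backslash.
    if ($content -match '[\x01-\xFF]+') {
        # Do nothing, as the file has a non-NULL character in it.
    }
    else { Write-Output $f.FullName }
}
Write-Host -NoNewLine 'Press any key to continue...'
$null = $Host.UI.RawUI.ReadKey('NoEcho,IncludeKeyDown')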

You can use

c:\test\ps\findnullfiles.ps1 | Out-File -FilePath c:\test\ps\results.txt

to send the info to a text file for later use. By adding an additional test in the else clause, you could skip files with no content if desired.
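
Because Get-Content decodes each file as text, a byte-level check may be more robust. The following is only a minimal sketch, not part of the script above: the function name Test-AllNull and the 64 KB buffer size are assumptions chosen for illustration. It reads a file in chunks and stops at the first non-zero byte, so files that contain real data are rejected early.

```powershell
# Hypothetical helper (the name and buffer size are illustrative assumptions).
# Returns $true when every byte in the file is 0x00; an empty file also returns $true.
function Test-AllNull {
    param([string]$Path)   # pass a full path; .NET resolves relative paths against the process directory

    $buffer = New-Object byte[] 65536
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        # Read returns the number of bytes placed in the buffer, or 0 at end of file.
        while (($read = $stream.Read($buffer, 0, $buffer.Length)) -gt 0) {
            for ($i = 0; $i -lt $read; $i++) {
                if ($buffer[$i] -ne 0) { return $false }
            }
        }
        return $true
    }
    finally {
        # Always release the file handle, including on early return.
        $stream.Dispose()
    }
}
```

It could be used as, for example, Get-ChildItem -Recurse | Where-Object { -not $_.PSIsContainer -and (Test-AllNull $_.FullName) }. The inner loop is likely to be slow on very large all-NULL files, since every byte is inspected in script code.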

  • Thanks C J. Since it is looking for \x01-\xFF, will this work for detecting all Unicode characters? Commented Oct 27, 2022 at 12:23
  • Since it's looking at byte values, it should still work regardless of the encoding. Basically, if anything besides hex 00 exists, it will skip that file. One thing to note is the -TotalCount parameter. It is set to 10 in the script, which grabs the first 10 lines. That could be removed to evaluate the complete file. I did that because I figured, for my purposes, that if there were more than 10 lines, even 10 NULL+CR, the file was outside my criteria.
    – C J
    Commented Oct 27, 2022 at 12:50

EXPLAINING THE SITUATION:

In FINDSTR, /S is recursive and /R is regex. Using only those will give the matching lines, not the filenames.

Hence we have to use /M, which prints the filenames instead of the matching lines.

Now, the given regex is ^(\x00)+$, where the brackets "(" and ")" are not grouping (as in Perl) but individual characters, so nothing matches it, even in a file consisting entirely of NULLs.

The other regex is [^\x00], where ^ is not the start of the line but the negation of the character class.
It should be ^[\x00] to anchor at the start of the line, assuming support for searching for NULL characters.
Likewise, [^\000] should be ^[\000], again assuming NULL character support.
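
The difference between the two placements of ^ can be checked in a regex engine that does understand \x00. Below is a small sketch using PowerShell's -match operator, which uses .NET regex rather than FINDSTR, so it illustrates the regex semantics only. The variable names are illustrative.

```powershell
# .NET regex (used by -match) understands \x00; this does not show FINDSTR's behaviour.
$nul      = [string][char]0
$allNulls = $nul * 4            # four NUL characters
$mixed    = $nul + 'A' + $nul   # NULs around one non-NUL character

# [^\x00] : negated class, true when ANY character is not NUL.
$allNulls -match '[^\x00]'   # False
$mixed    -match '[^\x00]'   # True

# ^[\x00] : anchored class, true when the FIRST character is NUL.
$allNulls -match '^[\x00]'   # True
$mixed    -match '^[\x00]'   # True, even though the string is not all NULs

# ^\x00+$ : true only when every character is NUL.
$allNulls -match '^\x00+$'   # True
$mixed    -match '^\x00+$'   # False
```

Note that the anchored class alone only tests the first character, so a whole-line match is what separates an all-NULL string from a mixed one.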

Unfortunately, FINDSTR will not give correct results when looking for NULL characters:

https://ss64.com/nt/findstr.html
"FINDSTR cannot search for null bytes commonly found in Unicode files."

There are other bugs too.

SOLUTION 1:
Use /X (which prints lines that match exactly) with the regex [\x00]+ or [\000]+, which will check the whole line, assuming FINDSTR can match NULL characters.

SOLUTION 2:
Install and use Perl, which will work like a charm.
