Today I found a nice problem to solve on Stack Overflow. A user was asking why the Unix utility grep would not search a file on his Linux machine.

Long story short, grep was somehow “recognizing” the file as binary because it contained four NUL bytes.

It had \000 in lines 16426, 16428, 16430 and 16432. I revealed those lines using sed.
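The exact sed command is not preserved here. A minimal sketch of one way to do it, assuming GNU sed (the `\x00` escape in the address is a GNU extension) and a small made-up sample file standing in for the real one:

```shell
# Hypothetical sample file: NUL bytes sit on lines 2 and 4.
sample=$(mktemp)
printf 'ATOM 1\nbad\000line\nATOM 2\n\000\nWAT 3\n' > "$sample"

# Print the number of every line that contains a NUL byte.
nul_lines=$(LC_ALL=C sed -n '/\x00/=' "$sample")
printf '%s\n' "$nul_lines"

# Print line 2 in unambiguous form; the l command writes a NUL as \000.
visible=$(LC_ALL=C sed -n '2l' "$sample")
printf '%s\n' "$visible"

rm -f "$sample"
```

The `=` command reports where the NUL bytes are, and `l` makes them visible in the otherwise normal-looking text.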

I searched for a quick way to get past those NUL bytes and found a post on Unix Stack Exchange that says:

If there is a NUL character anywhere in the file, grep will consider it as a binary file.

That is not exactly true.

What grep actually does is read the file from the top and print matching lines as text until it runs into the first NUL character. From that point on it stops printing lines and only reports that a binary file matches.

Taking the file min14.pdb as an example, we can do:

The user tried matching the string ‘WAT’ directly:

There was no output, just the message about a binary file. This happens because the first match for “WAT” is at line 132285 and the first NUL byte is at line 16426, so grep had already stopped printing lines as text long before it reached the match. That does not mean grep treats the whole file as binary just because there is a NUL byte somewhere in it.
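To see the cut-off on a small scale, here is a sketch with a generated file instead of min14.pdb. It assumes a reasonably recent GNU grep; the file contents and the names `sample`, `before` and `after` are made up. GNU grep looks for NUL bytes in each chunk it reads, so the sample is padded with enough ordinary lines to push the NUL well past the first chunk.

```shell
# Made-up stand-in for the real file: one early match, lots of filler,
# a line with a NUL byte, and one match after the NUL.
sample=$(mktemp)
{
  printf 'HEADER first line\n'
  awk 'BEGIN { for (i = 1; i <= 40000; i++) print "ATOM filler line number", i }'
  printf 'broken\000line\n'
  printf 'WAT after the NUL\n'
} > "$sample"

# Match located before the NUL: the line should come out as ordinary text.
before=$(LC_ALL=C grep 'HEADER' "$sample" 2>&1)
printf '%s\n' "$before"

# Match located after the NUL: grep should only report a binary match.
# Depending on the version the notice goes to stdout or stderr, hence 2>&1.
after=$(LC_ALL=C grep 'WAT' "$sample" 2>&1)
printf '%s\n' "$after"

rm -f "$sample"
```

The first search behaves like a search in a plain text file, while the second one yields only the binary-file notice, which matches what the user saw.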

For this kind of situation, the best practice is to run the file through “strings” first, which drops the characters that make grep behave that way. Of course, the alternative “grep -a” is also valid.
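Both workarounds can be sketched on a made-up sample as follows. The file contents and variable names are hypothetical, and since `strings` ships with binutils the example first checks that it is installed.

```shell
# Hypothetical sample: a NUL byte on line 2, the wanted match on line 3.
sample=$(mktemp)
printf 'ATOM 1\nbad\000line\nWAT 2\n' > "$sample"

# Option 1: -a makes grep process the file as if it were text.
with_a=$(LC_ALL=C grep -a 'WAT' "$sample")
printf '%s\n' "$with_a"

# Option 2: let strings extract the printable runs, then search those.
if command -v strings >/dev/null 2>&1; then
  with_strings=$(strings "$sample" | grep 'WAT')
  printf '%s\n' "$with_strings"
fi

rm -f "$sample"
```

Keep in mind that `strings` only keeps runs of at least four printable characters by default, so very short lines can disappear from its output, whereas `grep -a` searches the data unchanged.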