How To Search Pattern In Big Binary Files Efficiently
I have several binary files, which are mostly bigger than 10GB. In this files, I want to find patterns with Python, i.e. data between the pattern 0x01 0x02 0x03 and 0xF1 0xF2 0xF3.
Solution 1:
The common way when searching a pattern in a large file is to read the file by chunks into a buffer that has the size of the read buffer + the size of the pattern - 1.
On first read, you only search the pattern in the read buffer, then you repeatedly copy size_of_pattern-1 chars from the end of the buffer to the beginning, read a new chunk after that and search in the whole buffer. That way, you are sure to find any occurence of the pattern, even if it starts in one chunk and ends in next.
Post a Comment for "How To Search Pattern In Big Binary Files Efficiently"