Reading a binary file with python

Question

I find reading binary files with Python particularly difficult. Can you give me a hand? I need to read this file, which in Fortran 90 is easily read by

int*4 n_particles, n_groups
real*4 group_id(n_particles)
read (*) n_particles, n_groups
read (*) (group_id(j),j=1,n_particles)

In detail, the file format is:

Bytes 1-4 -- The integer 8.
Bytes 5-8 -- The number of particles, N.
Bytes 9-12 -- The number of groups.
Bytes 13-16 -- The integer 8.
Bytes 17-20 -- The integer 4*N.
Next many bytes -- The group ID numbers for all the particles.
Last 4 bytes -- The integer 4*N.

How can I read this with Python? I have tried everything I could think of, but nothing worked. Alternatively, could I call an f90 program from Python to read this binary file and then save the data that I need?

Was this file written by a Fortran program? If so, how was it written? By default, Fortran adds additional data before each record it writes to file, and you may need to take care with this when reading the data. — Chris, Commented Jan 3, 2012 at 10:02
Please ignore my previous comment, the integers 8 and 4*N are clearly this additional data. — Chris, Commented Jan 3, 2012 at 10:43
Also, see answers to the question reading binary file in python. — Chris, Commented Jan 3, 2012 at 10:46
Numpy's fromfile function makes it easy to read binary files. I recommend it. — littleO, Commented Apr 17, 2020 at 11:25
...and always watch out for your endian-nesses, esp. when porting between different manufacturer's computers. — DragonLord, Commented May 26, 2020 at 18:06

gecco · Accepted Answer · 2012-01-03 11:15:44Z

234

Read the binary file content like this:

with open(fileName, mode='rb') as file: # b is important -> binary
    fileContent = file.read()

then "unpack" binary data using struct.unpack:

The start bytes: struct.unpack("iiiii", fileContent[:20])

The body: ignore the 20 header bytes and the 4 trailing bytes (24 bytes in total); the remaining bytes form the body. Each integer occupies 4 bytes, so an integer division of the body length by 4 gives the number of values. Repeating the string 'i' that many times creates the correct format for the unpack method:

struct.unpack("i" * ((len(fileContent) -24) // 4), fileContent[20:-4])

The trailing 4 bytes: struct.unpack("i", fileContent[-4:])
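Putting the three steps together, a minimal sketch might look like the one below. It assumes little-endian 4-byte record markers and integer group IDs (as the "i" format above implies). The helper name read_group_file and the sample data are made up for illustration, and the sample file is only written so that the sketch can be run as-is.

```python
import os
import struct
import tempfile


def read_group_file(path):
    """Read the two Fortran records described in the question (assumed little-endian)."""
    with open(path, "rb") as f:
        content = f.read()

    # First record: marker 8, N, number of groups, marker 8,
    # followed by the leading marker (4*N) of the second record.
    rec1, n_particles, n_groups, rec1_end, rec2 = struct.unpack("<5i", content[:20])
    if rec1 != 8 or rec1_end != 8 or rec2 != 4 * n_particles:
        raise ValueError("unexpected record markers; wrong byte order or layout?")

    # Body: N 4-byte integers, followed by the trailing 4*N marker.
    group_ids = struct.unpack("<%di" % n_particles, content[20:20 + 4 * n_particles])
    (rec2_end,) = struct.unpack("<i", content[-4:])
    if rec2_end != rec2:
        raise ValueError("trailing record marker does not match")
    return n_particles, n_groups, list(group_ids)


# Build a small sample file (hypothetical data) in the layout from the question.
sample_ids = [1, 0, 2, 0, 1]
n = len(sample_ids)
payload = struct.pack("<5i", 8, n, 3, 8, 4 * n)
payload += struct.pack("<%di" % n, *sample_ids)
payload += struct.pack("<i", 4 * n)

with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(payload)

result = read_group_file(tmp.name)
os.remove(tmp.name)
print(result)
```

If the group IDs were written as real*4 values, as the Fortran declaration in the question suggests, the body format would use "f" instead of "i". For a big-endian file, "<" would become ">".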

edited Jan 3, 2012 at 11:15

answered Jan 3, 2012 at 10:46

gecco


1

Can you please have a look at this other post? stackoverflow.com/questions/8092469/… ... I am again trying to read another binary file, but in this case I don't know the byte structure in detail. For example, I figured out that sometimes there is the integer 8. However, with IDL it is really simple to read this data. Can I do the same with Python?
– Brian
Commented Jan 4, 2012 at 9:45
Please indicate (inside the other post, not here) why you are not happy with the posted answers and comments. Perhaps you should also update the question to provide more details... I'll have a look at it when it is updated.
– gecco
Commented Jan 4, 2012 at 9:59
See this answer if you need to convert an unpacked char[] to a string.
– PeterM
Commented May 20, 2016 at 14:57
1

import struct
– J W
Commented Feb 21, 2017 at 10:17
Why divide by 4 ?
– Sam Gomari
Commented Jul 16, 2021 at 16:54


Eugene Yarmash · Accepted Answer · 2018-11-02 21:00:00Z

26

To read a binary file to a bytes object:

from pathlib import Path
data = Path('/path/to/file').read_bytes()  # Python 3.5+

To create an int from bytes 0-3 of the data:

i = int.from_bytes(data[:4], byteorder='little', signed=False)

To unpack multiple ints from the data:

import struct
ints = struct.unpack('iiii', data[:16])
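Applied to the layout in the question, bytes 5-8 hold the number of particles. The sketch below shows both approaches on an in-memory header. It assumes little-endian signed 4-byte integers, and the header values are hypothetical sample data, not taken from a real file.

```python
import struct

# Hypothetical header: the first 16 bytes of the file in the question,
# written little-endian with N = 5 particles and 3 groups.
data = struct.pack("<4i", 8, 5, 3, 8)

# Bytes 5-8 (1-based) are data[4:8] in Python's 0-based slicing.
n_particles = int.from_bytes(data[4:8], byteorder="little", signed=True)

# The same value via struct, with an explicit little-endian prefix.
(n_via_struct,) = struct.unpack("<i", data[4:8])

# unpack_from reads at a byte offset without slicing the buffer first.
(n_groups,) = struct.unpack_from("<i", data, 8)

print(n_particles, n_via_struct, n_groups)
```

Spelling out the byte order ("<" or ">") rather than relying on the native default keeps the result the same on every machine.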

answered Nov 2, 2018 at 21:00

Eugene Yarmash


unwind · Accepted Answer · 2012-01-03 10:55:50Z

25

In general, I would recommend that you look into using Python's struct module for this. It's standard with Python, and it should be easy to translate your question's specification into a formatting string suitable for struct.unpack().

Do note that if there's "invisible" padding between/around the fields, you will need to figure that out and include it in the unpack() call, or you will read the wrong bits.

Reading the contents of the file in order to have something to unpack is pretty trivial:

import struct

data = open("from_fortran.bin", "rb").read()

# unpack() needs a buffer of exactly the size the format describes (8 bytes here)
(eight, N) = struct.unpack("@II", data[:8])

This unpacks the first two fields, assuming they start at the very beginning of the file (no padding or extraneous data), and also assuming native byte order (the @ symbol). Each I in the format string means "unsigned integer, 32 bits".
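To make the points about padding and byte order concrete, here is a small sketch using struct.calcsize. The "ci" format and the sample bytes are only illustrations and are not part of the file in the question.

```python
import struct

# With an explicit byte-order prefix, struct uses standard sizes and no padding:
# a 1-byte char followed by a 4-byte int takes 5 bytes.
packed_size = struct.calcsize("<ci")

# In native mode ("@"), struct may insert alignment padding the way a C compiler
# would, so this size is platform-dependent.
native_size = struct.calcsize("@ci")

# unpack() requires a buffer of exactly calcsize(fmt) bytes, so slice first.
data = struct.pack("<3I", 8, 5, 3)  # hypothetical start of a file
eight, n = struct.unpack("<II", data[:struct.calcsize("<II")])

print(packed_size, native_size, eight, n)
```

For a file with a fixed on-disk layout, an explicit "<" or ">" prefix is usually the safer choice, since it disables native alignment and pins the byte order.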

edited Jan 3, 2012 at 10:55

answered Jan 3, 2012 at 10:18

unwind


ok, but I don't even know how to read the bytes of the file. From my question how can I read the file from bytes 5 to 8 and then convert the result to an integer? Sorry, but I'm new with Python.
– Brian
Commented Jan 3, 2012 at 10:39
1

What about closing the file? Leaving a file open is very bad practice, and it can easily be avoided by using a with statement. The answer is useful, but it can be misleading for people like me who are still learning how to handle files in Python.
– CamelCamelius
Commented Apr 13, 2021 at 17:11
@CamelCamelius Note that we could think that the file would be automatically closed because the handle gets out of scope. But this is not always the case stackoverflow.com/questions/2404430/…
– Fuujuhi
Commented Mar 2, 2022 at 14:57


Chris · Accepted Answer · 2012-01-03 10:41:29Z

18

You could use numpy.fromfile, which can read data from both text and binary files. You would first construct a data type, which represents your file format, using numpy.dtype, and then read this type from file using numpy.fromfile.
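As a rough sketch of that idea for the file in the question: the header can be described with a structured dtype and the group IDs read as a plain array. This assumes little-endian 4-byte integers throughout. The field names, the helper read_group_file, and the sample data are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Header layout from the question; "<i4" assumes little-endian 4-byte integers.
header_dtype = np.dtype([
    ("rec1_start", "<i4"),   # the integer 8
    ("n_particles", "<i4"),
    ("n_groups", "<i4"),
    ("rec1_end", "<i4"),     # the integer 8
    ("rec2_start", "<i4"),   # the integer 4*N
])


def read_group_file(path):
    with open(path, "rb") as f:
        # Read one header record, then N group IDs from the current position.
        header = np.fromfile(f, dtype=header_dtype, count=1)[0]
        n = int(header["n_particles"])
        group_ids = np.fromfile(f, dtype="<i4", count=n)
    return n, int(header["n_groups"]), group_ids


# Write a small sample file (hypothetical data) to try the reader on.
sample = np.array([8, 5, 3, 8, 20, 1, 0, 2, 0, 1, 20], dtype="<i4")
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "groups.bin")
    sample.tofile(path)
    n_particles, n_groups, group_ids = read_group_file(path)

print(n_particles, n_groups, group_ids)
```

The trailing 4*N marker is simply left unread here. If the IDs are stored as real*4 values, the second dtype would be "<f4".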

answered Jan 3, 2012 at 10:41

Chris


2

Easy to miss this! Docs are a bit thin; see reddit.com/r/Python/comments/19q8nt/… for some discussion
– lost
Commented Mar 23, 2017 at 10:55


Fax · Accepted Answer · 2020-07-10 18:27:50Z

I too found Python lacking when it comes to reading and writing binary files, so I wrote a small module (for Python 3.6+).

With binaryfile you'd do something like this (I'm guessing, since I don't know Fortran):

import binaryfile

def particle_file(f):
    f.array('group_ids')  # Declare group_ids to be an array (so we can use it in a loop)
    f.skip(4)  # Bytes 1-4
    num_particles = f.count('num_particles', 'group_ids', 4)  # Bytes 5-8
    f.int('num_groups', 4)  # Bytes 9-12
    f.skip(8)  # Bytes 13-20
    for i in range(num_particles):
        f.struct('group_ids', '>f')  # 4 bytes x num_particles
    f.skip(4)

with open('myfile.bin', 'rb') as fh:
    result = binaryfile.read(fh, particle_file)
print(result)

Which produces an output like this:

{
    'group_ids': [(1.0,), (0.0,), (2.0,), (0.0,), (1.0,)],
    '__skipped': [b'\x00\x00\x00\x08', b'\x00\x00\x00\x08\x00\x00\x00\x14', b'\x00\x00\x00\x14'],
    'num_particles': 5,
    'num_groups': 3
}

I used skip() to skip the additional data Fortran adds, but you may want to add a utility to handle Fortran records properly instead. If you do, a pull request would be welcome.

cxrodgers · Accepted Answer · 2022-12-15 20:02:45Z

If the data is array-like, I like to use numpy.memmap to load it.

Here's an example that loads 1000 samples from 64 channels, stored as two-byte integers.

import numpy as np
mm = np.memmap(filename, np.int16, 'r', shape=(1000, 64))

You can then slice the data along either axis:

mm[5, :] # sample 5, all channels
mm[:, 5] # all samples, channel 5

All the usual formats are available, including C- and Fortran-order, various dtypes and endianness, etc.

Some advantages of this approach:

No data is loaded into memory until you actually use it (that's what a memmap is for).
More intuitive syntax (no need to generate a struct.unpack string consisting of 64000 characters).
Data can be given any shape that makes sense for your application.

For non-array data (e.g., compiled code), heterogeneous formats ("10 chars, then 3 ints, then 5 floats, ..."), or similar, one of the other approaches given above probably makes more sense.
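For the file in the original question, the same approach could be sketched as follows, using the offset argument to skip the 20 header bytes. It assumes little-endian 4-byte integers, and the sample file is hypothetical data written only so the sketch can run.

```python
import os
import struct
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "groups.bin")
    # Hypothetical sample file in the question's layout (N = 5, 3 groups).
    np.array([8, 5, 3, 8, 20, 1, 0, 2, 0, 1, 20], dtype="<i4").tofile(path)

    # Read N from bytes 5-8 so the shape of the mapping is known.
    with open(path, "rb") as f:
        f.seek(4)
        (n_particles,) = struct.unpack("<i", f.read(4))

    # Map only the group IDs: skip the 20 header bytes, stop before the
    # trailing record marker.
    mm = np.memmap(path, dtype="<i4", mode="r", offset=20, shape=(n_particles,))
    group_ids = np.array(mm)  # copy into memory so the mapping can be released
    del mm  # drop the mapping before the temporary file is deleted

print(group_ids)
```

Copying into a regular array is only done here because the temporary file is removed afterwards. With a real data file you would normally keep working with mm directly.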

Chris · Accepted Answer · 2022-03-22 23:34:16Z

-2

#!/usr/bin/python

import array

data = array.array('f')  # 'f' = 4-byte floats
with open('c:\\code\\c_code\\no1.dat', 'rb') as f:
    data.fromfile(f, 5)  # read 5 values from the file
print(data)

edited Mar 22, 2022 at 23:34

Chris


answered Mar 22, 2022 at 15:53

Ishraga Mustafa Awad Allam

1

1

Welcome to Stack Overflow. Code is a lot more helpful when it is accompanied by an explanation. Stack Overflow is about learning, not providing snippets to blindly copy and paste. Please edit your question and explain how it answers the specific question being asked. See How to Answer.
– Chris
Commented Mar 22, 2022 at 23:32
1

And a couple of copyright notes: (a) There's generally no need to credit yourself on trivial code like this. (b) By writing this on Stack Overflow you have licensed it under a Creative Commons license.
– Chris
Commented Mar 22, 2022 at 23:33


Phil · Accepted Answer · 2017-12-13 17:15:46Z

-3

import pickle

# Note: pickle.load only reads files that were written with pickle.dump
with open("filename.dat", "rb") as f:
    try:
        while True:
            x = pickle.load(f)
            print(x)
    except EOFError:
        pass

edited Dec 13, 2017 at 17:15

Phil


answered Dec 13, 2017 at 16:26

Eeshitri

15

9

Probably worth just a little explanation of why this is better than (or at least as good as) other answers.
– Phil
Commented Dec 13, 2017 at 16:38
3

Have you tested and verified that this works with the Fortran-generated binary?
– agentp
Commented Dec 13, 2017 at 17:14
4

And also explain what does it do... What is pickle? What does pickle.load load? Does it load a Fortran stream, direct or sequential files? They are different and not compatible.
– Vladimir F Героям слава
Commented Dec 13, 2017 at 19:37
Pickle binary files have information about the data. You can test this yourself.
– Byron
Commented Nov 7, 2020 at 3:26

