
I'm interested in comparing two versions of a smallish Excel file stored in Dropbox as separate versions.

Using the Python SDK, specifically the files_download() method, I'm getting a requests.models.Response object, but I'm having trouble getting pandas.read_excel() to consume it.

Here's the code snippet:

with open(resp.content, "rb") as handle:
    df = pandas.read_excel(handle.read())

The error:

TypeError('file() argument 1 must be encoded string without null bytes, not str',)

I know I'm missing something fundamental, possibly needing to encode the file as a binary. (Tried base64.b64encode, and some other things, with no success yet.) I'm hoping someone can help me with a point in the right direction, possibly with the io module?

I'm using Python 2.7.15

For the avoidance of doubt, I'm specifically looking to avoid the step of first saving the Excel files to the filesystem. I'm sure I can accomplish the broader objective that way, but to optimize I'm trying to read the files from Dropbox directly into pandas DataFrames, and the fact that the read_excel() method takes a file-like object means—I think—that I should be able to do that.

Basically, I think this sums up the pain I'm experiencing at the moment. I need to get the response from Dropbox into the form of a file-like object.

  • Looks like you are missing a closing quote after rb?
    – smj
    Commented Dec 9, 2018 at 22:12
  • Please try saving the Excel file locally with the download-to-file method, then pass its path, including the file name ("C:....rb.xlsx"), as the input to pandas. I'm afraid that pandas is receiving the wrong input type. Please comment if this doesn't help.
    – Mike_H
    Commented Dec 9, 2018 at 22:54
  • Thanks, @Mike_H. That's a good suggestion, but in response to your comment, I further clarified that I'm looking to avoid that.
    – HaPsantran
    Commented Dec 9, 2018 at 23:30
  • I'm not familiar with pandas, so I can't help with that side of it, but note that the resp.content you get from the dropbox files_download method is the file data itself (not a file handle). (In the supplied code, you appear to be trying to open a local file at the local path of whatever happens to be in resp.content, which probably isn't what you intended.)
    – Greg
    Commented Dec 10, 2018 at 19:17
  • @HaPsantran Did you find a solution for this problem?
    Commented Jan 21, 2019 at 14:00

1 Answer


The following code will do what you want.

# Imports and initialization of variables
from contextlib import closing  # ensures the HTTP response is closed
import io
import dropbox
import pandas as pd
token = "YOURTOKEN"  # get a token at https://www.dropbox.com/developers/apps/
dbx = dropbox.Dropbox(token)
yourpath = "/somefile.xlsx"  # Dropbox paths start with "/"; this approach is not limited to Excel files

# Relevant streamer
def stream_dropbox_file(path):
    # files_download returns (metadata, response); only the response is needed
    _, res = dbx.files_download(path)
    with closing(res) as result:
        byte_data = result.content  # raw bytes of the file
        return io.BytesIO(byte_data)  # wrap the bytes in a file-like object

# Usage
file_stream = stream_dropbox_file(yourpath)
df = pd.read_excel(file_stream)

The nice part of this approach is that io.BytesIO wraps the downloaded bytes in a generic file-like object, so nothing is written to disk. You can therefore use the same stream to read other formats, such as CSV files with pd.read_csv().
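As a rough illustration of the CSV case, here is a minimal sketch that skips Dropbox entirely and uses a hard-coded byte string as a stand-in for the downloaded content. The sample data and variable names are made up for the example.

```python
import io

import pandas as pd

# Stand-in for the bytes a download would return (hypothetical sample data).
csv_bytes = b"name,value\nalpha,1\nbeta,2\n"

# Wrap the raw bytes in a file-like object, as the streamer above does.
csv_stream = io.BytesIO(csv_bytes)

# pandas reads from the in-memory stream without touching the filesystem.
df = pd.read_csv(csv_stream)
print(df)
```

With real data, csv_stream would simply be the object returned by stream_dropbox_file().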

The code should also work for non-pandas I/O methods, such as loading images, but I haven't tested that explicitly.
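For a non-pandas case that is easy to check without extra packages, the sketch below feeds an in-memory stream to the standard library's zipfile module. The archive is built on the spot as a stand-in for downloaded bytes, and the file name and contents are invented for the example. Image loading is not shown here and remains untested.

```python
import io
import zipfile

# Build a small zip archive in memory to stand in for downloaded bytes
# (hypothetical sample data; an .xlsx file is itself a zip archive).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello from memory")
raw_bytes = buffer.getvalue()

# Wrap the raw bytes in a file-like object and read them back.
with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
    names = archive.namelist()
    text = archive.read("hello.txt").decode("utf-8")

print(names, text)
```

Any library that accepts a binary file-like object should be usable the same way, though each one is worth checking individually.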

  • Long live @Ivo Merchiers; long live SO
    – HaPsantran
    Commented Feb 9, 2019 at 16:03
