Why can't I read a joblib file from my github repo?

Question

I've built a simple app in Python, with a front-end UI in Dash.

It relies on three files:

a small dataframe in pickle format (95 KB)
a large SciPy sparse matrix in NPZ format (12 MB)
a large scikit-learn KNN model in joblib format (65 MB)

I can read the first dataframe successfully with:

link = 'https://github.com/user/project/raw/master/filteredDF.pkl'
df = pd.read_pickle(link)

But when I try the same with the others, for example the model:

mLink = 'https://github.com/user/project/raw/master/knnModel.pkl'
filehandler = open(mLink, 'rb') 
model_knn = pickle.load(filehandler)

I just get an error

Invalid argument: 'https://github.com/user/project/raw/master/knnModel45Percent.pkl'

I also pushed these files using Git LFS, but the same error occurs.

I understand that hosting large static files on GitHub is bad practice, but I haven't been able to figure out how to use PyDrive or AWS S3 with my project. I just need my project to read these files, and I plan to host the app on something like Heroku. I don't really need a full database to store files. Ideally I could read these large files straight from my repo, but if there is a better approach, I am open to that as well. I spent the past few days struggling with the Dropbox, Amazon, and Google Cloud APIs and am a bit lost. Any help is appreciated, thank you.

JQadrad · Accepted Answer · 2020-05-13 23:49:04Z

3

Could you try the following?

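from io import BytesIO
import pickle
import requests
# ?raw=true asks GitHub for the file contents instead of the HTML page
mLink = 'https://github.com/aaronwangy/Kankoku/blob/master/filteredAnimeList45PercentAll.pkl?raw=true'
# Download the file and wrap its bytes in an in-memory file object
mfile = BytesIO(requests.get(mLink).content)
model_knn = pickle.load(mfile)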

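open() only accepts local file paths, not URLs, which is why the original call fails. Wrapping the response body in BytesIO creates a file-like object from the bytes that GitHub returns, and that object can then be passed to pickle.load. Note that I have added ?raw=true to the URL of the request.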
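
The same pattern should extend to the other two files from the question, since both joblib.load and scipy.sparse.load_npz accept file objects. Below is a minimal sketch rather than a tested solution for this repo: fetch_bytes and load_from_bytes are hypothetical helper names, and the download uses the standard library's urllib in place of requests so the block has no third-party imports.

```python
import pickle
from io import BytesIO
from urllib.request import urlopen


def fetch_bytes(url, timeout=30):
    """Download the raw bytes behind a URL (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def load_from_bytes(raw, loader=pickle.load):
    """Wrap raw bytes in an in-memory file object and pass it to a loader.

    `loader` is any callable that accepts a file object, such as
    pickle.load, joblib.load, or scipy.sparse.load_npz.
    """
    return loader(BytesIO(raw))
```

With joblib and SciPy installed, the model and the matrix could then be loaded as load_from_bytes(fetch_bytes(mLink), joblib.load) and load_from_bytes(fetch_bytes(npz_link), scipy.sparse.load_npz), where npz_link is a placeholder for the raw URL of the NPZ file.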


Yes, the reading of the pickle dataframe works great, perhaps due to its small size. The main issue lies with the joblib and npz files which are much larger, but still less than 75MB
– AxW
May 14, 2020 at 0:00
I am having the same issue. I used your code but now get KeyError: 10. Do you know why?
– mblume
Apr 14, 2022 at 8:10


Claude ROUSSAUX · Accepted Answer · 2023-02-16 17:18:32Z

0

For those getting KeyError: 10, try

import joblib
model_knn = joblib.load(mfile)

instead of

model_knn = pickle.load(mfile)
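
If the error persists, it may be worth checking what was actually downloaded. 10 is the byte value of a newline character, so one possibility (an assumption, not something confirmed in this thread) is that the request returned text, such as an HTML page or a Git LFS pointer file, instead of the binary. The sketch below uses a hypothetical helper, describe_payload, to make a rough guess from the first bytes. It is a heuristic only.

```python
def describe_payload(raw):
    """Guess what a downloaded payload is from its first bytes (heuristic)."""
    # Look only at the start of the payload, ignoring leading whitespace.
    head = raw[:256].lstrip().lower()
    # Git LFS pointer files are small text files that begin with this line.
    if head.startswith(b"version https://git-lfs"):
        return "git-lfs-pointer"
    # An HTML page suggests the URL points at a web page, not the raw file.
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "binary"
```

For example, describe_payload(requests.get(mLink).content) returning "html" would suggest the URL is missing ?raw=true or points at the wrong path.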
