Creating TabularLearner on GPU

You may recall from https://redditech.github.io/team-fast-tabulous/kaggle/fastai/2021/07/08/HomeSite-Quote-A-Fastai-Tabular-Approach.html that we created a TabularLearner using the following steps:

import pandas as pd
from fastai.tabular.all import *
df = pd.read_csv('train.csv', ...)  # plus some EDA
y_names = 'QuoteConversion_Flag'
cont_names, cat_names = cont_cat_split(df, dep_var=y_names)
# create TabularPandas
to = TabularPandas(df, procs, cat_names, cont_names, y_names, y_block, splits)
# create DataLoaders
dls = to.dataloaders(bs=4096, val_bs=512, layers=[10000,500], embed_ps=0.02, ps=[0.001,0.01])
# create TabularLearner 
learn = tabular_learner(dls, metrics=RocAucBinary())
# train the TabularLearner
learn.fit_one_cycle(n_epoch=5, lr_max=1e-2, wd=0.002)

Moving TabularLearner from GPU to GPU

Moving the TabularLearner from one GPU to another GPU was easy:

# GPU 1
save_pickle("learner.pkl", learn)

# GPU 2
learn = load_pickle("learner.pkl")

Moving TabularLearner from GPU to CPU

However, load_pickle("learner.pkl") on a CPU raises an exception, because the pickle file holds a TabularLearner created for a GPU. The solution is to rebuild the TabularLearner on the CPU from the DataLoaders and the TabularModel. First, though, you have to convert the DataLoaders and the TabularModel to CPU versions, and you have to do that while still on the GPU machine or they won't load on the CPU. Use the to() method on both objects; it converts in place and returns the converted object.

# GPU
save_pickle("dataloaders_cpu.pkl", learn.dls.to("cpu"))
save_pickle("tabularmodel_cpu.pkl", learn.model.to("cpu"))

# CPU
dls = load_pickle("dataloaders_cpu.pkl")
mdl = load_pickle("tabularmodel_cpu.pkl")
learn = TabularLearner(dls=dls, model=mdl)
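
The in-place behaviour of to() can be checked on a plain PyTorch module. The sketch below is not the TabularModel from this post: it assumes a small stand-in nn.Sequential (hypothetical) and an in-memory buffer instead of a .pkl file. It only illustrates that to("cpu") returns the very same object and that the converted model survives a pickle round trip.

```python
import io
import pickle

import torch
from torch import nn

# Hypothetical stand-in for a trained model; not the TabularModel from the post.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Put it on the GPU if one is present, so the move back to CPU is meaningful.
start_device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(start_device)

# nn.Module.to() moves the parameters in place and returns the module itself.
cpu_model = model.to("cpu")
same_object = cpu_model is model
param_devices = {p.device.type for p in model.parameters()}

# Round trip through pickle, using an in-memory buffer instead of a file.
buffer = io.BytesIO()
pickle.dump(cpu_model, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

print(same_object, param_devices)
```

Because to() returns the object it was called on, passing learn.model.to("cpu") straight into save_pickle, as done above, pickles the already converted model.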

To check that it loaded correctly on the CPU, calculate some predictions and compute the roc_auc_score:

from sklearn.metrics import roc_auc_score
preds, targs = learn.get_preds()
print(f"Trained deep learning model has a roc_auc_score of {roc_auc_score(to_np(targs), to_np(preds[:,1]))}")

If everything is correct, it should print the same roc_auc_score that was calculated on the GPU:

Trained deep learning model has a roc_auc_score of 0.9630509311593953
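
When comparing the two scores in code, a tolerance may be safer than exact equality, since floating-point results can differ in the last few digits between devices. The sketch below is for intuition only and assumes made-up targets and scores. The roc_auc helper is a hypothetical, dependency-free stand-in for sklearn's roc_auc_score, which is what you would use on real data.

```python
import math


def roc_auc(targets, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative target")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up data: the "CPU" scores differ from the "GPU" scores by a tiny amount.
targets = [0, 0, 1, 1]
gpu_scores = [0.1, 0.4, 0.35, 0.8]
cpu_scores = [s + 1e-9 for s in gpu_scores]

gpu_auc = roc_auc(targets, gpu_scores)
cpu_auc = roc_auc(targets, cpu_scores)

# Compare with a relative tolerance rather than ==.
scores_agree = math.isclose(gpu_auc, cpu_auc, rel_tol=1e-6)
print(gpu_auc, cpu_auc, scores_agree)
```

The pairwise loop is quadratic in the number of rows, so it is only suitable for tiny examples like this one.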

Alternative

If you haven't pickled CPU-converted DataLoaders but you do have your TabularPandas pickled, you can rebuild the DataLoaders on the CPU, because TabularPandas doesn't need to be converted to CPU:

# GPU
save_pickle("tabularpandas.pkl", to)
save_pickle("tabularmodel_cpu.pkl", learn.model.to("cpu"))

# CPU
to = load_pickle("tabularpandas.pkl")
dls = to.dataloaders(bs=4096, val_bs=512, layers=[10000,500], embed_ps=0.02, ps=[0.001,0.01])
mdl = load_pickle("tabularmodel_cpu.pkl")
learn = TabularLearner(dls=dls, model=mdl)

Lightweight Alternative

If you don't need the original DataLoaders in your model you can export the model with an empty DataLoaders. This has the advantage of producing a very compact exported model in one step on the GPU.

# GPU
learn.export(fname="learn_empty_dls.pkl")

# CPU
learn = load_learner("learn_empty_dls.pkl")

The disadvantage is you can't test that the model came across properly using learn.get_preds() with no parameters, because it has no DataLoaders. However, you can still do inference using the following:

# Pandas Series of independent variables (e.g. sr_row)
pred = learn.predict(sr_row)

# Pandas DataFrame with many rows of independent variables (e.g. df_rows)
dl = learn.dls.test_dl(df_rows)
dl.dataset.conts = dl.dataset.conts.astype(np.float32)
inp,preds,_,dec_preds = learn.get_preds(dl=dl, with_input=True, with_decoded=True)

Alternatives which don't work

The error message when loading a GPU TabularLearner on a CPU is:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

The error message suggests that something can be done on the CPU to solve the problem. However, when I tried what it suggested, I got the same error.

# GPU 
save_pickle("tabularlearner.pkl", learn)
save_pickle("tabularpandas.pkl", to)

# CPU
learn = load_pickle("tabularlearner.pkl")  # RuntimeError above
learn = torch.load("tabularlearner.pkl", map_location=torch.device("cpu"))  # RuntimeError above

Using load_learner(filepath), as in the lightweight alternative described above, on a TabularLearner saved with save_pickle also produces the same error.

Another suggestion was to use learn.save("learn_save") on the GPU, which saves a file called learn_save.pth. This file can be loaded on the CPU with torch.load("learn_save.pth", map_location=torch.device("cpu")), but that just gives an OrderedDict with keys ["model", "opt"]. Looking online suggested that I save the model state dict and use the load_state_dict method on an empty model to load it back in. It looks like learn_save.pth may be such a state dict, but I didn't know how to create an empty TabularModel to call load_state_dict on.
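
For reference, the state dict round trip looks roughly like this in plain PyTorch. This is a minimal sketch under stated assumptions, not a tested recipe for TabularModel: make_model is a hypothetical factory for a toy network, the checkpoint dictionary only mimics the ["model", "opt"] keys seen above, and the file is written to a temporary directory.

```python
import os
import tempfile

import torch
from torch import nn


def make_model():
    # Hypothetical architecture; the "empty" model must be built the same way
    # as the trained one so that parameter names and shapes line up.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


# "GPU side": save a checkpoint with the same two keys seen in learn_save.pth.
trained = make_model()
checkpoint = {"model": trained.state_dict(), "opt": {}}  # optimizer state left empty here
path = os.path.join(tempfile.mkdtemp(), "learn_save.pth")
torch.save(checkpoint, path)

# "CPU side": map every tensor to the CPU while loading.
state = torch.load(path, map_location=torch.device("cpu"))

# Build a freshly initialised model and overwrite its weights.
empty = make_model()
empty.load_state_dict(state["model"])

weights_match = all(
    torch.equal(a, b)
    for a, b in zip(trained.state_dict().values(), empty.state_dict().values())
)
print(sorted(state.keys()), weights_match)
```

With fastai, the equivalent of make_model() would presumably be a second call to tabular_learner with CPU DataLoaders and the same model configuration, followed by learn.load("learn_save"). That route was not tried in this post, so treat it as an assumption.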

Finally, I thought that because the to("cpu") method works in place, I could convert learn to a CPU version on the GPU and only need to pickle one file. However, none of the following attempts on the GPU created a file which could be loaded on the CPU.

# Failure 1
learn.to("cpu")
save_pickle("fail.pkl", learn)

# Failure 2
learn.model.to("cpu")
learn.dls.to("cpu")
save_pickle("fail.pkl", learn)

# Failure 3
learn.model = learn.model.to("cpu")
learn.dls = learn.dls.to("cpu")
save_pickle("fail.pkl", learn)
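
One possible explanation, not verified here, is that some other part of the Learner still references CUDA tensors after the model and DataLoaders have been moved. A small diagnostic like the one below could help locate them before pickling. find_non_cpu_tensors is a hypothetical helper written for this post, not a fastai or PyTorch function, and it only walks dicts, lists, tuples, sets and instance attributes.

```python
import torch
from torch import nn


def find_non_cpu_tensors(obj, path="obj", seen=None, max_depth=10):
    """Return (path, device) pairs for tensors reachable from obj that are not on the CPU."""
    seen = set() if seen is None else seen
    if id(obj) in seen or max_depth < 0:
        return []
    seen.add(id(obj))
    if isinstance(obj, torch.Tensor):
        return [] if obj.device.type == "cpu" else [(path, str(obj.device))]
    if isinstance(obj, dict):
        children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple, set)):
        children = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{k}", v) for k, v in vars(obj).items()]
    else:
        children = []
    found = []
    for child_path, child in children:
        found += find_non_cpu_tensors(child, child_path, seen, max_depth - 1)
    return found


# A model that lives entirely on the CPU should produce no findings.
cpu_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
leftovers = find_non_cpu_tensors(cpu_model, path="model")
print(leftovers)
```

On the GPU machine you could call find_non_cpu_tensors(learn, path="learn") after the to("cpu") calls; any entries it returns point at attributes that would still be pickled as CUDA tensors.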