Getting Kaggle Data for Homesite
Just a rehash of the Kaggle documentation, for getting the Homesite competition data.
Introduction
This notebook is a demonstration of what I did to get the Homesite competition data from Kaggle. The first step is to install the Kaggle command-line tool.
!pip install kaggle --upgrade
Then you need to download your Kaggle API key (a kaggle.json file) following the instructions in the Kaggle API documentation.
You may need to create the ~/.kaggle directory manually before copying the downloaded kaggle.json file to that folder.
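The same step can be scripted. The following is a minimal sketch, not part of the original notebook: install_kaggle_key is a hypothetical helper name, and the location of the downloaded file is an assumption you will need to adjust. It also restricts the file permissions, since the Kaggle CLI warns when the key is readable by other users.

```python
from pathlib import Path
import shutil


def install_kaggle_key(downloaded_key, kaggle_dir=None):
    """Copy a downloaded kaggle.json into the Kaggle config directory.

    install_kaggle_key is a hypothetical helper, not part of the kaggle package.
    """
    # Default to ~/.kaggle, the directory mentioned above.
    kaggle_dir = Path(kaggle_dir) if kaggle_dir is not None else Path.home() / ".kaggle"
    # Create the directory if it does not exist yet.
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    shutil.copy(downloaded_key, target)
    # Make the key readable and writable by the owner only (effective on POSIX systems).
    target.chmod(0o600)
    return target


# Example call, assuming the browser saved the key to ~/Downloads:
# install_kaggle_key(Path.home() / "Downloads" / "kaggle.json")
```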
Follow the instructions as coded to download the Kaggle data. I use a _data folder that is excluded from check-ins by the .gitignore, so that I do not inadvertently upload the data when I check in changes to the repository, since I'm writing this blog post as I am coding. I'm not using any of the tabular functionality from the fastai library yet, but I want some of its utility functions to make sure I run the command-line steps correctly and capture them in the notebook. So I need to install it.
!mamba install -c fastchan fastai -y
Now I have to load fastai and I can get the utilities I want.
from fastai.tabular.all import *
Path.cwd()
os.chdir('../_data')
Path.cwd()
!kaggle competitions download -c homesite-quote-conversion
path = Path.cwd()
path.ls()
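If you would rather not change the working directory, the download can also be pointed at the data folder directly. This is a sketch under assumptions: build_download_command and download_competition are hypothetical helper names, and it relies on the kaggle CLI being on the PATH and on its -p option for the destination folder.

```python
import subprocess
from pathlib import Path


def build_download_command(competition, dest):
    """Return the kaggle CLI call as an argument list (hypothetical helper)."""
    return ["kaggle", "competitions", "download", "-c", competition, "-p", str(dest)]


def download_competition(competition, dest):
    """Create the destination folder if needed, then run the kaggle CLI."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the CLI exits with an error,
    # for example when the key is missing or the competition rules were not accepted.
    subprocess.run(build_download_command(competition, dest), check=True)
    return dest


# Example call, assuming the same ../_data folder used in this notebook:
# download_competition("homesite-quote-conversion", "../_data")
```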
Unfortunately, untar_data doesn't support zip format, so we need another tool, file_extract, for this.
file_extract('homesite-quote-conversion.zip')
path.ls()
file_extract('test.csv.zip')
file_extract('train.csv.zip')
file_extract('sample_submission.csv.zip')
path.ls()
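The four file_extract calls above can also be collapsed into one loop. Here is a minimal sketch that swaps in the standard library's zipfile module instead of fastai's file_extract; extract_zips is a hypothetical helper name. It keeps scanning the folder until no unprocessed archive is left, which handles the zip files nested inside the competition download.

```python
import zipfile
from pathlib import Path


def extract_zips(folder):
    """Extract every .zip in folder, including zips produced by earlier extractions.

    extract_zips is a hypothetical helper; it returns the names of the archives it processed.
    """
    folder = Path(folder)
    done = set()
    while True:
        # Archives that appeared in the folder and have not been extracted yet.
        pending = [p for p in sorted(folder.glob("*.zip")) if p not in done]
        if not pending:
            return sorted(p.name for p in done)
        for archive in pending:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(folder)
            done.add(archive)


# Example call, assuming the current directory is the _data folder:
# extract_zips(Path.cwd())
```

The archives themselves are left in place, so the folder listing will show both the zip files and the extracted CSV files.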
And there you have it. I have the uncompressed data available, and can now start doing some exploratory data analysis on it.