Introduction

This notebook demonstrates how I downloaded the Homesite Quote Conversion competition data from Kaggle.

Tools

If you haven't installed the Kaggle tools yet, install them like this:

!pip install kaggle --upgrade
Requirement already satisfied: kaggle in /usr/local/anaconda3/envs/fastai/lib/python3.8/site-packages (1.5.12)
Requirement already satisfied: tqdm in /usr/local/anaconda3/envs/fastai/lib/python3.8/site-packages (from kaggle) (4.59.0)
Requirement already satisfied: requests in /usr/local/anaconda3/envs/fastai/lib/python3.8/site-packages (from kaggle) (2.25.1)
Requirement already satisfied: python-slugify in /usr/local/anaconda3/envs/fastai/lib/python3.8/site-packages (from kaggle) (5.0.2)
Requirement already satisfied: urllib3 in /usr/local/anaconda3/envs/fastai/lib/python3.8/site-packages (from kaggle) (1.26.4)
Requirement already satisfied: python-dateutil in /usr/local/anaconda3/envs/fastai/lib/python3.8/site-packages (from kaggle) (2.8.1)
Requirement already satisfied: certifi in /usr/local/anaconda3/envs/fastai/lib/python3.8/site-packages (from kaggle) (2021.5.30)
Requirement already satisfied: six>=1.10 in /usr/local/anaconda3/envs/fastai/lib/python3.8/site-packages (from kaggle) (1.16.0)
Requirement already satisfied: text-unidecode>=1.3 in /usr/local/anaconda3/envs/fastai/lib/python3.8/site-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: chardet<5,>=3.0.2 in /usr/local/anaconda3/envs/fastai/lib/python3.8/site-packages (from requests->kaggle) (4.0.0)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/anaconda3/envs/fastai/lib/python3.8/site-packages (from requests->kaggle) (2.10)

Then you need to download your Kaggle API key from your Kaggle account page, following the Kaggle API instructions.

You may need to create the ~/.kaggle directory manually before copying the downloaded kaggle.json file into it.
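
For anyone who prefers to do this step from Python rather than by hand, here is a minimal sketch. The helper name install_kaggle_key and its home parameter are my own, hypothetical names, not part of the Kaggle tools. It assumes you already have the downloaded kaggle.json somewhere on disk. The permission change is there because the Kaggle CLI warns when the key file is readable by other users.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_key(downloaded_key, home=None):
    """Copy a downloaded kaggle.json into <home>/.kaggle and restrict its permissions.

    `install_kaggle_key` is a hypothetical helper written for this post.
    """
    home = Path(home) if home is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    # Create ~/.kaggle if it does not exist yet.
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    shutil.copy(downloaded_key, target)
    # Owner read/write only (0o600), so other users cannot read the key.
    target.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return target


# Example call (the Downloads location is an assumption about your machine):
# install_kaggle_key(Path.home() / "Downloads" / "kaggle.json")
```

The home argument only exists so the function can be tried against a scratch directory before pointing it at the real home folder.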

Follow the steps in the code below to download the Kaggle data. I use a _data folder that is excluded from check-ins by the .gitignore, so that I do not inadvertently upload the data when I commit changes to the repository, since I'm writing this blog post as I code.

I'm not using any of the tabular functionality from the fastai library yet, but I want some of its utility functions to make sure I run the command-line steps correctly and capture them in the notebook. So I need to install it.

!mamba install -c fastchan fastai -y
                  __    __    __    __
                 /  \  /  \  /  \  /  \
                /    \/    \/    \/    \
███████████████/  /██/  /██/  /██/  /████████████████████████
              /  / \   / \   / \   / \  \____
             /  /   \_/   \_/   \_/   \    o \__,
            / _/                       \_____/  `
            |/
        ███╗   ███╗ █████╗ ███╗   ███╗██████╗  █████╗
        ████╗ ████║██╔══██╗████╗ ████║██╔══██╗██╔══██╗
        ██╔████╔██║███████║██╔████╔██║██████╔╝███████║
        ██║╚██╔╝██║██╔══██║██║╚██╔╝██║██╔══██╗██╔══██║
        ██║ ╚═╝ ██║██║  ██║██║ ╚═╝ ██║██████╔╝██║  ██║
        ╚═╝     ╚═╝╚═╝  ╚═╝╚═╝     ╚═╝╚═════╝ ╚═╝  ╚═╝

        mamba (0.13.0) supported by @QuantStack

        GitHub:  https://github.com/mamba-org/mamba
        Twitter: https://twitter.com/QuantStack

█████████████████████████████████████████████████████████████


Looking for: ['fastai']

fastchan/noarch          [=>                  ] (--:--) No change
fastchan/noarch          [====================] (00m:00s) No change
fastchan/osx-64          [=>                  ] (--:--) No change
fastchan/osx-64          [====================] (00m:00s) No change
pkgs/main/osx-64         [=>                  ] (--:--) No change
pkgs/main/osx-64         [====================] (00m:00s) No change
pkgs/r/osx-64            [>                   ] (--:--) No change
pkgs/r/osx-64            [====================] (00m:00s) No change
pkgs/main/noarch         [>                   ] (--:--) No change
pkgs/main/noarch         [====================] (00m:00s) No change
pkgs/r/noarch            [=>                  ] (--:--) No change
pkgs/r/noarch            [====================] (00m:00s) No change
Transaction

  Prefix: /usr/local/anaconda3/envs/fastai

  All requested packages already installed

Now I have to load fastai and I can get the utilities I want.

from fastai.tabular.all import *
Path.cwd()
Path('/Users/nissan/code/team-fast-tabulous/_notebooks')
os.chdir('../_data')
Path.cwd()
Path('/Users/nissan/code/team-fast-tabulous/_data')
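
Since the download is about to land in _data, this is a reasonable point to double-check that the folder really is listed in the .gitignore. The sketch below is only a simple textual check, not a full implementation of gitignore matching rules. The helper name ensure_ignored is hypothetical, and it assumes the repository root is the parent of the _data folder.

```python
from pathlib import Path


def ensure_ignored(repo_root, pattern="_data/"):
    """Append `pattern` to <repo_root>/.gitignore unless an identical line exists.

    Hypothetical helper: it compares lines literally and does not
    evaluate gitignore wildcards or negations. Returns True if the
    file was modified.
    """
    gitignore = Path(repo_root) / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if pattern in (line.strip() for line in lines):
        return False
    lines.append(pattern)
    gitignore.write_text("\n".join(lines) + "\n")
    return True


# Example call from inside _data (assumes the repo root is one level up):
# ensure_ignored(Path.cwd().parent)
```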
!kaggle competitions download -c homesite-quote-conversion
homesite-quote-conversion.zip: Skipping, found more recently modified local copy (use --force to force download)
path = Path.cwd()
path.ls()
(#7) [Path('/Users/nissan/code/team-fast-tabulous/_data/test.csv'),Path('/Users/nissan/code/team-fast-tabulous/_data/homesite-quote-conversion.zip'),Path('/Users/nissan/code/team-fast-tabulous/_data/train.csv'),Path('/Users/nissan/code/team-fast-tabulous/_data/test.csv.zip'),Path('/Users/nissan/code/team-fast-tabulous/_data/train.csv.zip'),Path('/Users/nissan/code/team-fast-tabulous/_data/sample_submission.csv.zip'),Path('/Users/nissan/code/team-fast-tabulous/_data/sample_submission.csv')]

Unfortunately, untar_data doesn't support the zip format, so we need another tool, file_extract, for this.

file_extract('homesite-quote-conversion.zip')
path.ls()
(#7) [Path('/Users/nissan/code/team-fast-tabulous/_data/test.csv'),Path('/Users/nissan/code/team-fast-tabulous/_data/homesite-quote-conversion.zip'),Path('/Users/nissan/code/team-fast-tabulous/_data/train.csv'),Path('/Users/nissan/code/team-fast-tabulous/_data/test.csv.zip'),Path('/Users/nissan/code/team-fast-tabulous/_data/train.csv.zip'),Path('/Users/nissan/code/team-fast-tabulous/_data/sample_submission.csv.zip'),Path('/Users/nissan/code/team-fast-tabulous/_data/sample_submission.csv')]
file_extract('test.csv.zip')
file_extract('train.csv.zip')
file_extract('sample_submission.csv.zip')
path.ls()
(#7) [Path('/Users/nissan/code/team-fast-tabulous/_data/test.csv'),Path('/Users/nissan/code/team-fast-tabulous/_data/homesite-quote-conversion.zip'),Path('/Users/nissan/code/team-fast-tabulous/_data/train.csv'),Path('/Users/nissan/code/team-fast-tabulous/_data/test.csv.zip'),Path('/Users/nissan/code/team-fast-tabulous/_data/train.csv.zip'),Path('/Users/nissan/code/team-fast-tabulous/_data/sample_submission.csv.zip'),Path('/Users/nissan/code/team-fast-tabulous/_data/sample_submission.csv')]
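
If you would rather not depend on fastai for this step, the same two-stage extraction (the outer archive first, then the .csv.zip files inside it) can be sketched with the standard library's zipfile module. The helper names extract_zip and extract_nested are hypothetical, written for this post. This is not a description of how file_extract works internally.

```python
import zipfile
from pathlib import Path


def extract_zip(archive, dest=None):
    """Extract `archive` into `dest` (default: the archive's own folder).

    Returns the list of member names that were in the archive.
    """
    archive = Path(archive)
    dest = Path(dest) if dest is not None else archive.parent
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


def extract_nested(archive):
    """Extract an outer zip, then any .zip members it contained, into the same folder."""
    archive = Path(archive)
    extracted = []
    for name in extract_zip(archive):
        member = archive.parent / name
        if member.suffix == ".zip":
            # Inner archive such as train.csv.zip: unpack it next to the outer one.
            extracted.extend(extract_zip(member))
        else:
            extracted.append(name)
    return extracted


# Example call from inside the _data folder:
# extract_nested("homesite-quote-conversion.zip")
```

Like the cells above, this leaves the intermediate .zip files in place next to the extracted CSVs.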

And there you have it. The uncompressed data is now available, and I can start doing some exploratory data analysis on it.