Another Pass Using Fastai Tabular For Homesite Competition
Here I use fastai with some changes to the defaults to make another submission to Kaggle for the Homesite competition.
- Introduction
- Setup fastai and Google drive
- Setup kaggle environment parameters
- Exploring the Homesite data
- First things first
- Goal: Better model training, refining fastai parameters, using EDA insights gathered to date
- Submission To Kaggle
Introduction
This is a modification of my "first pass" submission to the Homesite competition on Kaggle, again using Google Colab. This time I modify some of the default parameters and apply some of the learning from the initial exploratory data analysis, to see whether the baseline submission improves (or not).
Changes made:
- Changed from `RandomSplitter()` to `TrainTestSplitter()` so the training and validation sets are more fairly weighted, given the bias of the input data towards negative results
- Increased the batch size to 1024 to make training shorter while hopefully still getting a better predictor. Set a separate validation batch size of 128.
- Increased the validation percentage to 0.25
- Fixed the learning rate at 1e-2
- Increased epochs to 7, to see if the model overfits and what effect that has
- Modified the `cat_names` and `cont_names` arrays with the initial insights from the EDA notebook post
- Added date parts for the date column
- Added weight decay of 0.2
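The splitter change is worth a closer look. `TrainTestSplitter()` can stratify on the target (I believe it wraps `sklearn.model_selection.train_test_split`), so the roughly 5:1 negative/positive ratio is preserved in both the training and validation sets. A minimal pure-Python sketch of what stratification does, not fastai's implementation:

```python
import random

def stratified_split(labels, test_size=0.25, seed=42):
    """Split indices so each class keeps the same proportion in both sets."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, valid = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_valid = round(len(idxs) * test_size)
        valid += idxs[:n_valid]
        train += idxs[n_valid:]
    return sorted(train), sorted(valid)

# Imbalanced toy labels: 5 negatives for every positive
labels = [0] * 500 + [1] * 100
train, valid = stratified_split(labels, test_size=0.25)
train_pos = sum(labels[i] for i in train) / len(train)
valid_pos = sum(labels[i] for i in valid) / len(valid)
```

Both splits end up with the same positive-class fraction, which a plain random split would only approximate.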
!pip install -Uqq fastai
from fastai.tabular.all import *
The snippet below is only useful in Colab for accessing my Google Drive and is straight out of the fastbook source code on GitHub
global gdrive
gdrive = Path('/content/gdrive/My Drive')
from google.colab import drive
if not gdrive.exists(): drive.mount(str(gdrive.parent))
Only add the Kaggle bits below if I'm running locally; in Colab they're already here
!ls /content/gdrive/MyDrive/Kaggle/kaggle.json
Useful links here:
- Documentation on Path library
- Documentation on fastai extensions to Path library
Path.cwd()
!mkdir -p ~/.kaggle
!cp /content/gdrive/MyDrive/Kaggle/kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
from kaggle import api
path = Path.cwd()
path.ls()
path = path/"gdrive/MyDrive/Kaggle/homesite_data"
path.mkdir(exist_ok=True)
Path.BASE_PATH = path
api.competition_download_cli('homesite-quote-conversion', path=path)
file_extract(path/"homesite-quote-conversion.zip")
file_extract(path/"train.csv.zip")
file_extract(path/"test.csv.zip")
path
path.ls()
First set the random seed so that the results are reproducible
set_seed(42)
bs = 1024
val_bs = 128
test_size = 0.25
epochs = 7
lr = 1e-2
wd=0.2
df_train = pd.read_csv(path/"train.csv", low_memory=False)
df_train.head()
df_train.shape
df_test = pd.read_csv(path/"test.csv", low_memory=False)
df_test.head()
df_test.shape
y_column = df_train.columns.difference(df_test.columns)
y_column
From this it looks like `QuoteConversion_Flag` is the value we want to predict. Let's take a look at it
type(df_train.QuoteConversion_Flag)
df_train.QuoteConversion_Flag.unique()
type(df_train.QuoteConversion_Flag.unique()[0])
Make this a boolean for the purpose of generating predictions as a binary classification
df_train.QuoteConversion_Flag = df_train.QuoteConversion_Flag.astype(dtype='boolean')
Let's see how the training data outcomes are balanced
df_train.QuoteConversion_Flag.describe()
train_data_balance = pd.DataFrame(df_train["QuoteConversion_Flag"]).groupby("QuoteConversion_Flag")
train_data_balance["QuoteConversion_Flag"].describe()
We have about five times as many "No Sale" rows as rows showing a successful sale. This imbalance may limit the model's ability to predict positive sales results
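The imbalance can also be read off directly by normalising the counts. A small stand-alone sketch using synthetic labels with the same roughly 5:1 shape (the real numbers come from `df_train.QuoteConversion_Flag`):

```python
from collections import Counter

# Synthetic stand-in for df_train.QuoteConversion_Flag (real data is ~5:1)
flags = [False] * 500 + [True] * 100

counts = Counter(flags)
ratio = counts[False] / counts[True]          # negatives per positive
fractions = {k: v / len(flags) for k, v in counts.items()}
print(ratio, fractions)
```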
First things first
Learning from my colleague Tim's work, we already know:

- `QuoteNumber` is unique, so we can make it the index
- The `Original_Quote_Date` column should be set as a date type

Additionally, we should apply any data type changes to both the train and test data so predictions don't fail later on
df_train = df_train.set_index('QuoteNumber')
df_test = df_test.set_index('QuoteNumber')
We may have some NaN values for Original_Quote_Date in either the training or test dataset, but let's confirm there are none.
df_train['Original_Quote_Date'].isna().sum(), df_test['Original_Quote_Date'].isna().sum()
df_train['Original_Quote_Date'] = pd.to_datetime(df_train['Original_Quote_Date'])
df_test['Original_Quote_Date'] = pd.to_datetime(df_test['Original_Quote_Date'])
Add the date_part to see if this helps improve modeling
df_train = add_datepart(df_train, 'Original_Quote_Date')
df_test = add_datepart(df_test, 'Original_Quote_Date')
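For reference, `add_datepart` replaces the datetime column with a set of engineered calendar features (year, month, day, day of week, month and quarter boundaries, elapsed time, and so on). A rough stdlib sketch of the kind of features it derives, not the actual fastai implementation:

```python
from datetime import date

def date_parts(d: date) -> dict:
    """Derive a few calendar features, add_datepart-style."""
    return {
        "Year": d.year,
        "Month": d.month,
        "Day": d.day,
        "Dayofweek": d.weekday(),  # Monday == 0
        "Is_month_start": d.day == 1,
        "Is_quarter_start": d.month in (1, 4, 7, 10) and d.day == 1,
    }

parts = date_parts(date(2013, 8, 16))
```

Each derived column gives the model a chance to pick up seasonal or weekly patterns that a raw timestamp hides.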
y_names = [y_column[0]]
y_names
cont_names, cat_names = cont_cat_split(df_train, dep_var=y_names)
len(cont_names), len(cat_names)
Modifying these lists here based on EDA notebook learnings to date
'Field8' in cont_names
'Field9' in cat_names, 'Field9' in cont_names
field9_categories = df_train['Field9'].unique()
df_train['Field9'] = df_train['Field9'].astype('category')
df_train['Field9'] = df_train['Field9'].cat.set_categories(field9_categories)
cont_names.remove('Field9')
cat_names.append('Field9')
'Field11' in cat_names, 'Field11' in cont_names
field11_categories = df_train['Field11'].unique()
df_train['Field11'] = df_train['Field11'].astype('category')
df_train['Field11'] = df_train['Field11'].cat.set_categories(field11_categories)
cont_names.remove('Field11')
cat_names.append('Field11')
'PropertyField25' in cat_names, 'PropertyField25' in cont_names
propertyfield25_categories = df_train['PropertyField25'].unique()
df_train['PropertyField25'] = df_train['PropertyField25'].astype('category')
df_train['PropertyField25'] = df_train['PropertyField25'].cat.set_categories(propertyfield25_categories)
cont_names.remove('PropertyField25')
cat_names.append('PropertyField25')
df_train.drop('PropertyField29', axis=1, inplace=True)
df_train.drop('PersonalField84', axis=1, inplace=True)
cont_names.remove('PersonalField84')
cont_names.remove('PropertyField29')
"QuoteConversion_Flag" in cont_names, "QuoteConversion_Flag" in cat_names #Make sure we've gotten our y-column excluded
procs = [Categorify, FillMissing, Normalize]
splits = TrainTestSplitter(test_size=test_size, stratify=df_train[y_names])(df_train)
to = TabularPandas(df=df_train, procs=procs, cat_names=cat_names, cont_names=cont_names, y_names=y_names,splits=splits)
dls = to.dataloaders(bs=bs, val_bs=val_bs)
dls.valid.show_batch()
len(dls.train)*bs, len(dls.valid)*val_bs
roc_auc_binary = RocAucBinary()
learn = tabular_learner(dls, metrics=roc_auc_binary)
type(roc_auc_binary)
learn.lr_find()
Note that I ran `fit_one_cycle` with a value of 10 when prepping this notebook for publishing, but the test results came out suspiciously heavy on `1` outputs, given that the test submission I ran before was heavily weighted towards `0` outputs and scored 0.83 when I ran with 5 epochs (though I hadn't set the random seed then). I think what happened was that, after changing the splitter, I got a different tensor output and was looking at the alternate value column instead of the column with the prediction value.
learn.fit_one_cycle(epochs, lr, wd=wd)
I referenced another Kaggle notebook for this. We don't need it, but it's good to see what the fastai metric is actually packaging up for you
preds, targs = learn.get_preds()
preds[0:1][0][0], preds[0:1][0][1]
Here was my mistake: I was looking at the wrong classifier value
len(preds)
(preds[:,1] >= 0.5).sum(), (preds[:,1] < 0.5).sum()
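What tripped me up: for a two-class problem, `get_preds` returns one row per sample with two probabilities that sum to 1, and with a boolean target here column 0 corresponds to `False` (no sale) and column 1 to `True` (a sale). A small pure-Python softmax sketch showing why reading the wrong column flips the interpretation:

```python
from math import exp

def softmax(logits):
    """Convert raw two-class logits into probabilities summing to 1."""
    exps = [exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# One sample's raw output: the model leans towards class 1 ("sale")
row = softmax([0.5, 2.0])
p_no_sale, p_sale = row[0], row[1]
```

Reading `p_no_sale` when you meant `p_sale` reverses the apparent confidence, which is exactly the symptom I saw in the earlier run.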
from sklearn.metrics import roc_auc_score
valid_score = roc_auc_score(to_np(targs), to_np(preds[:,1]))
valid_score
Doing inference based on this blog post from Walk With Fastai initially, but then experimenting to get this
dl_test = dls.test_dl(df_test)
preds, _ = learn.get_preds(dl=dl_test)
(preds[:,1] >= 0.5).sum(), (preds[:,1] < 0.5).sum()
path.ls()
file_extract(path/"sample_submission.csv.zip")
path.ls()
df_submission = pd.read_csv(path/"sample_submission.csv") #I could add `low_memory=False` but it makes things slower
df_submission.head()
df_submission.tail()
len(df_test.index), len(preds[:,1])
type(preds)
preds.dtype
preds[0,1]
preds[:1][:1]
We want the second value; this is what gives us our confidence that the quote will convert to a sale
preds[:1][:1][:,1]
preds_for_submission = preds[:,1].tolist()
preds_for_submission[0:3]
fpfs = [float(pfs) for pfs in preds_for_submission]
fpfs[0:2]
integers = [[1], [2], [3]]
values = [int(integer[0]) for integer in integers]
values
submission = pd.DataFrame({'QuoteNumber': df_test.index, 'QuoteConversion_Flag': preds[:,1].tolist()}, columns=['QuoteNumber', 'QuoteConversion_Flag'])
Needed to figure out how to extract the floating point value alone from the list to properly compose the csv output dataframe
type(submission.QuoteConversion_Flag)
type(submission.QuoteConversion_Flag[0])
submission.QuoteConversion_Flag[0]
I played around with the list comprehension example here to get it to work with what I had
submission.QuoteConversion_Flag = [float(qcf) for qcf in submission.QuoteConversion_Flag]
submission.head()
submission.QuoteConversion_Flag = round(submission.QuoteConversion_Flag).astype('Int64')
submission.head()
len(submission[submission.QuoteConversion_Flag==1])
len(submission[submission.QuoteConversion_Flag==0])
submission.to_csv(path/'submission11.csv', index=False)
api.competition_submit(path/'submission11.csv',message="Tenth pass", competition='homesite-quote-conversion')
learn.save('homesite_fastai_nn11')