Exploring Homesite Data (WORK IN PROGRESS)
Exploring Homesite Data in Colab
- Introduction
- Setup fastai and Google drive
- Setup kaggle environment
- Exploring the Homesite data
- First things first
- Goal: Identifying our categorical and continuous data columns
!pip install -Uqq fastai
from fastai.tabular.all import *
The snippet below is only useful in Colab for accessing my Google Drive and is straight out of the fastbook source code on GitHub
global gdrive
gdrive = Path('/content/gdrive/My Drive')
from google.colab import drive
if not gdrive.exists(): drive.mount(str(gdrive.parent))
Add the Kaggle bits
!pip install kaggle
!ls /content/gdrive/MyDrive/Kaggle/kaggle.json
Useful links here:
- Documentation on Path library
- Documentation on fastai extensions to Path library
Path.cwd()
!mkdir -p ~/.kaggle
!cp /content/gdrive/MyDrive/Kaggle/kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
from kaggle import api
path = Path.cwd()
path.ls()
path = path/"gdrive/MyDrive/Kaggle/homesite_data"
path.mkdir(exist_ok=True)
Path.BASE_PATH = path
api.competition_download_cli('homesite-quote-conversion', path=path)
file_extract(path/"homesite-quote-conversion.zip")
file_extract(path/"train.csv.zip")
file_extract(path/"test.csv.zip")
path
path.ls()
df_train = pd.read_csv(path/"train.csv", low_memory=False)
df_train.head()
df_test = pd.read_csv(path/"test.csv", low_memory=False)
df_test.head()
y_column = df_train.columns.difference(df_test.columns)
y_column
From this it looks like `QuoteConversion_Flag` is the value we want to predict. Let's take a look at it.
type(df_train.QuoteConversion_Flag)
df_train.QuoteConversion_Flag.unique()
type(df_train.QuoteConversion_Flag.unique()[0])
What's interesting here is that the outcome we want to predict looks like a simple binary classification: either there's a sale (1) or there's no sale (0). If we keep this field as an integer, plain rounding means predictions above 50% get rounded up to a sale and predictions below 50% get classified as no sale.
In reality, 50% might not be the best threshold to keep, and we may want to vary this confidence cut-off. If we convert the field to floating point, we have more control over the threshold, and thus the final prediction, so we could, say, only predict a sale when there's more than 90% confidence in the prediction.
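For example, here's a minimal sketch (with made-up probabilities, purely for illustration) of how a chosen threshold turns predicted probabilities into sale / no-sale labels:
import numpy as np
# Hypothetical predicted probabilities from some model (illustration only)
preds = np.array([0.12, 0.55, 0.93, 0.49, 0.87])
# Plain rounding is equivalent to a 0.5 threshold
print((preds >= 0.5).astype(int))        # -> [0 1 1 0 1]
# A stricter threshold only predicts a sale at >= 90% confidence
threshold = 0.9
print((preds >= threshold).astype(int))  # -> [0 0 1 0 0]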
df_train.QuoteConversion_Flag = df_train.QuoteConversion_Flag.astype(dtype='float64')
First things first
From my colleague Tim's work we already know:
- `QuoteNumber` is unique, so make it the index
- `Original_Quote_Date` should be made into a date
Additionally, we should make sure to apply any changes to data types to both train and test data so predictions don't fail later on
df_train = df_train.set_index('QuoteNumber')
df_test = df_test.set_index('QuoteNumber')
df_train['Original_Quote_Date'] = pd.to_datetime(df_train['Original_Quote_Date'])
df_test['Original_Quote_Date'] = pd.to_datetime(df_test['Original_Quote_Date'])
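Since every data type change needs to happen on both frames, one way to avoid copy-paste slips is to wrap the change in a small helper that is applied to both. This is just a sketch; apply_to_both is a made-up name, not something from fastai or pandas:
# Hypothetical helper: apply the same column transformation to train and test
def apply_to_both(column, func):
    df_train[column] = func(df_train[column])
    df_test[column] = func(df_test[column])
# The date conversion above could then be written as:
# apply_to_both('Original_Quote_Date', pd.to_datetime)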
# Lists to track how we plan to treat each column
categorical = L()
continuous = L()
notused = L()
Let's now see how many columns we are working with
df_train.columns
df_train.info()
298 columns is a lot; let's split these by type
df_train.dtypes.unique()
Let's take a look at the smallest subset, the floating point columns, to see if there's anything interesting about them
(df_train[df_train.columns[df_train.dtypes=='float64']].drop(columns=['QuoteConversion_Flag'])).head()
Referencing this article for some good tips on EDA (Exploratory Data Analysis)
(df_train[df_train.columns[df_train.dtypes=='float64']].drop(columns=['QuoteConversion_Flag'])).describe()
df_train['Field8'].value_counts()
df_train['Field8'].isna().sum()
df_train["Field8"].plot.hist(bins=200)
`Field8` looks like a good candidate for a continuous field. Let's add it to that list
continuous.append('Field8')
continuous
df_train["Field9"].value_counts()
df_train["Field9"].isna().sum()
df_train["Field9"].plot.hist(bins=50)
Even though it is a floating point field, it looks very categorical in nature rather than continuous. We could peek at the test dataset to confirm this
df_test[~df_test["Field9"].isin(df_train["Field9"].unique())].value_counts()
The hypothesis holds true, so we're going with the assumption that `Field9` is categorical, not continuous
categorical.append("Field9")
df_train["Field9"].unique()
df_train["Field11"].value_counts()
df_train["Field11"].isna().sum()
df_train["Field11"].plot.hist(bins=200)
Like `Field9`, this too looks more categorical than continuous in nature. We can support this hypothesis by again "peeking" to see whether any test data values don't conform to this theory
df_test[~df_test["Field11"].isin(df_train["Field11"].unique())].value_counts()
Once again, the hypothesis holds true, so let's add `Field11` to our list of categorical fields
categorical.append("Field11")
df_train["PersonalField84"].value_counts()
df_train["PersonalField84"].isna().sum()
df_train["PersonalField84"].plot.hist(bins=100)
This one also seems a bit categorical. But there are quite a few NA values. We have a few choices here.
- Given the heavy skewing towards one value, it would arguably be a reasonable choice to fill the NAs with the `mode` value rather than the `mean` value (see the sketch below).
- Given that a significant number of NAs are present, we could leave it out of our model entirely and revisit it later if we want to fine-tune our input dataset.
- For now, let's ignore it.
notused.append("PersonalField84")
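If we later change our minds about PersonalField84, a minimal sketch of the mode-fill option could look like this (remembering to apply the same fill to both frames); fastai's FillMissing tabular proc can also handle this kind of missing-value filling when we build the dataloaders later:
# Sketch only: fill NAs with the most common value, in both train and test
mode_value = df_train['PersonalField84'].mode()[0]
df_train['PersonalField84'] = df_train['PersonalField84'].fillna(mode_value)
df_test['PersonalField84'] = df_test['PersonalField84'].fillna(mode_value)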
df_train["PropertyField25"].value_counts()
df_train["PropertyField25"].isna().sum()
df_train["PropertyField25"].plot.hist(bins=100)
This also looks very much like a categorical field rather than a continuous one. Let's "peek" once more at our test data to reassure ourselves that we're not making a bad assumption here
df_test[~df_test["PropertyField25"].isin(df_train["PropertyField25"].unique())].value_counts()
Once again, the hypothesis holds true, so let's add `PropertyField25` to our list of categorical fields
categorical.append("PropertyField25")
`PropertyField29` looks like it might be a boolean. Let's test this
df_train['PropertyField29'].value_counts()
df_train['PropertyField29'].isna().sum()
This one also seems a bit categorical. But there are quite a few NA values. We have a few choices here.
- Given the heavy skewing towards one value, it would arguably be a reasonable choice to fill the NAs with the `mode` value rather than the `mean` value.
- Given that a significant number of NAs are present, we could leave it out of our model entirely and revisit it later if we want to fine-tune our input dataset.
- For now, let's ignore it.
notused.append("PropertyField29")
What if we decided instead that we wanted to include this data in our model, with assumed values filled in for the NAs?
How do we fill the NA values?
Should we use 0 or should we use 1?
Let's first look at whether it correlates with any other field that might give us insight into how to treat it
While we're at it, let's also have a look at the ranges in each of these columns, to help identify whether they're continuous or categorical in nature (reference; see the sketch below).
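As a rough sketch of that kind of check, we can summarise each numeric column's min/max alongside how many distinct values it takes; columns with only a handful of distinct values are more likely categorical. The cut-off of 20 below is an arbitrary choice for illustration:
# Rough heuristic: range and number of distinct values per numeric column
numeric_cols = df_train.select_dtypes(include='number')
summary = pd.DataFrame({
    'min': numeric_cols.min(),
    'max': numeric_cols.max(),
    'n_unique': numeric_cols.nunique(),
}).sort_values('n_unique')
print(summary.head(10))                             # most categorical-looking columns
print(summary[summary['n_unique'] > 20].head(10))   # candidates for continuous columns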
correlations = df_train.corr()['PropertyField29'][:-1].sort_values(kind="quicksort")
print(correlations)
df_train[['PropertyField29','Field7']].head()
df_train[['PropertyField29','Field7']].plot.scatter(x='Field7',
                                                    y='PropertyField29',
                                                    c='DarkBlue')
categorical, continuous, notused
df_train[df_train.columns[df_train.dtypes=='int64']].head()
df_train[df_train.columns[df_train.dtypes=='object']].head()
There are a lot of fields; let's make sure those that look like they should be boolean columns are set as boolean data types
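A sketch of one way to do that, assuming the boolean-looking columns hold 'Y'/'N' strings (the exact values would need checking against the data first):
# Map object columns that only contain 'Y'/'N' to booleans, in both train and test
for col in df_train.columns[df_train.dtypes == 'object']:
    values = set(df_train[col].dropna().unique())
    if values <= {'Y', 'N'}:
        df_train[col] = df_train[col].map({'Y': True, 'N': False})
        df_test[col] = df_test[col].map({'Y': True, 'N': False})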
Using this article about correlation, let's see if there are any obvious correlations in our data columns
correlations = df_train.corr()
print(correlations)
We need to filter this a little better. Borrowing from a StackOverflow article to get the highest correlated values
df_train.shape
c = correlations.abs()
s = c.unstack()
so = s.sort_values(kind="quicksort")
# the sort is ascending, so the strongest correlations are at the end
print(so[-10:])
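The end of that sorted Series is dominated by each column's correlation with itself (always 1.0), and every pair appears twice as (A, B) and (B, A). A sketch of one way to filter those out before looking at the strongest pairs:
# Keep each pair once and drop self-correlations before looking at the top values
pairs = so[so.index.get_level_values(0) < so.index.get_level_values(1)]
print(pairs.sort_values(ascending=False)[0:10])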
Borrowing from the seaborn documentation to make this a little more visual
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white")
# Generate a mask for the upper triangle of the correlation matrix
mask = np.triu(np.ones_like(correlations, dtype=bool))
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(correlations, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})