Perform exploratory data analysis to guide model tuning
For more information about the fire index features (FFMC, DMC, DC, ISI, BUI, and FWI), see the description of the Fire Weather Index system: https://www.nwcg.gov/publications/pms437/cffdrs/fire-weather-index-system
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
# Prep plotting aesthetics
# Set font size names
SMALL_SIZE = 14
MEDIUM_SIZE = 16
BIGGER_SIZE = 22
# Set font sizes
plt.rc('font', size=SMALL_SIZE) # controls default text sizes
plt.rc('axes', titlesize=BIGGER_SIZE) # fontsize of the axes title
plt.rc('axes', labelsize=BIGGER_SIZE) # fontsize of the x and y labels
plt.rc('xtick', labelsize=SMALL_SIZE) # fontsize of the tick labels
plt.rc('ytick', labelsize=SMALL_SIZE) # fontsize of the tick labels
plt.rc('legend', fontsize=SMALL_SIZE) # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE) # fontsize of the figure title
# Set figure size
plt.rcParams["figure.figsize"] = (14, 8) # size of the figure plotted
Load and format data
# Load in cleaned data
DATA = pd.read_csv("data/Algerian_forest_fires_dataset_CLEANED.csv")
# Drop the extra index column written out when the cleaned CSV was saved
DATA = DATA.drop(columns='Unnamed: 0')
# Add datetime column based on day, month, year
DATA['Datetime'] = pd.to_datetime(DATA[['year', 'month', 'day']])
DATA.head()
| | day | month | year | Temperature | RH | Ws | Rain | FFMC | DMC | DC | ISI | BUI | FWI | Classes | Region | Datetime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 6 | 2012 | 29 | 57 | 18 | 0.0 | 65.7 | 3.4 | 7.6 | 1.3 | 3.4 | 0.5 | notfire | Bejaia | 2012-06-01 |
| 1 | 2 | 6 | 2012 | 29 | 61 | 13 | 1.3 | 64.4 | 4.1 | 7.6 | 1.0 | 3.9 | 0.4 | notfire | Bejaia | 2012-06-02 |
| 2 | 3 | 6 | 2012 | 26 | 82 | 22 | 13.1 | 47.1 | 2.5 | 7.1 | 0.3 | 2.7 | 0.1 | notfire | Bejaia | 2012-06-03 |
| 3 | 4 | 6 | 2012 | 25 | 89 | 13 | 2.5 | 28.6 | 1.3 | 6.9 | 0.0 | 1.7 | 0.0 | notfire | Bejaia | 2012-06-04 |
| 4 | 5 | 6 | 2012 | 27 | 77 | 16 | 0.0 | 64.8 | 3.0 | 14.2 | 1.2 | 3.9 | 0.5 | notfire | Bejaia | 2012-06-05 |
Plot scatterplots and examine correlations
%%time
# Scatter plot for all of the covariates except the date/time.
# %%time must be the first line of the cell to time the whole cell;
# a bare %time on its own line only times an empty statement.
sns.pairplot(DATA[['Temperature','RH','Ws','Rain','FFMC','DMC','DC','ISI','BUI','FWI']])
plt.savefig('figures/figure_1.png')

The pairplot lets us check for outliers. The only notable ones are in the Rain variable, and we keep them in the data: two days of heavy rain look like a natural occurrence rather than measurement error, and they carry useful information about the factors that lead to fire.
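One way to list the heavy-rain days behind those outlying points is a simple interquartile-range rule. The sketch below is illustrative only: the helper name `flag_high_values` and the 1.5 × IQR cutoff are our assumptions, not part of the original analysis, and the five rows are copied from the `DATA.head()` output above so that the block runs on its own.

```python
import pandas as pd

# First five rows of Datetime and Rain, copied from the DATA.head() output
sample = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2012-06-01", "2012-06-02", "2012-06-03", "2012-06-04", "2012-06-05",
    ]),
    "Rain": [0.0, 1.3, 13.1, 2.5, 0.0],
})


def flag_high_values(df, column, k=1.5):
    """Return rows whose value in `column` lies above Q3 + k * IQR.

    Hypothetical helper: both the name and the default k are assumptions.
    """
    q1, q3 = df[column].quantile([0.25, 0.75])
    upper_fence = q3 + k * (q3 - q1)
    return df[df[column] > upper_fence]


heavy_rain = flag_high_values(sample, "Rain")
print(heavy_rain)
```

On the full dataset the same call would be `flag_high_values(DATA, "Rain")`. The flagged rows are only inspected, not dropped. If most days turn out to have little or no rain, this rule may flag more days than the two visible in the pairplot, so the cutoff `k` is worth adjusting.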
# Examine numerical correlations between all variables
DATA[['Temperature','RH','Ws','Rain','FFMC','DMC','DC','ISI','BUI','FWI','month']].corr()
| | Temperature | RH | Ws | Rain | FFMC | DMC | DC | ISI | BUI | FWI | month |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Temperature | 1.000000 | -0.651400 | -0.284510 | -0.326492 | 0.676568 | 0.485687 | 0.376284 | 0.603871 | 0.459789 | 0.566670 | -0.056781 |
| RH | -0.651400 | 1.000000 | 0.244048 | 0.222356 | -0.644873 | -0.408519 | -0.226941 | -0.686667 | -0.353841 | -0.580957 | -0.041252 |
| Ws | -0.284510 | 0.244048 | 1.000000 | 0.171506 | -0.166548 | -0.000721 | 0.079135 | 0.008532 | 0.031438 | 0.032368 | -0.039880 |
| Rain | -0.326492 | 0.222356 | 0.171506 | 1.000000 | -0.543906 | -0.288773 | -0.298023 | -0.347484 | -0.299852 | -0.324422 | 0.034822 |
| FFMC | 0.676568 | -0.644873 | -0.166548 | -0.543906 | 1.000000 | 0.603608 | 0.507397 | 0.740007 | 0.592011 | 0.691132 | 0.017030 |
| DMC | 0.485687 | -0.408519 | -0.000721 | -0.288773 | 0.603608 | 1.000000 | 0.875925 | 0.680454 | 0.982248 | 0.875864 | 0.067943 |
| DC | 0.376284 | -0.226941 | 0.079135 | -0.298023 | 0.507397 | 0.875925 | 1.000000 | 0.508643 | 0.941988 | 0.739521 | 0.126511 |
| ISI | 0.603871 | -0.686667 | 0.008532 | -0.347484 | 0.740007 | 0.680454 | 0.508643 | 1.000000 | 0.644093 | 0.922895 | 0.065608 |
| BUI | 0.459789 | -0.353841 | 0.031438 | -0.299852 | 0.592011 | 0.982248 | 0.941988 | 0.644093 | 1.000000 | 0.857973 | 0.085073 |
| FWI | 0.566670 | -0.580957 | 0.032368 | -0.324422 | 0.691132 | 0.875864 | 0.739521 | 0.922895 | 0.857973 | 1.000000 | 0.082639 |
| month | -0.056781 | -0.041252 | -0.039880 | 0.034822 | 0.017030 | 0.067943 | 0.126511 | 0.065608 | 0.085073 | 0.082639 | 1.000000 |
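Several of the fire index features in the table above are almost collinear (for example, DMC and BUI at 0.98). Below is a short sketch of how such pairs could be pulled out of a correlation matrix programmatically. The helper name `strong_pairs` and the 0.9 threshold are assumptions made for illustration. The values are copied from the table above for five of the fire index columns only, so the block runs without the CSV.

```python
import numpy as np
import pandas as pd

# Sub-matrix of the correlation table above (five fire index columns)
cols = ["DMC", "DC", "ISI", "BUI", "FWI"]
corr = pd.DataFrame(
    [
        [1.000000, 0.875925, 0.680454, 0.982248, 0.875864],
        [0.875925, 1.000000, 0.508643, 0.941988, 0.739521],
        [0.680454, 0.508643, 1.000000, 0.644093, 0.922895],
        [0.982248, 0.941988, 0.644093, 1.000000, 0.857973],
        [0.875864, 0.739521, 0.922895, 0.857973, 1.000000],
    ],
    index=cols,
    columns=cols,
)


def strong_pairs(corr, threshold=0.9):
    """Return variable pairs with |r| > threshold, strongest first.

    Hypothetical helper: the name and the default threshold are assumptions.
    """
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal of 1.0 values is excluded
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    # The comparison is False for NaN, so masked cells are dropped here too
    pairs = pairs[pairs.abs() > threshold]
    return pairs.sort_values(key=np.abs, ascending=False)


result = strong_pairs(corr)
print(result)
```

On the full data, `strong_pairs(DATA[cols].corr())` would apply the same filter. Which member of a near-duplicate pair to keep is a modelling decision this sketch does not make.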