Los Angeles Exploration & Visualization

The San Francisco Bay Area was of interest since the wildfires directly affected us in living in the area. The 2020 wildfires, however, largely affected California as a whole, so we also wanted explore the changes and trends in air quality in areas outside of San Francisco. Specifically, we looked into the Los Angeles area since it historically has had the worst air quality in all of California.

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import openaq
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib as mpl
sns.set_theme(style="darkgrid")

import numpy as np
from shapely.geometry import Point

import geopandas as gpd
from geopandas import GeoDataFrame

import geoplot as gplt
import geoplot.crs as gcrs

import fiona

import imageio
import os

import IPython
from IPython.core.display import Image
api = openaq.OpenAQ()
resp = api.cities(df=True, limit=10000)
la = api.locations(city='Los Angeles-Long Beach-Santa Ana', df=True)
la.head()
id country city cities location locations sourceName sourceNames sourceType sourceTypes firstUpdated lastUpdated parameters countsByMeasurement count coordinates.latitude coordinates.longitude
0 1052 US Los Angeles-Long Beach-Santa Ana [Los Angeles-Long Beach-Santa Ana] Pomona [Pomona] AirNow [AirNow] government [government] 2016-03-06 19:00:00+00:00 2022-05-11 18:00:00+00:00 [co, no2, o3] [{'parameter': 'co', 'count': 96231}, {'parame... 289411 34.0667 -117.7506
1 1200 US Los Angeles-Long Beach-Santa Ana [Los Angeles-Long Beach-Santa Ana] Glendora - Laurel [Glendora - Laurel] AirNow [AirNow] government [government] 2016-03-06 19:00:00+00:00 2022-05-11 18:00:00+00:00 [pm25, no2, pm10, o3, co] [{'parameter': 'pm25', 'count': 97957}, {'para... 484557 34.1439 -117.8508
2 1948 US Los Angeles-Long Beach-Santa Ana [Los Angeles-Long Beach-Santa Ana] Compton [Compton] AirNow [AirNow] government [government] 2016-03-06 19:00:00+00:00 2022-05-11 18:00:00+00:00 [no2, pm25, o3, co] [{'parameter': 'no2', 'count': 94042}, {'param... 329063 33.9014 -118.2050
3 1989 US Los Angeles-Long Beach-Santa Ana [Los Angeles-Long Beach-Santa Ana] Santa Clarita [Santa Clarita] AirNow [AirNow] government [government] 2016-03-06 19:00:00+00:00 2022-05-11 18:00:00+00:00 [co, no2, o3, pm25] [{'parameter': 'co', 'count': 95956}, {'parame... 388830 34.3833 -118.5283
4 2138 US Los Angeles-Long Beach-Santa Ana [Los Angeles-Long Beach-Santa Ana] Reseda [Reseda] AirNow [AirNow] government [government] 2016-03-06 19:00:00+00:00 2022-05-11 18:00:00+00:00 [no2, co, o3, pm25] [{'parameter': 'no2', 'count': 92881}, {'param... 386486 34.1992 -118.5331

Mapping Cities in the Los Angeles Area

After querying into the OpenAQ dataset, we map the points in the Los Angeles-Long Beach-Santa Ana area that are included in the queried dataframe.

geometry = [Point(xy) for xy in zip(la['coordinates.longitude'], la['coordinates.latitude'])]
gdf = GeoDataFrame(la, geometry=geometry)
world = gpd.read_file("https://public.gis.lacounty.gov/public/rest/services/LACounty_Dynamic/Political_Boundaries/MapServer/18/query?outFields=*&where=1%3D1&f=geojson")
gdf.plot(ax=world.plot(figsize=(15, 15)), marker='o', color='red', markersize=15)
<AxesSubplot:>
_images/LA_Visualizations_7_1.png
la_measure = api.measurements(city='Los Angeles-Long Beach-Santa Ana', parameter='pm25', limit=1000, df=True)

# Print out the statistics on a per-location basiss
la_measure.groupby(['location'])['value'].describe()
count mean std min 25% 50% 75% max
location
Anaheim 99.0 7.894949 3.088933 -0.4 6.20 8.0 9.900 15.9
Compton 99.0 11.219192 3.350714 2.7 9.10 11.3 13.250 19.0
Glendora - Laurel 99.0 6.520202 3.980821 0.0 3.35 6.4 8.950 18.6
Lancaster-Division 114.0 7.385965 3.480690 2.0 5.00 7.0 9.000 22.0
Long Beach Signal Hi 99.0 9.106061 2.635156 3.5 7.50 8.8 10.800 17.7
Los Angeles - N. Mai 96.0 11.502083 3.080122 3.9 9.70 11.4 13.575 18.5
Mission Viejo 99.0 13.965657 4.216855 6.9 10.65 13.7 17.100 23.4
North Holywood 99.0 11.792929 3.454404 3.1 9.90 11.7 13.500 21.0
Reseda 97.0 10.483505 4.561448 2.2 7.80 9.8 12.400 25.5
Santa Clarita 99.0 9.580808 3.474210 3.4 7.20 9.1 11.700 17.8
fig, ax = plt.subplots(1, figsize=(14,7))

ax = sns.boxplot(
    x='location',
    y='value',
    data=la_measure.query("value >= 0.0"),
    fliersize=0,
    palette='deep',
    ax=ax)

ax.set_ylim([0, 30])
ax.set_ylabel("$PM_{2.5}\;[\mu gm^{-3}]$", fontsize=18)
ax.set_xlabel("")

sns.despine(offset=10)

plt.xticks(rotation=90)
plt.show()
_images/LA_Visualizations_9_0.png

Comparing Air Quality for Dark Days vs. Average Days

We then compare the air quality in Los Angeles around the dark days and on average days (May 3th to 10th 2022), and compare this difference in air quality in LA with the difference in air quality in SF.

fig, ax = plt.subplots(1, figsize=(10, 6))

for group, df in la_measure.groupby('location'):
    # Query the data to only get positive values and resample to hourly
    _df = df.query("value >= 0.0").resample('1h').mean()

    _df.value.plot(ax=ax, label=group)

ax.legend(loc='best')
ax.set_ylabel("$PM_{2.5}$  [$\mu g m^{-3}$]", fontsize=20)
ax.set_xlabel("")
sns.despine(offset=5)

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.suptitle("$PM_{2.5}$ Measurements in LA Area (05/03/2022 - 05/10/2022)", y=0.92)

plt.show()
_images/LA_Visualizations_12_0.png
dark_days_LA = api.measurements(city='Los Angeles-Long Beach-Santa Ana', param="pm25", date_from='2020-09-07', date_to='2020-09-17', limit=10000, df=True)
fig= plt.figure(figsize=(20, 10))
ax1 = plt.subplot(121)
ax2 = plt.subplot(122, sharey=ax1)
for group, df in la_measure.groupby('location'):
    # Query the data to only get positive values and resample to hourly
    _df = df.query("value >= 0.0").resample('1h').mean()
    _df.value.plot(ax=ax2, label=group)
for group, df in dark_days_LA.groupby('location'):
    # Query the data to only get positive values and resample to hourly
    _df = df.query("value >= 0.0").resample('1h').mean()
    _df.value.plot(ax=ax1, label=group)
    
plt.legend(loc='best')
ax1.set_ylabel("$PM_{2.5}$  [$\mu g m^{-3}$]", fontsize=20)
ax2.set_ylabel("$PM_{2.5}$  [$\mu g m^{-3}$]", fontsize=20)
ax1.set_xlabel("")
ax2.set_xlabel("")
sns.despine(offset=5)
ax1.set_title('09/07/2020 to 09/17/2020')
ax2.set_title('05/03/2022 to 05/10/2022')

plt.suptitle("$PM_{2.5}$ Measurements in LA Area", y=0.98)
plt.show()
_images/LA_Visualizations_14_0.png

As seen in these two compared visualizations, we can see that compared to a normal day in LA exhibited through the May 2022 air quality datasets, which averages a PM2.5 value of approximately 15, the air quality in the dark days were significantly higher.

Image(filename = 'figures/PM25TScompare.png', width=1000, height=1000)
_images/LA_Visualizations_16_0.png

Upon comparing the two LA visualizations with the SF visualizations, we can see that although on a normal day, the air quality in Los Angeles is worse than San Francisco, the air quality in SF around the “dark day” was worse than that in LA. This makes sense, since the air quality in San Francisco was visibly poor with orange skies, whereas there were no strong visible indications of the poor air quality in Los Angeles around that time.