Satellite sea surface temperatures along the West Coast of the United States during the 2014–2016 northeast Pacific marine heat wave
In 2016 we published a paper on the heat wave in the ocean off the California coast.
This analysis was the last time I used Matlab to process scientific data. These were the steps to make Figure 1:
Download 4 TB of data from NASA PO.DAAC data archive via FTP
Once downloaded, I was faced with files of ~1 GB per day, so I wanted to pre-process the data to reduce its size.
I could do this with Fortran or Matlab. Matlab has a basic NetCDF reader, so I went with that.
Go through each day of data and subset to the West Coast Region to reduce size and save each subsetted day
Go through 2002-2012 and create a daily climatology and save all 365 days of the climatology
Go through each day of data and calculate the anomaly and save each day’s anomaly
Go through each day of data and calculate monthly and 5-day averages.
This whole process took about 1–2 months. Once the anomalies were calculated, I could start to analyze and explore the data. Below, we will do the same thing in a few minutes with Python, using MUR SST data from the AWS Open Data Program.
import warnings
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import fsspec
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
warnings.simplefilter("ignore") # filter some warning messages
xr.set_options(display_style="html") # display dataset nicely
Set the space/time parameters¶
Region for subsetting spatially and temporally (for climatology)
5 points for timeseries
xlat1, xlat2 = 33, 48
xlon1, xlon2 = -132, -118
date1, date2 = "2002-01-01", "2012-12-31"
blanco = {"name": "Cape Blanco", "lat": 42.837, "lon": -124.563}
mendo = {"name": "Cape Mendocino", "lat": 40.44, "lon": -124.408}
newport = {"name": "Newport", "lat": 45, "lon": -124.061}
# newport={'name':'Newport','lat':44.634,'lon':-124.061}
mont = {"name": "Monterey", "lat": 36.598, "lon": -121.8922}
sbarb = {"name": "Santa Barbara", "lat": 34.417, "lon": -119.700}
Amazon Open Data Program MUR SST¶
NASA JPL MUR Level 4 SST dataset in Zarr format.
There are two versions of this dataset:
The zarr-v1/ directory contains a zarr store chunked (6443, 100, 100) along the dimensions (time, lat, lon).
The zarr/ directory contains a zarr store chunked (5,1799,3600) along the dimensions (time, lat, lon).
What is chunking and why does it matter? Read this.
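Chunking can be illustrated with a small synthetic array (this is a sketch, assuming dask is installed; the array and chunk sizes are made up, not the real MUR SST shapes). A point timeseries read from the time-optimized layout touches a single chunk, while the map-optimized layout splits the same read across many chunks:

```python
import numpy as np
import xarray as xr

# Synthetic SST-like array: 10 days on a small lat/lon grid
sst = xr.DataArray(
    np.random.rand(10, 20, 40).astype("float32"),
    dims=("time", "lat", "lon"),
    name="analysed_sst",
)

# Chunked like zarr-v1/: long in time, small in space -> good for point timeseries
ts_chunked = sst.chunk({"time": 10, "lat": 5, "lon": 5})

# Chunked like zarr/: short in time, large in space -> good for spatial maps
map_chunked = sst.chunk({"time": 2, "lat": 20, "lon": 40})

# A point timeseries touches 1 chunk in the first layout but 5 in the second
print(ts_chunked.isel(lat=0, lon=0).data.npartitions)   # 1
print(map_chunked.isel(lat=0, lon=0).data.npartitions)  # 5
```

Reading a full map for one day reverses the trade-off, which is why the archive provides both layouts.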
file_aws = "https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1"
file_aws_time = "https://mur-sst.s3.us-west-2.amazonaws.com/zarr"
Read MUR SST chunked for space/time¶
%%time
ds_sst = xr.open_zarr(file_aws, consolidated=True)
ds_sst
Read MUR SST chunked for time series analysis¶
%%time
ds_sst_time = xr.open_zarr(file_aws_time, consolidated=True)
ds_sst_time
Just plot a random day to make sure it looks correct¶
options are .plot or plt.imshow or plt.pcolormesh
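A minimal sketch of those two plotting options, using a synthetic one-day field in place of the real subset (on the actual data one would first build something like `subset = ds_sst.analysed_sst.sel(lat=slice(xlat1, xlat2), lon=slice(xlon1, xlon2))` and select a single time; the values and grid here are made up):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Synthetic one-day SST field over the West Coast box
lat = np.linspace(33, 48, 30)
lon = np.linspace(-132, -118, 30)
day = xr.DataArray(
    15 + 5 * np.random.rand(30, 30),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="analysed_sst",
)

# Option 1: xarray's built-in .plot() labels axes from the coordinates
day.plot()
plt.savefig("quicklook_xarray.png")
plt.close()

# Option 2: plain matplotlib; you supply the coordinates yourself
plt.pcolormesh(day.lon, day.lat, day)
plt.colorbar(label="SST (deg C)")
plt.savefig("quicklook_pcolormesh.png")
plt.close()
```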
Make the map look nice & use cartopy features¶
# code features from https://scitools.org.uk/cartopy/docs/v0.14/matplotlib/feature_interface.html
crs = ccrs.PlateCarree() #set projection
# Plot on a map
ax = plt.subplot(projection=crs)
subset.plot(ax=ax, transform=ccrs.PlateCarree())  # transform tells cartopy the data's coordinate system
ax.coastlines("10m", color="k")
ax.add_feature(cfeature.LAND, color='grey')
ax.add_feature(cfeature.STATES.with_scale('10m'))
How big is this dataset?¶
Because xarray uses lazy loading, we have access to the entire dataset, but it only loads what it needs for calculations
size = ds_sst.nbytes / 1e9  # computed from metadata, without loading the data
print('GB data = ', size)
Calculate the Monthly Sea Surface Temperature Anomalies¶
use .resample to calculate monthly average (downsample)
use .groupby to calculate climatology
calculate anomaly
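The three steps above can be sketched on a synthetic single-point timeseries (the data and base period here are illustrative; on the real data the same pattern is applied to the spatial subset):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily SST at one point, 2002-2016, with a seasonal cycle
time = pd.date_range("2002-01-01", "2016-12-31", freq="D")
seasonal = 12 + 4 * np.sin(2 * np.pi * time.dayofyear / 365.25)
sst = xr.DataArray(seasonal, coords={"time": time}, dims="time", name="sst")

# 1. downsample daily data to monthly means
sst_monthly = sst.resample(time="1MS").mean()

# 2. climatology: group a base period by calendar month and average
clim = sst_monthly.sel(time=slice("2002-01-01", "2012-12-31")).groupby("time.month").mean()

# 3. anomaly = monthly value minus the climatological value for that month
anom = sst_monthly.groupby("time.month") - clim
```

Because the synthetic series is purely seasonal with no trend, its anomalies come out near zero; real data would show the heat wave as large positive anomalies.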
#resample the data to monthly
sst_monthly = subset.resample(time="1MS").mean(keep_attrs=True)
#only use part of the dataset for the climatology, then groupby and mean
climatology_mean_monthly = sst_monthly.sel(time=slice(date1, date2)).groupby("time.month").mean(keep_attrs=True)
#calculate anomaly
sst_anomaly_monthly = sst_monthly.groupby("time.month") - climatology_mean_monthly
sst_anomaly_monthly
# plot 2003-03-01 using cmap='RdYlBu_r'
sst_anomaly_monthly.sel(time="2003-03-01").plot(cmap="RdYlBu_r")
# plot 2015-03-01 using cmap='RdYlBu_r'
sst_anomaly_monthly.sel(time="2015-03-01").plot(cmap="RdYlBu_r")
Make the plots again, this time using the cartopy land mask, coastlines, and states to look nice; use subplots to control your axes so that they appear in one figure¶
Figure 3¶
Switch to the same data, but chunked differently:
it is optimized for timeseries rather than spatial analysis
# change to deg C
ds_sst_time["analysed_sst"] = ds_sst_time["analysed_sst"] - 273.15 # transform units from Kelvin to Celsius
ds_sst_time["analysed_sst"].attrs["units"] = "deg C" # update units in metadata
%%time
#select newport lat/lon
# using .rolling to make 30 day smoothing
sst_newport_coast = ds_sst_time.analysed_sst.sel(lat=newport["lat"], lon=newport["lon"], method="nearest").rolling(time=30, center=True).mean()
sst_newport_offshore = ds_sst_time.analysed_sst.sel(lat=newport["lat"], lon=newport["lon"] - 3, method="nearest").rolling(time=30, center=True).mean()  # ~3 deg west of the coast (assumed offset)
data = sst_newport_offshore
ystr = r'SST$_{offshore}$ ($^\circ$C)'  # raw string avoids the invalid \c escape warning
# select 2002 to 2013
data = data.sel(time=slice("2002-01-01", "2012-12-31"))
# use groupby to find max/min/mean/std for day of year
maxval = data.groupby("time.dayofyear").max()
minval = data.groupby("time.dayofyear").min()
meanval = data.groupby("time.dayofyear").mean()
stdval = data.groupby("time.dayofyear").std()
# make figure 3 panel
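A sketch of what a three-panel figure could look like, using a synthetic timeseries in place of `sst_newport_offshore` (the panel layout and the 2015 comparison year are assumptions, not the published Figure 3):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic smoothed SST timeseries, 2002-2016
time = pd.date_range("2002-01-01", "2016-12-31", freq="D")
vals = 12 + 4 * np.sin(2 * np.pi * time.dayofyear / 365.25) + 0.2 * np.random.randn(time.size)
data = xr.DataArray(vals, coords={"time": time}, dims="time", name="sst")

# Day-of-year statistics over the base period
grouped = data.sel(time=slice("2002-01-01", "2012-12-31")).groupby("time.dayofyear")
meanval, stdval = grouped.mean(), grouped.std()
maxval, minval = grouped.max(), grouped.min()

fig, axes = plt.subplots(3, 1, figsize=(8, 9), sharex=True)
doy = meanval.dayofyear

# Panel 1: climatological mean with +/- 1 std envelope
axes[0].plot(doy, meanval, "k", label="mean")
axes[0].fill_between(doy, meanval - stdval, meanval + stdval, alpha=0.3, label=r"$\pm$1 std")
axes[0].legend()

# Panel 2: day-of-year extremes
axes[1].plot(doy, maxval, label="max")
axes[1].plot(doy, minval, label="min")
axes[1].legend()

# Panel 3: one recent year against the climatological mean
year = data.sel(time=slice("2015-01-01", "2015-12-31"))
axes[2].plot(year.time.dt.dayofyear, year, "r", label="2015")
axes[2].plot(doy, meanval, "k", label="climatological mean")
axes[2].set_xlabel("day of year")
axes[2].legend()

fig.savefig("figure3_sketch.png")
```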
Data exploration¶
The data you have is organized by time, latitude, and longitude.
You can look at the daily values, their anomalies, or means in time/space
Now, imagine you hear from colleagues about a "Blob" of warm water, and you look at a website for a quick view
Look here
There are a lot of different ways to look at data
How would you explore the evolution of SSTs along the West Coast during this Marine Heat Wave (MHW)?
Spatial maps (like figure 1) ?
Timeseries at points (like figure 3) ?
Longitude versus time (Hovmöller plots) at a certain latitude ?
Does your audience matter? Why?
Explore the data below and explain why you chose to explore it that way.
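One of the options above, a Hovmöller plot, can be sketched on synthetic anomalies (on the real data one might plot something like `sst_anomaly_monthly.sel(lat=45, method="nearest")`; the latitude, grid, and values here are made up):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly SST anomalies (time x lon) at a fixed latitude
time = pd.date_range("2014-01-01", "2016-12-01", freq="MS")
lon = np.linspace(-132, -118, 29)
anom = xr.DataArray(
    np.random.randn(time.size, lon.size),
    coords={"time": time, "lon": lon},
    dims=("time", "lon"),
    name="sst_anomaly",
)

# Hovmöller: longitude on x, time on y; warm anomalies show as red bands
anom.plot(x="lon", y="time", cmap="RdYlBu_r")
plt.title("SST anomaly at 45N (synthetic)")
plt.savefig("hovmoller_sketch.png")
```

This view makes it easy to see whether warm anomalies propagate toward the coast or sit offshore, which a single-point timeseries cannot show.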