Satellite sea surface temperatures along the West Coast of the United States during the 2014–2016 northeast Pacific marine heat wave

In 2016 we published a paper on the heat wave in the ocean off the California coast.

This analysis was the last time I used Matlab to process scientific data. To make Figure 1, these were the steps:

  • Download 4 TB of data from NASA PO.DAAC data archive via FTP

  • Once downloaded, I was faced with files that were ~1 GB per day, so I wanted to pre-process the data to reduce the size.

  • I could do this with Fortran or Matlab. Matlab has a basic NetCDF reader, so I went with that.

  • Go through each day of data, subset it to the West Coast region to reduce the size, and save each subsetted day

  • Go through 2002–2012 to create a daily climatology and save all 365 days of it

  • Go through each day of data, calculate the anomaly, and save each day’s anomaly

  • Go through each day of data and calculate monthly and 5-day averages.

This whole process took about 1–2 months. Once the anomalies were calculated, I could start to analyze and explore the data. Below, we will do the same thing in a few minutes with Python, using MUR SST data from the AWS Open Data Program.

import warnings

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import fsspec
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

warnings.simplefilter("ignore")  # filter some warning messages
xr.set_options(display_style="html")  # display dataset nicely

Set the space/time parameters

  • Region for subsetting spatially and temporally (for climatology)

  • 5 points for timeseries

xlat1, xlat2 = 33, 48
xlon1, xlon2 = -132, -118
date1, date2 = "2002-01-01", "2012-12-31"

blanco = {"name": "Cape Blanco", "lat": 42.837, "lon": -124.563}
mendo = {"name": "Cape Mendocino", "lat": 40.44, "lon": -124.408}
newport = {"name": "Newport", "lat": 45, "lon": -124.061}
# newport={'name':'Newport','lat':44.634,'lon':-124.061}
mont = {"name": "Monterey", "lat": 36.598, "lon": -121.8922}
sbarb = {"name": "Santa Barbara", "lat": 34.417, "lon": -119.700}

Amazon Open Data Program MUR SST

NASA JPL MUR Level 4 SST dataset in Zarr format.
There are two versions of this data:

  • The zarr-v1/ directory contains a zarr store chunked (6443, 100, 100) along the dimensions (time, lat, lon).

  • The zarr/ directory contains a zarr store chunked (5, 1799, 3600) along the dimensions (time, lat, lon).

What is chunking and why does it matter? Read this.

file_aws = "https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1"
file_aws_time = "https://mur-sst.s3.us-west-2.amazonaws.com/zarr"

Read MUR SST chunked for space/time

%%time
ds_sst = xr.open_zarr(file_aws, consolidated=True)

ds_sst

Read MUR SST chunked for time series analysis

%%time
ds_sst_time = xr.open_zarr(file_aws_time, consolidated=True)

ds_sst_time
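To see the two chunking schemes described above, you can inspect the Dask chunk layout of each store; the printed chunk sizes should match the directory descriptions:

# compare chunk layouts of the two stores along (time, lat, lon)
print(ds_sst.analysed_sst.chunks)  # spatial-analysis store, ~ (6443, 100, 100) chunks
print(ds_sst_time.analysed_sst.chunks)  # timeseries store, ~ (5, 1799, 3600) chunks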

Subset all the data to just the West Coast region

  • options are indices, .isel, or .sel

# one option: label-based selection with .sel and slices (MUR lat/lon are ascending)
subset = ds_sst["analysed_sst"].sel(lat=slice(xlat1, xlat2), lon=slice(xlon1, xlon2))

Just plot a random day to make sure it looks correct

  • options are .plot or plt.imshow or plt.pcolormesh
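For example, a quick check with xarray's .plot; the date here is an arbitrary choice:

# plot one (arbitrary) day to verify the subset looks right
subset.sel(time="2015-03-01", method="nearest").plot(cmap="RdYlBu_r")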

Make the map look nice & use cartopy features

# code features from https://scitools.org.uk/cartopy/docs/v0.14/matplotlib/feature_interface.html
crs = ccrs.PlateCarree()  # set projection

# Plot on a map
ax = plt.subplot(projection=crs)

# one option: plot a single day, passing the map axis and the data's transform
subset.sel(time="2015-03-01", method="nearest").plot(ax=ax, transform=crs, cmap="RdYlBu_r")

ax.coastlines("10m", color="k")
ax.add_feature(cfeature.LAND, color="grey")
ax.add_feature(cfeature.STATES.with_scale("10m"))

How big is this dataset?

  • Because xarray uses lazy loading, we have access to this entire dataset, but it only loads what it needs for calculations

# nbytes reports the uncompressed in-memory size of the array
size = ds_sst["analysed_sst"].nbytes / 1e9
print("GB data = ", size)

Calculate the Monthly Sea Surface Temperature Anomalies

  • use .resample to calculate monthly average (downsample)

  • use .groupby to calculate climatology

  • calculate anomaly

# resample the data to monthly means (downsample)
sst_monthly = subset.resample(time="1MS").mean()

# only use part of the dataset (2002-2012) for the climatology, then groupby month and mean
climatology_mean_monthly = sst_monthly.sel(time=slice(date1, date2)).groupby("time.month").mean("time")

# calculate the anomaly by subtracting the monthly climatology
sst_anomaly_monthly = sst_monthly.groupby("time.month") - climatology_mean_monthly

sst_anomaly_monthly
# plot 2003-03-01 using cmap='RdYlBu_r'
sst_anomaly_monthly.sel(time="2003-03-01", method="nearest").plot(cmap="RdYlBu_r")
# plot 2015-03-01 using cmap='RdYlBu_r'
sst_anomaly_monthly.sel(time="2015-03-01", method="nearest").plot(cmap="RdYlBu_r")

Make the plots again, this time using the cartopy land mask, coastlines, and states so they look nice; use subplots to control your axes so that both maps appear in one figure (a sketch follows below)
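A minimal sketch of one way to build the two-panel figure; the dates and color limits are illustrative choices:

fig, axes = plt.subplots(1, 2, figsize=(12, 5), subplot_kw={"projection": crs})
for ax, date in zip(axes, ["2003-03-01", "2015-03-01"]):
    # plot each monthly anomaly on its own map axis
    sst_anomaly_monthly.sel(time=date, method="nearest").plot(
        ax=ax, transform=crs, cmap="RdYlBu_r", vmin=-3, vmax=3
    )
    ax.coastlines("10m", color="k")
    ax.add_feature(cfeature.LAND, color="grey")
    ax.add_feature(cfeature.STATES.with_scale("10m"))
    ax.set_title(date)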

Figure 3

  • Switch to the same data, but chunked differently

  • It is optimized for timeseries analysis rather than spatial analysis

# change to deg C

ds_sst_time["analysed_sst"] = ds_sst_time["analysed_sst"] - 273.15  # transform units from Kelvin to  Celsius

ds_sst_time["analysed_sst"].attrs["units"] = "deg C"  # update units in metadata
%%time
# select the Newport lat/lon (nearest grid point)
# use .rolling to apply a 30-day smoothing

sst_newport_coast = (
    ds_sst_time.analysed_sst.sel(lat=newport["lat"], lon=newport["lon"], method="nearest")
    .rolling(time=30, center=True)
    .mean()
)

# assumption for this sketch: "offshore" is taken 3 degrees of longitude west of the coastal point
sst_newport_offshore = (
    ds_sst_time.analysed_sst.sel(lat=newport["lat"], lon=newport["lon"] - 3, method="nearest")
    .rolling(time=30, center=True)
    .mean()
)

data = sst_newport_offshore
ystr = r"SST$_{offshore}$ ($^\circ$C)"

# select 2002 to 2013
# use groupby to find max/min/mean/std for each day of year
clim = data.sel(time=slice("2002-01-01", "2013-12-31")).groupby("time.dayofyear")

maxval = clim.max()
minval = clim.min()
meanval = clim.mean()
stdval = clim.std()
# make the Figure 3 panel (a sketch follows below)
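A minimal sketch of the panel, assuming the climatology envelope is drawn as the day-of-year mean ± 1 standard deviation with one MHW year overlaid; the styling is illustrative, not the paper's exact Figure 3:

fig, ax = plt.subplots(figsize=(10, 4))

# climatology envelope: day-of-year mean +/- 1 standard deviation
ax.fill_between(meanval.dayofyear, meanval - stdval, meanval + stdval,
                color="lightgrey", label=r"climatology $\pm$1 std")
ax.plot(meanval.dayofyear, meanval, "k", label="climatology mean")

# overlay one marine-heat-wave year on the same day-of-year axis
sst_2015 = data.sel(time=slice("2015-01-01", "2015-12-31"))
ax.plot(sst_2015.time.dt.dayofyear, sst_2015, "r", label="2015")

ax.set_xlabel("day of year")
ax.set_ylabel(ystr)
ax.legend()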

Data exploration

  • The data you have is organized by time, latitude, and longitude.

  • You can look at the daily values, their anomalies, or means in time/space

  • Now imagine you hear from colleagues about a Blob of warm water, so you look at a website for a quick view

  • Look here

  • There are a lot of different ways to look at data

  • How would you explore the evolution of SSTs along the West Coast during this Marine Heat Wave (MHW)?

  • Spatial maps (like Figure 1)?

  • Timeseries at points (like Figure 3)?

  • Longitude versus time (Hovmöller plots) at a certain latitude? (see the sketch after this list)

  • Does your audience matter? Why?

  • Explore the data below and explain why you chose to explore it that way.
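As one example, here is a minimal sketch of a Hovmöller plot of the monthly anomalies at the latitude of Newport; the latitude, date range, and color limits are illustrative choices:

# longitude-time Hovmöller plot of monthly SST anomalies at ~45N
hov = sst_anomaly_monthly.sel(lat=newport["lat"], method="nearest").sel(
    time=slice("2014-01-01", "2016-12-31")
)
hov.plot(x="lon", y="time", cmap="RdYlBu_r", vmin=-3, vmax=3)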