Air Sensor Dataset – FAQ

The Air Sensor Dataset (ASDS) currently contains 7 years of quality-controlled fine particulate matter (PM2.5) data from 2018 to the end of 2024 for over 5,000 outdoor PurpleAir (PA) sensors within the nine counties covered by the Bay Air Quality Management District, which include the greater San Francisco-Oakland and San Jose areas. The PM2.5 data has been quality controlled and corrected (see “What is EPA-corrected vs uncorrected PM2.5 data?” below). More details on data processing and quality control are described in Table 1. Both the processed hourly and daily (24-hr) PM2.5 data are available, along with metadata (supporting site information) for each site. Table 2 and Table 3 describe the file contents for the hourly data and daily data, respectively. Table 4 describes the file contents for the metadata.

This dataset can be used for hyperlocal, community, and regional scale analyses, such as identifying monitoring gaps in your community, or comparing annual PM2.5 across census tracts, counties or cities, and more. Example data analyses are also shown on the Bay Air Center Air Sensor Dataset Resource page. Regardless of application, all data users will need to abide by the terms for data use described below:

  • Data cannot be directly sold or redistributed for profit.
  • Acknowledgment should be given to the Bay Air Center and PurpleAir in derived analyses, reports, products, etc. Follow the attribution guide provided by PurpleAir.
  • Data are not fully verified or validated and are subject to change. Data are provided as is and are subject to the terms and conditions in PurpleAir’s terms of service.
  • Data values should not be altered in any way.
  • There may be differences in these air sensor data compared to other air quality data types (e.g., reference instruments). These differences can occur due to where a device is located, its operation and maintenance, local emissions sources, and more.

The ASDS includes both EPA-corrected and uncorrected PM2.5 data.

  • EPA-corrected – PM2.5 data that has been corrected using the U.S.-Wide Correction for PurpleAir PM2.5 Sensors developed by the EPA to correct PA data for a known high concentration bias. You should use the EPA-corrected values for all applications. More information on this correction can be found below, under EPA Corrected PM2.5 Supporting Information.
  • Uncorrected – raw PurpleAir data that has not had any corrections applied. The uncorrected data is intended solely for specific research applications, such as evaluating correction algorithms.

EPA CORRECTED PM2.5 SUPPORTING INFORMATION

The U.S.-Wide Correction for PurpleAir PM2.5 Sensors is a simple model correction developed by the EPA. It is based on PurpleAir field evaluations against FEM and near-FEM PM2.5 monitors and, specifically, on sensor performance during smoke-impacted events between 2018 and 2020. The correction accounts for the influence of relative humidity on sensor measurements by combining raw PM2.5 data with relative humidity in the model equations. The specific correction applied to PA PM2.5 data also depends on PM2.5 concentration thresholds and on the PA channel data source (PA has two separate channels). More information on the U.S.-Wide Correction for PurpleAir PM2.5 Sensors can be found in EPA’s AirNow Fire and Smoke Map presentation, as well as in their correction publication. The specific corrections used within the Air Sensor Dataset are shown below, where “cf=1” denotes the PA channel data source:

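a.   For low concentrations, PM2.5 (cf=1) ≤ 343 µg/m³:
$$\mathrm{PM}_{2.5}(\mathrm{corr}) = 0.52 \times \mathrm{PM}_{2.5}(\mathrm{AB\ avg}) - 0.086 \times \mathrm{RH} + 5.75$$

b.   For high concentrations, PM2.5 (cf=1) > 343 µg/m³:
$$\mathrm{PM}_{2.5}(\mathrm{corr}) = 0.46 \times \mathrm{PM}_{2.5}(\mathrm{AB\ avg}) + 3.93 \times 10^{-4} \times \mathrm{PM}_{2.5}(\mathrm{AB\ avg})^{2} + 2.97$$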
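
The two equations above can be sketched in a few lines of Python. This is an illustration only, not the code used to build the dataset. The function name epa_correct_pm25 is hypothetical, and the sketch assumes that the 343 µg/m³ threshold is tested on the same A/B channel average (cf=1) that is fed into the equations, and that RH is the sensor relative humidity in percent.

```python
def epa_correct_pm25(pm25_ab_avg, rh):
    """Apply the piecewise correction listed above (illustrative sketch).

    pm25_ab_avg: average of the A and B channel PM2.5 (cf=1) values, in µg/m³
    rh:          sensor relative humidity, in percent
    """
    # Assumption: the threshold is checked against the A/B average itself.
    if pm25_ab_avg <= 343:
        # Equation (a): linear in PM2.5 with a relative humidity term
        return 0.52 * pm25_ab_avg - 0.086 * rh + 5.75
    # Equation (b): quadratic in PM2.5, with no relative humidity term
    return 0.46 * pm25_ab_avg + 3.93e-4 * pm25_ab_avg ** 2 + 2.97


low = epa_correct_pm25(20.0, 50.0)    # uses equation (a)
high = epa_correct_pm25(400.0, 50.0)  # uses equation (b)
```

Note that equation (b) does not use relative humidity, so the second argument has no effect above the threshold.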

Table 1. Data processing and quality control by stage of data.

Stage of data: Raw
  • Manually audited the program and independently verified calculations
  • Flagged data beyond operating specifications (min/max check)
  • Enforced EPA 75% hourly data completeness
  • Compared A & B channels based on AirNow Fire and Smoke Map procedure
  • Removed indoor sensors based on site naming conventions and PA model type

Stage of data: Hourly (automatic checks)
  • Removed the first 24 hours of data from each Site ID
  • Checked for negative values of −5 µg/m³ and below based on AirNow QC rule
  • Identified potential indoor sensors based on anomalous PM2.5 (EPA corrected) and temperature QA check within the same general area

Stage of data: Hourly (manual checks)
  • Reviewed potential indoor sensors identified through automatic code based on their daily PM2.5 and monthly temperature data and identified Site IDs that had the highest likelihood of operating indoors
  • Reviewed data by site proximity
  • Reviewed monthly/weekly averages of data for problematic sensors
  • Removed or flagged data where appropriate (indoor, problematic, imposter (optional))
  • Flagged indoor, problematic, and imposter sites in Metadata file

Stage of data: Daily (automatic checks)
  • Enforced 75% data completeness
  • Removed identified indoor, problematic, or imposter sensors
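
As a rough illustration of the daily completeness check in Table 1, the sketch below averages hourly values into daily values. It is not the dataset's processing code. It assumes that 75% data completeness means at least 18 of 24 valid hourly values per day, and that hourly values of −5 µg/m³ and below are excluded before averaging. The function name daily_means and the input layout are hypothetical.

```python
from collections import defaultdict

HOURS_PER_DAY = 24
MIN_VALID_HOURS = 18  # assumption: 75% of 24 hourly values


def daily_means(hourly_records):
    """Average hourly PM2.5 into daily values (illustrative sketch).

    hourly_records: iterable of (site_id, date, pm25) tuples, one per hour.
    Returns {(site_id, date): mean PM2.5} for days with enough valid hours.
    """
    by_day = defaultdict(list)
    for site_id, date, pm25 in hourly_records:
        # Assumption: missing values and values of -5 µg/m³ and below are invalid.
        if pm25 is None or pm25 <= -5.0:
            continue
        by_day[(site_id, date)].append(pm25)
    return {
        key: sum(values) / len(values)
        for key, values in by_day.items()
        if len(values) >= MIN_VALID_HOURS
    }
```

Under these assumptions, a day with 18 valid hours yields a daily value and a day with 17 does not.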

Table 2. File contents for the hourly data.

| Field | Units | Format | Description |
|---|---|---|---|
| Site_ID | N/A | nnnnnn | Numeric site identifier |
| Site_Name | N/A | String | Site name (text) |
| Datetime | PST¹ | yyyy-mm-dd HH:MM | Date and time |
| Date | PST¹ | yyyy-mm-dd | Date field |
| Time | PST¹ | HH:MM | Time in hours and minutes |
| Jday | PST¹ | ddd | Julian day |
| Month | PST¹ | mm | Month number (1-12) |
| DOW | PST¹ | Sun-Sat | Day of week |
| Hour | PST¹ | HH | Hour of the day (00-23) |
| Latitude | Decimal degrees | nn.nnnnn | Site position, latitude |
| Longitude | Decimal degrees | nnn.nnnnn | Site position, longitude |
| Elevation | Feet | nnnn | Site elevation above sea level |
| County | N/A | String | County name where sensor is located |
| PM2.5_EPA | µg/m³ | nnn.n | PM2.5 concentration (EPA corrected) |
| PM2.5_Uncorr | µg/m³ | nnn.n | PM2.5 concentration (uncorrected) |
| Temp | F | nnn.n | Air sensor temperature; not ambient temperature |
| RH | % | nnn.n | Air sensor relative humidity; not ambient relative humidity |
| QC_Flags | 0 = Valid, 8 = Invalid | n | Quality control flag for entire record |
| QC_Descriptor | ‘Incomplete’, ‘ABcheck’, ‘Negative’, ‘Sticking’, ‘Indoor’, ‘Degraded’, etc. | String | Quality control descriptor for entire record |

¹ Begin time convention: each timestamp marks the start of its averaging period.
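
The sketch below shows one way to read hourly records and keep only those flagged valid. It assumes the hourly file is a comma-separated file whose header row uses the field names in the hourly table; the actual file format is not stated here. The sample rows, including the Site_ID values, are made up for illustration, and the function name valid_hours is hypothetical. Under the begin time convention, a record stamped 13:00 is taken to cover 13:00 to 14:00.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical sample rows; real files contain more fields and real values.
SAMPLE = """Site_ID,Datetime,PM2.5_EPA,QC_Flags,QC_Descriptor
100001,2024-01-01 13:00,8.4,0,
100001,2024-01-01 14:00,,8,Incomplete
"""


def valid_hours(csv_file):
    """Yield valid hourly records (QC_Flags == 0) with their time window."""
    for row in csv.DictReader(csv_file):
        if row["QC_Flags"].strip() != "0":
            continue  # skip records flagged 8 (invalid)
        start = datetime.strptime(row["Datetime"], "%Y-%m-%d %H:%M")
        yield {
            "site_id": row["Site_ID"],
            "start": start,                      # begin time convention
            "end": start + timedelta(hours=1),   # end of the hourly window
            "pm25_epa": float(row["PM2.5_EPA"]),
        }


records = list(valid_hours(io.StringIO(SAMPLE)))
```

For a real file, the io.StringIO object would be replaced by a file opened with open(path, newline="").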

Table 3. File contents for the daily data.

| Field | Units | Format | Comments |
|---|---|---|---|
| Site_ID | N/A | nnnnnn | Numeric site identifier |
| Site_Name | N/A | String | Text with the site name |
| Datetime | PST¹ | yyyy-mm-dd HH:MM | Datetime field |
| Date | PST¹ | yyyy-mm-dd | Date field |
| Jday | PST¹ | ddd | Julian day |
| Month | PST¹ | mm | Month number (1-12) |
| DOW | PST¹ | Sun-Sat | Day of week |
| Day | PST¹ | dd | Day (1-31) |
| Latitude | Decimal degrees | nn.nnnnn | Site position, latitude |
| Longitude | Decimal degrees | nnn.nnnnn | Site position, longitude |
| Elevation | Feet | nnnn | Site elevation above sea level |
| County | N/A | String | County name where sensor is located |
| PM2.5_EPA | µg/m³ | nnn.n | PM2.5 concentration (EPA corrected) |
| PM2.5_Uncorr | µg/m³ | nnn.n | PM2.5 concentration (uncorrected) |
| Temp | F | nnn.n | Air sensor temperature; not ambient temperature |
| RH | % | nnn.n | Air sensor relative humidity; not ambient relative humidity |
| QC_Flags | 0 = Valid, 8 = Invalid | n | Quality control flag for entire record |
| QC_Descriptor | ‘Incomplete’ | String | Quality control descriptor for entire record |

¹ Begin time convention.
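
One of the uses mentioned earlier, comparing annual PM2.5 across counties, could be sketched from the daily fields as follows. This is a simplified illustration, not a recommended method. The function name annual_mean_by_county is hypothetical, and rows are assumed to have already been read into dictionaries keyed by the daily field names, with QC_Flags as an integer and PM2.5_EPA as a number.

```python
from collections import defaultdict


def annual_mean_by_county(daily_rows, year):
    """Mean of valid daily PM2.5_EPA values per county for one year (sketch).

    daily_rows: iterable of dicts with the keys Date (yyyy-mm-dd), County,
                PM2.5_EPA, and QC_Flags.
    """
    totals = defaultdict(lambda: [0.0, 0])  # county -> [sum, count]
    prefix = f"{year}-"
    for row in daily_rows:
        # Keep only valid records (QC_Flags == 0) from the requested year.
        if row["QC_Flags"] != 0 or not row["Date"].startswith(prefix):
            continue
        acc = totals[row["County"]]
        acc[0] += row["PM2.5_EPA"]
        acc[1] += 1
    return {county: s / n for county, (s, n) in totals.items()}
```

This pools all sensor-days in a county, so areas with many sensors carry more weight. Averaging per site first and then across sites is one alternative, depending on the question being asked.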

Table 4. File contents for the metadata.

| Field | Units | Format | Description |
|---|---|---|---|
| Site_ID | N/A | nnnnnn | Numeric site identifier |
| Date_Created | PST | yyyy-mm-dd HH:MM | Datetime when the Site ID was created |
| Site_Name | N/A | String | Text with the site name |
| Model | N/A | String | PurpleAir sensor model |
| Hardware | N/A | String | PurpleAir sensor hardware |
| Firmware_Version | N/A | String | PurpleAir sensor firmware version |
| Firmware_Update | N/A | String | PurpleAir sensor updated firmware version |
| Latitude | Decimal degrees | nn.nnnnn | Site position, latitude |
| Longitude | Decimal degrees | nnn.nnnnn | Site position, longitude |
| Elevation | Feet | nnnn | Site elevation above sea level (ft) |
| County | N/A | String | County name where sensor is located |
| Geohash_4 | N/A | String | Geographic region where sensor is located defined by geohash string of length 4 (tile: 39.1km x 19.5km) |
| Geohash_5 | N/A | String | Geographic region where sensor is located defined by geohash string of length 5 (tile: 4.9km x 4.9km) |
| Geohash_6 | N/A | String | Geographic region where sensor is located defined by geohash string of length 6 (tile: 1.2km x 609.4m) |
| Geohash_7 | N/A | String | Geographic region where sensor is located defined by geohash string of length 7 (tile: 152.9m x 152.4m) |
| Imposter_Sensors | TRUE or FALSE | Boolean | Identifies imposter PA sensors (new/unknown shareware circuitry) |
| Indoor_Sensors | TRUE or FALSE | Boolean | Identifies indoor PA sensors which are not included within verbose or final files |
| Problematic_Sensors | TRUE or FALSE | Boolean | Identifies problematic PA sensors which are not included within verbose or final files |
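
To show how the Geohash_4 through Geohash_7 fields relate to one another, here is a minimal sketch of the standard geohash encoding. It assumes the dataset uses the standard base-32 geohash; the function name geohash_encode is hypothetical and this is not the code used to build the metadata. Each additional character narrows the tile, so a longer geohash always begins with the shorter one for the same location.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, length):
    """Encode a latitude/longitude pair as a geohash of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0          # bits collected for the current character
    bits = 0           # number of bits collected so far (0-5)
    use_lon = True     # bits alternate, starting with longitude
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = value * 2 + 1
                lon_lo = mid
            else:
                value = value * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = value * 2 + 1
                lat_lo = mid
            else:
                value = value * 2
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)


# Illustrative coordinates near San Francisco, not a dataset site
gh7 = geohash_encode(37.7749, -122.4194, 7)
gh4 = geohash_encode(37.7749, -122.4194, 4)
```

Because of this nesting, grouping sites by Geohash_5 gives coarser regions that each contain whole Geohash_6 and Geohash_7 tiles.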
