The Air Sensor Dataset (ASDS) currently contains 7 years of quality-controlled fine particulate matter (PM2.5) data from 2018 to the end of 2024 for over 5,000 outdoor PurpleAir (PA) sensors within the nine counties covered by the Bay Air Quality Management District, which include the greater San Francisco-Oakland and San Jose areas. The PM2.5 data has been quality controlled and corrected (see “What is EPA-corrected vs uncorrected PM2.5 data?” below). More details on data processing and quality control are described in Table 1. Both the processed hourly and daily (24-hr) PM2.5 data are available, along with metadata (supporting site information) for each site. Table 2 and Table 3 describe the file contents for the hourly data and daily data, respectively. Table 4 describes the file contents for the metadata.
This dataset can be used for hyperlocal, community, and regional scale analyses, such as identifying monitoring gaps in your community or comparing annual PM2.5 across census tracts, cities, or counties. Example data analyses are also shown on the Bay Air Center Air Sensor Dataset Resource page. Regardless of application, all data users must abide by the terms for data use described below:
- Data cannot be directly sold or redistributed for profit.
- Acknowledgment should be given to the Bay Air Center and PurpleAir in derived analyses, reports, products, etc. Follow the attribution guide provided by PurpleAir.
- Data are not fully verified or validated. They are subject to change, are provided as is, and follow the terms and conditions in PurpleAir’s terms of service.
- Data values should not be altered in any way.
- There may be differences in these air sensor data compared to other air quality data types (e.g., reference instruments). These differences can occur due to where a device is located, its operation and maintenance, local emissions sources, and more.
WHAT IS EPA-CORRECTED VS UNCORRECTED PM2.5 DATA?
The ASDS includes both EPA-corrected and uncorrected PM2.5 data.
- EPA-corrected – PM2.5 data that has been corrected using the U.S.-Wide Correction for PurpleAir PM2.5 Sensors developed by the EPA to correct PA data for a known high concentration bias. You should use the EPA-corrected values for all applications. More information on this correction can be found below, under EPA Corrected PM2.5 Supporting Information.
- Uncorrected – raw PurpleAir data that has not had any corrections applied. The uncorrected data is intended solely for use in specific research applications, such as evaluating correction algorithms.
EPA CORRECTED PM2.5 SUPPORTING INFORMATION
The U.S.-Wide Correction for PurpleAir PM2.5 Sensors is a simple model correction developed by the EPA. It is based on field evaluations of PurpleAir sensors against Federal Equivalent Method (FEM) and near-FEM PM2.5 monitors, with specific attention to sensor performance during smoke-impacted events between 2018 and 2020. The correction accounts for the influence of relative humidity on sensor measurements by combining raw PM2.5 data with relative humidity in the model equations. The specific correction applied to PA PM2.5 data also depends on PM2.5 concentration thresholds and on the PA channel data source (PA has two separate channels). More information on the U.S.-Wide Correction for PurpleAir PM2.5 Sensors can be found in EPA’s AirNow Fire and Smoke Map presentation and in EPA’s correction publication. The specific corrections used within the Air Sensor Dataset are shown below, where “cf=1” denotes the PA channel data source:
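a. For low concentrations, PM2.5 (cf=1) ≤ 343 µg/m³:
$$\mathrm{PM}_{2.5}(\text{corr}) = 0.52 \times \mathrm{PM}_{2.5}(\text{AB avg}) - 0.086 \times \mathrm{RH} + 5.75$$
b. For high concentrations, PM2.5 (cf=1) > 343 µg/m³:
$$\mathrm{PM}_{2.5}(\text{corr}) = 0.46 \times \mathrm{PM}_{2.5}(\text{AB avg}) + 3.93 \times 10^{-4} \times \mathrm{PM}_{2.5}(\text{AB avg})^{2} + 2.97$$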
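The two correction equations can be sketched as a small Python function. This is a minimal illustration, not the processing code used to build the dataset. The function name `epa_correct_pm25` is hypothetical, and it assumes that "AB avg" is the mean of the sensor's two channels and that the same averaged cf=1 value is compared against the 343 µg/m³ threshold.

```python
def epa_correct_pm25(pm25_ab_avg, rh):
    """Apply the piecewise correction from the equations above.

    pm25_ab_avg: uncorrected PM2.5 (cf=1), assumed to be the A/B channel mean, in µg/m³
    rh: sensor relative humidity in percent
    """
    if pm25_ab_avg <= 343:
        # Low-concentration branch: linear in PM2.5 with a humidity term.
        return 0.52 * pm25_ab_avg - 0.086 * rh + 5.75
    # High-concentration branch: quadratic in PM2.5, no humidity term.
    return 0.46 * pm25_ab_avg + 3.93e-4 * pm25_ab_avg ** 2 + 2.97


# Illustrative inputs only (not values taken from the dataset).
low = epa_correct_pm25(20.0, 50.0)
high = epa_correct_pm25(400.0, 50.0)
print(round(low, 2), round(high, 2))
```

Note that the low-concentration branch can return a negative value when the raw PM2.5 is very low and the relative humidity is high. How such values are handled in the dataset is not covered by this sketch.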
Table 1. Data processing and quality control by stage of data.

| Stage of Data | Quality Control |
|---|---|
| Raw | |
| Hourly | Automatic checks: |
| Hourly | Manual checks: |
| Daily | Automatic checks: |

Table 2. File contents for the hourly data.

| Field | Units | Format | Description |
|---|---|---|---|
| Site_ID | N/A | nnnnnn | Numeric site identifier |
| Site_Name | N/A | String | Site name (text) |
| Datetime | PST¹ | yyyy-mm-dd HH:MM | Date and time |
| Date | PST¹ | yyyy-mm-dd | Date field |
| Time | PST¹ | HH:MM | Time in hours and minutes |
| Jday | PST¹ | ddd | Julian day |
| Month | PST¹ | mm | Month number (1-12) |
| DOW | PST¹ | Sun-Sat | Day of week |
| Hour | PST¹ | HH | Hour of the day (00-23) |
| Latitude | Decimal degrees | nn.nnnnn | Site position, latitude |
| Longitude | Decimal degrees | nnn.nnnnn | Site position, longitude |
| Elevation | Feet | nnnn | Site elevation above sea level |
| County | N/A | String | County name where sensor is located |
| PM2.5_EPA | µg/m³ | nnn.n | PM2.5 concentration (EPA corrected) |
| PM2.5_Uncorr | µg/m³ | nnn.n | PM2.5 concentration (uncorrected) |
| Temp | F | nnn.n | Air sensor temperature; not ambient temperature |
| RH | % | nnn.n | Air sensor relative humidity; not ambient relative humidity |
| QC_Flags | 0-Valid, 8-Invalid | n | Quality control flag for entire record |
| QC_Descriptor | ‘Incomplete’, ‘ABcheck’, ‘Negative’, ‘Sticking’, ‘Indoor’, ‘Degraded’, etc. | String | Quality control descriptor for entire record |

¹ Begin time convention: each timestamp marks the start of the averaging period.
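As an illustration of how the hourly fields fit together, the sketch below keeps only records whose QC_Flags value is 0 (valid). It assumes the hourly file is comma-separated with a header row that uses the field names above. The file layout, the helper name `valid_hourly_records`, and the sample rows are assumptions made for this example, not part of the dataset.

```python
import csv
import io

# Hypothetical sample in an assumed comma-separated layout. The values are
# made up for illustration and only a few of the documented columns are shown.
SAMPLE = """Site_ID,Datetime,PM2.5_EPA,PM2.5_Uncorr,QC_Flags,QC_Descriptor
100001,2024-01-01 00:00,8.4,12.1,0,
100001,2024-01-01 01:00,-1.2,0.3,8,Negative
100002,2024-01-01 00:00,15.0,24.9,0,
"""


def valid_hourly_records(handle):
    """Yield rows whose QC_Flags value is 0 (valid), with PM2.5_EPA as a float."""
    for row in csv.DictReader(handle):
        if row["QC_Flags"].strip() == "0":
            row["PM2.5_EPA"] = float(row["PM2.5_EPA"])
            yield row


records = list(valid_hourly_records(io.StringIO(SAMPLE)))
for rec in records:
    print(rec["Site_ID"], rec["Datetime"], rec["PM2.5_EPA"])
```

To read a real file, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle, provided the actual delimiter and header match these assumptions.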
Table 3. File contents for the daily data.

| Field | Units | Format | Description |
|---|---|---|---|
| Site_ID | N/A | nnnnnn | Numeric site identifier |
| Site_Name | N/A | String | Text with the site name |
| Datetime | PST¹ | yyyy-mm-dd HH:MM | Datetime field |
| Date | PST¹ | yyyy-mm-dd | Date field |
| Jday | PST¹ | ddd | Julian day |
| Month | PST¹ | mm | Month number (1-12) |
| DOW | PST¹ | Sun-Sat | Day of week |
| Day | PST¹ | dd | Day (1-31) |
| Latitude | Decimal degrees | nn.nnnnn | Site position, latitude |
| Longitude | Decimal degrees | nnn.nnnnn | Site position, longitude |
| Elevation | Feet | nnnn | Site elevation above sea level |
| County | N/A | String | County name where sensor is located |
| PM2.5_EPA | µg/m³ | nnn.n | PM2.5 concentration (EPA corrected) |
| PM2.5_Uncorr | µg/m³ | nnn.n | PM2.5 concentration (uncorrected) |
| Temp | F | nnn.n | Air sensor temperature; not ambient temperature |
| RH | % | nnn.n | Air sensor relative humidity; not ambient relative humidity |
| QC_Flags | 0-Valid, 8-Invalid | n | Quality control flag for entire record |
| QC_Descriptor | ‘Incomplete’ | String | Quality control descriptor for entire record |

¹ Begin time convention: each timestamp marks the start of the averaging period.
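One of the uses mentioned earlier, comparing annual PM2.5 across counties, could be sketched from the daily fields as follows. This is a rough illustration under stated assumptions: the records are already parsed into dictionaries keyed by the field names above, the sample values are invented, and the helper name `annual_mean_by_county` is hypothetical. No data-completeness criteria are applied.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily records using field names from the daily table.
# The values are made up for illustration.
DAILY = [
    {"Site_ID": 100001, "County": "Alameda", "Date": "2023-07-01", "PM2.5_EPA": 6.0, "QC_Flags": 0},
    {"Site_ID": 100001, "County": "Alameda", "Date": "2023-07-02", "PM2.5_EPA": 10.0, "QC_Flags": 0},
    {"Site_ID": 100001, "County": "Alameda", "Date": "2023-07-03", "PM2.5_EPA": 50.0, "QC_Flags": 8},
    {"Site_ID": 100002, "County": "Santa Clara", "Date": "2023-07-01", "PM2.5_EPA": 9.0, "QC_Flags": 0},
]


def annual_mean_by_county(records):
    """Average valid daily PM2.5_EPA values per (county, year)."""
    groups = defaultdict(list)
    for rec in records:
        if rec["QC_Flags"] != 0:
            continue  # skip records flagged as invalid
        year = int(rec["Date"][:4])  # Date is formatted yyyy-mm-dd
        groups[(rec["County"], year)].append(rec["PM2.5_EPA"])
    return {key: round(mean(values), 1) for key, values in groups.items()}


result = annual_mean_by_county(DAILY)
for (county, year), value in sorted(result.items()):
    print(county, year, value)
```

This averages over sensor-days, so sensors with more valid days carry more weight. Averaging per sensor first and then across sensors is an equally reasonable choice, depending on the analysis.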
Table 4. File contents for the metadata.

| Field | Units | Format | Description |
|---|---|---|---|
| Site_ID | N/A | nnnnnn | Numeric site identifier |
| Date_Created | PST | yyyy-mm-dd HH:MM | Datetime when Site_ID was created |
| Site_Name | N/A | String | Text with the site name |
| Model | N/A | String | PurpleAir sensor model |
| Hardware | N/A | String | PurpleAir sensor hardware |
| Firmware_Version | N/A | String | PurpleAir sensor firmware version |
| Firmware_Update | N/A | String | PurpleAir sensor updated firmware version |
| Latitude | Decimal degrees | nn.nnnnn | Site position, latitude |
| Longitude | Decimal degrees | nnn.nnnnn | Site position, longitude |
| Elevation | Feet | nnnn | Site elevation above sea level (ft) |
| County | N/A | String | County name where sensor is located |
| Geohash_4 | N/A | String | Geographic region where sensor is located defined by geohash string of length 4 (39.1km x 19.5km) |
| Geohash_5 | N/A | String | Geographic region where sensor is located defined by geohash string of length 5 (4.9km x 4.9km) |
| Geohash_6 | N/A | String | Geographic region where sensor is located defined by geohash string of length 6 (1.2km x 609.4m) |
| Geohash_7 | N/A | String | Geographic region where sensor is located defined by geohash string of length 7 (152.9m x 152.4m) |
| Imposter_Sensors | TRUE or FALSE | Boolean | Identifies imposter PA sensors (new/unknown shareware circuitry) |
| Indoor_Sensors | TRUE or FALSE | Boolean | Identifies indoor PA sensors which are not included within verbose or final files |
| Problematic_Sensors | TRUE or FALSE | Boolean | Identifies problematic PA sensors which are not included within verbose or final files |
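
The Geohash_4 to Geohash_7 fields describe nested tiles: in the standard geohash scheme, a shorter geohash is a prefix of any longer geohash for the same point. The sketch below implements the standard public geohash encoding to show that relationship. Whether the dataset's geohash fields were produced with exactly this encoding is an assumption, and the function name `geohash_encode` and the example coordinates are illustrative only.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, length):
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < length:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Illustrative coordinates near San Francisco (not a sensor from the dataset).
lat, lon = 37.7749, -122.4194
for n in (4, 5, 6, 7):
    print(n, geohash_encode(lat, lon, n))
```

Because of the prefix property, sensors that share a Geohash_5 value also share the same Geohash_4 value, which makes these fields convenient grouping keys at different spatial scales.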
