Easy-to-use functions for downloading air quality data from the Mexican National Air Quality Information System (SINAICA). With this R package you can download pollution and meteorological parameters from more than a hundred monitoring stations located throughout Mexico. The package lets you query crude real-time air quality data, validated data, or manually collected data.
To install the most recent package version from CRAN type:
install.packages("rsinaica")
You can always install the development version from GitHub:
if (!require(devtools)) {
  install.packages("devtools")
}
devtools::install_github("diegovalle/rsinaica")
Suppose you wanted to download pollution data from the Centro station in Guadalajara. First, we load the necessary packages and look up the numeric code for the station in the stations_sinaica data.frame:
## Auto-install required R packages
packs <- c("ggplot2", "maps", "mapproj", "rsinaica")
success <- suppressWarnings(sapply(packs, require, character.only = TRUE))
if (length(names(success)[!success])) {
  install.packages(names(success)[!success])
  sapply(names(success)[!success], require, character.only = TRUE)
}

knitr::kable(stations_sinaica[which(stations_sinaica$station_name == "Centro"), 1:6])
| | station_id | station_name | station_code | network_id | network_name | network_code |
|---|---|---|---|---|---|---|
| 12 | 33 | Centro | CEN | 30 | Aguascalientes | AGS |
| 42 | 54 | Centro | CEN | 38 | Chihuahua | CHIH1 |
| 75 | 102 | Centro | CEN | 63 | Guadalajara | GDL |
There are three stations named Centro; the one we are looking for is in Guadalajara and has a numeric code (station_id) of 102. The stations_sinaica data.frame also includes the latitude and longitude of all the measuring stations in Mexico (including some that have never reported any data!).
mx <- map_data("world", "Mexico")
stations_sinaica$color <- "Others"
stations_sinaica$color[stations_sinaica$station_id == 102] <- "Centro (102)"

ggplot(stations_sinaica[order(stations_sinaica$color, decreasing = TRUE), ],
       aes(lon, lat)) +
  geom_polygon(data = mx, aes(x = long, y = lat, group = group)) +
  geom_point(alpha = .9, size = 3, aes(fill = color), shape = 21) +
  scale_fill_discrete("station") +
  ggtitle("Air quality measuring stations in Mexico") +
  coord_map() +
  theme_void()
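Because several stations share the name Centro, it can be safer to filter on the network name as well instead of reading the id off the table by eye. The following is a minimal sketch that uses a toy copy of the three rows shown in the table above (only three of the columns are reproduced); with the package loaded, the same filter should work on stations_sinaica itself.

```r
# Toy copy of the three "Centro" rows printed in the table above.
# With rsinaica loaded, apply the same filter to stations_sinaica.
stations <- data.frame(
  station_id   = c(33, 54, 102),
  station_name = c("Centro", "Centro", "Centro"),
  network_name = c("Aguascalientes", "Chihuahua", "Guadalajara"),
  stringsAsFactors = FALSE
)

# Filter on both columns so that only one station can match
is_match <- stations$station_name == "Centro" &
  stations$network_name == "Guadalajara"
centro_id <- stations$station_id[is_match]

# Fail early if the name/network pair is missing or ambiguous
stopifnot(length(centro_id) == 1)
centro_id
```

Checking that exactly one row matched guards against typos in the station or network name before any download is attempted.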
Then we query the start and end dates for which SINAICA has received data from the station:
sinaica_station_dates(102)
#> [1] "1997-01-01" "2020-07-31"
The station is currently reporting data (this document was built on 2020-07-31) and has been doing so since 1997. We can also query which types of parameters (pollution, wind, solar radiation, etc.) the station has sensors for. Note that the package also includes a parameters data.frame with the complete set of supported parameters, but not all stations support all of them.
cen_params <- sinaica_station_params(102)
knitr::kable(cen_params)
param_code | param_name |
---|---|
CN | Carbono negro |
SO2 | Dióxido de azufre |
NO2 | Dióxido de nitrógeno |
DV | Dirección del viento |
HR | Humedad relativa |
CO | Monóxido de carbono |
NO | Óxido nítrico |
NOx | Óxidos de nitrógeno |
O3 | Ozono |
PM10 | Partículas menores a 10 micras |
PM2.5 | Partículas menores a 2.5 micras |
PP | Precipitación pluvial |
RS | Radiación solar |
TMPI | Temperatura interior |
VV | Velocidad del viento |
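Since not every station measures every parameter, it may be worth checking for a parameter code before requesting data. The sketch below uses a toy subset of the param_code column from the table above; station_has_param() is a hypothetical helper written for this example and is not part of rsinaica.

```r
# Toy subset of the param_code column shown in the table above; in practice
# pass the data.frame returned by sinaica_station_params(102).
cen_params <- data.frame(
  param_code = c("CO", "O3", "PM10", "PM2.5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of rsinaica): TRUE if the station's
# parameter table lists the requested code.
station_has_param <- function(params, code) {
  code %in% params$param_code
}

station_has_param(cen_params, "PM10")
# "XYZ" is a made-up code used only to show the negative case
station_has_param(cen_params, "XYZ")
```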
Finally, we can download and plot hourly concentrations of particulate matter with a diameter smaller than 10 micrometers (μm), known as PM10, for January 2018.
# Download all PM10 data for January 2018
df <- sinaica_station_data(102, # station_id
                           "PM10", # can be one of parameters$parameter_code
                           "2018-01-01", "2018-01-31", # Maximum of one month
                           "Crude" # Crude, Manual or Validated
                           )

ggplot(df, aes(hour, value, group = date)) +
  geom_line(alpha = .9) +
  ggtitle(expression(paste(PM[10],
                           " pollution during January 2018 in Centro, Guadalajara, by hour"))) +
  xlab("hour") +
  ylab(expression(paste(mu, "g/", m^3))) +
  theme_bw()
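The comment in the call above notes that a single request covers at most one month, so a longer period has to be requested in pieces. The sketch below shows one way to do that. Both month_chunks() and download_range() are hypothetical helpers written for this example, not part of rsinaica. It assumes that requests confined to a single calendar month satisfy the limit and that every request returns a data.frame with the same columns.

```r
# Hypothetical helper (not part of rsinaica): split [start, end] into
# consecutive pieces that never cross a calendar-month boundary.
month_chunks <- function(start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  stopifnot(start <= end)

  # First day of every calendar month touched by the range
  first_month <- as.Date(format(start, "%Y-%m-01"))
  firsts <- seq(first_month, end, by = "month")

  # Last day of each month = first day of the following month minus one day
  lasts <- seq(first_month, by = "month", length.out = length(firsts) + 1)[-1] - 1

  # Trim the first and last pieces to the requested range
  firsts[1] <- start
  lasts[length(lasts)] <- end

  data.frame(start = firsts, end = lasts)
}

# Sketch only: needs rsinaica and a network connection, so it is defined
# here but not called. Assumes each request returns the same columns.
download_range <- function(station_id, parameter, start, end, type = "Crude") {
  chunks <- month_chunks(start, end)
  pieces <- lapply(seq_len(nrow(chunks)), function(i) {
    rsinaica::sinaica_station_data(
      station_id, parameter,
      as.character(chunks$start[i]), as.character(chunks$end[i]),
      type
    )
  })
  do.call(rbind, pieces)
}

# Three pieces: the rest of January, all of February, the start of March
month_chunks("2018-01-15", "2018-03-10")
```

Keeping the date arithmetic in its own function makes it possible to check the chunk boundaries without contacting SINAICA.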
The hours are in the local Guadalajara time zone, which is UTC-6 because we plotted January data.
stations_sinaica$timezone[which(stations_sinaica$station_id == 102)]
#> [1] "Tiempo del centro, UTC-6 (UTC-5 en verano)"
You can find a handy map of Mexico’s time zones on Wikipedia to help you with any time conversions you might need.
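As a rough illustration of such a conversion, the sketch below turns local date and hour pairs into UTC timestamps using the fixed UTC-6 offset that applies to the January data. It works on a toy data.frame with the date, hour and value columns used in the plot; the values are made up, and the assumption that hour is an integer from 0 to 23 should be checked against the downloaded data. A fixed offset is only valid for dates outside the summer period mentioned in the timezone string.

```r
# Toy rows with the date / hour / value columns used in the plot above;
# the values are invented for illustration.
obs <- data.frame(
  date  = c("2018-01-01", "2018-01-01"),
  hour  = c(0, 20),
  value = c(55, 80),
  stringsAsFactors = FALSE
)

# Parse the local wall-clock time as if it were UTC, then add six hours
# (local time = UTC-6, so UTC = local + 6 h). Winter dates only.
local_clock <- as.POSIXct(
  paste(obs$date, sprintf("%02d:00:00", as.integer(obs$hour))),
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
)
obs$datetime_utc <- local_clock + 6 * 3600

format(obs$datetime_utc, "%Y-%m-%d %H:%M", tz = "UTC")
#> [1] "2018-01-01 06:00" "2018-01-02 02:00"
```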