`vignettes/articles/wsp_error.Rmd`

`wsp_error.Rmd`

There’s an error in the 2016 wind speed data, they were incorrectly converted from m/s to miles per hour (mph).

First we install the necessary packages:

```
# based on https://cran.r-project.org/web/packages/ggjoy/vignettes/gallery.html
## Auto-install required R packages
if (!require("pacman")) install.packages("pacman")
pacman::p_load(aire.zmvm, dplyr, ggplot2, functional, lubridate)
```

Next, we download and plot the data, if we use the function `get_station_data`

the data will be automatically corrected. Note that the number of stations has increased over time and thus there will be more variation in later years.

```
df <- do.call(rbind,
lapply(1986:2018, Curry(get_station_data, "MAXIMOS", "WSP"))
) %>%
filter(value < 80)
ggplot(na.omit(df), aes(date, value)) +
geom_point(alpha = .01) +
geom_smooth(method = 'gam', formula = y ~ s(x)) +
theme_bw() +
ylab("wind speed (m/s)") +
ggtitle("Maximum Daily Wind Speed in Mexico City 1986-2018 (get_station_data)")
```

but if we use the `download_meteorological`

function there will be a warning about the wind speed data being incorrect (the data also includes temperature, wind direction, and relative humidity so I can’t just throw an error when downloading.)

```
df <- download_meteorological(2015:2018) %>%
filter(pollutant == "WSP") %>%
group_by(date, station_code) %>%
summarise(max = ifelse(all(is.na(value)),
NA,
base::max(value, na.rm = TRUE))) %>%
na.omit()
```

```
## Warning: There may be errors in the 2016 wind speed data. It was
## incorrectly converted to mph. Use the function `get_station_data` to
## download the correct values
```

```
ggplot(na.omit(df), aes(date, max)) +
geom_point(alpha = .01) +
geom_smooth(method = 'gam', formula = y ~ s(x)) +
theme_bw() +
ylab("wind speed (m/s)") +
ggtitle("Maximum Daily Wind Speed in Mexico City 2015-2018 (download_meteorological)")
```

If we treat the 2016 values as if they were incorrectly converted from m/s to miles per hour (mph) and multiply by 2.2 to correct them, then the 2016 wind speed data looks correct, although it still looks a bit different due to rounding.

```
df$max[year(df$date) %in% 2016] <- df$max[year(df$date) %in% 2016] * 2.23694
ggplot(na.omit(df), aes(date, max)) +
geom_point(alpha = .01) +
geom_smooth(method = 'gam', formula = y ~ s(x)) +
theme_bw() +
ylab("wind speed (m/s)") +
ggtitle("Maximum Daily Wind Speed in Mexico City (2016 data multiplied by 2.2)")
```