Getting public holidays by country

I’ve recently been working on a time series model in which I wanted to include the public holidays of Spain and Portugal. After trying different approaches I decided to move forward with prophet, which, by the way, I strongly recommend.

What prompted this post, though, is that I’d like to go over some of the options we have for getting public holidays by country. I don’t want to go into the details of what exactly counts as a public holiday (regional and local ones are excluded from this analysis, for example).

The first thing I did was search for an existing R package, but I couldn’t find one. As a colleague pointed out, such a package would probably require a lot of maintenance. However, since prophet has a built-in function to include holidays, I decided to look into its code and found that the package ships a data.frame with the holidays from 1995 to 2044 for many countries (there are around 100 different country names, but I think half of them are country codes).

For many purposes this data.frame would suffice, but it feels odd to load prophet just to get at this data. So I kept exploring and found holidayapi.com, which provides an API to access the data. The free account is limited, though, so I didn’t dig any deeper.

Fortunately, date.nager.at provides the same information through an open API, so the following simple function gives us access to the data:

library(httr)
library(dplyr)
library(magrittr)
library(purrr)

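get_holidays <- function(country_code, year) {
  # Build the request URL, e.g. http://date.nager.at/api/v1/get/AT/2019
  url <- parse_url("http://date.nager.at")
  url$path <- paste0("api/v1/get/", country_code, "/", year)
  request_url <- build_url(url)

  # Query the API and fail early on an HTTP error
  response <- GET(request_url)
  stop_for_status(response)

  # Parse the JSON body into a list with one element per holiday
  content_json <- content(response)

  # Keep only the relevant fields (magrittr::extract is an alias for `[`)
  map_df(content_json, extract, c("countryCode", "name", "date"))
}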

And the output:

get_holidays(country_code = "AT", year = 2019)
## # A tibble: 13 x 3
##    countryCode name                  date      
##    <chr>       <chr>                 <chr>     
##  1 AT          New Year's Day        2019-01-01
##  2 AT          Epiphany              2019-01-06
##  3 AT          Easter Monday         2019-04-22
##  4 AT          National Holiday      2019-05-01
##  5 AT          Ascension Day         2019-05-30
##  6 AT          Whit Monday           2019-06-10
##  7 AT          Corpus Christi        2019-06-20
##  8 AT          Assumption Day        2019-08-15
##  9 AT          National Holiday      2019-10-26
## 10 AT          All Saints' Day       2019-11-01
## 11 AT          Immaculate Conception 2019-12-08
## 12 AT          Christmas Day         2019-12-25
## 13 AT          St. Stephen's Day     2019-12-26
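
Since the original goal was to feed these dates to prophet, the result needs a little reshaping first. Below is a minimal sketch, assuming prophet’s convention of a holidays data frame with a `holiday` column and a `ds` date column. The helper name `to_prophet_holidays` is hypothetical, and the sample rows are copied from the output above so the example runs without a network call:

```r
# Hypothetical helper: reshape the API result into the layout prophet
# expects for its `holidays` argument (columns `holiday` and `ds`).
to_prophet_holidays <- function(df) {
  data.frame(
    holiday = df$name,
    ds = as.Date(df$date),  # the API returns dates as "YYYY-MM-DD" strings
    stringsAsFactors = FALSE
  )
}

# Three rows copied from the output above, so no network call is needed
sample_holidays <- data.frame(
  countryCode = "AT",
  name = c("New Year's Day", "Epiphany", "Easter Monday"),
  date = c("2019-01-01", "2019-01-06", "2019-04-22"),
  stringsAsFactors = FALSE
)

prophet_holidays <- to_prophet_holidays(sample_holidays)
str(prophet_holidays)
```

With real data, something like `to_prophet_holidays(get_holidays("AT", 2019))` should give a data frame that can be passed to prophet through its `holidays` argument.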

And for several years:

years <- c("2016", "2017", "2018", "2019")
map_df(years, function(x) get_holidays("AT", x))
## # A tibble: 52 x 3
##    countryCode name             date      
##    <chr>       <chr>            <chr>     
##  1 AT          New Year's Day   2016-01-01
##  2 AT          Epiphany         2016-01-06
##  3 AT          Easter Monday    2016-03-28
##  4 AT          National Holiday 2016-05-01
##  5 AT          Ascension Day    2016-05-05
##  6 AT          Whit Monday      2016-05-16
##  7 AT          Corpus Christi   2016-05-26
##  8 AT          Assumption Day   2016-08-15
##  9 AT          National Holiday 2016-10-26
## 10 AT          All Saints' Day  2016-11-01
## # … with 42 more rows
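
The same idea extends to several countries at once, which is what I needed for Spain and Portugal. Here is a rough sketch using only base R. The helper `get_holidays_grid` and the offline stand-in `fake_fetch` are hypothetical names introduced for illustration; in practice `get_holidays` would be passed as the `fetch` argument:

```r
# Hypothetical helper: call `fetch(country_code, year)` for every
# country/year combination and stack the results into one data frame.
get_holidays_grid <- function(country_codes, years, fetch) {
  grid <- expand.grid(
    country_code = country_codes,
    year = years,
    stringsAsFactors = FALSE
  )
  pieces <- mapply(fetch, grid$country_code, grid$year,
                   SIMPLIFY = FALSE, USE.NAMES = FALSE)
  do.call(rbind, pieces)
}

# Stand-in for get_holidays() so the sketch runs offline; it returns
# only New Year's Day for the requested country and year.
fake_fetch <- function(country_code, year) {
  data.frame(
    countryCode = country_code,
    name = "New Year's Day",
    date = paste0(year, "-01-01"),
    stringsAsFactors = FALSE
  )
}

holidays <- get_holidays_grid(c("ES", "PT"), 2016:2019, fake_fetch)
nrow(holidays)
```

Replacing `fake_fetch` with `get_holidays` would issue one request per country and year, so it may be worth keeping the grid small.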

Even so, I still see two main drawbacks.

  1. I haven’t checked the data quality, and I don’t know whether anyone is actively maintaining the website.
  2. It would be useful to include regional and local holidays and, in addition, labels for other relevant days (Black Friday, for example).
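
For the second point, some of those relevant days can be computed rather than looked up. As a small sketch, and assuming Black Friday is defined as the day after the fourth Thursday of November (US Thanksgiving), a hypothetical `black_friday` helper could look like this:

```r
# Hypothetical helper: Black Friday as the day after the fourth
# Thursday of November (US Thanksgiving).
black_friday <- function(year) {
  november <- seq(as.Date(paste0(year, "-11-01")),
                  as.Date(paste0(year, "-11-30")),
                  by = "day")
  # "%u" gives the ISO weekday number, where Monday is 1 and Thursday is 4
  thursdays <- november[format(november, "%u") == "4"]
  thursdays[4] + 1
}

# Same `name` and `date` layout as the holiday table, with dates as strings
relevant_days <- data.frame(
  name = "Black Friday",
  date = as.character(black_friday(2019)),
  stringsAsFactors = FALSE
)
relevant_days
```

Rows built this way share the `name` and `date` columns of the holiday table, so after adding a country code they could be appended to the output of `get_holidays`.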

Does anyone have a better approach?