Search and download data from the Swiss Federal Statistical Office
The BFS
package allows to search and download public
data from the Swiss Federal Statistical Office (BFS stands for
Bundesamt für Statistik in German) in a dynamic and
reproducible way.
install.packages("BFS")
You can also install the development version from Github.
::install_github("lgnbhl/BFS") devtools
library(BFS)
To search and download data from the Swiss Federal Statistical Office, you first need to retrieve information about the available public datasets.
You can get the data catalog by language based on the official RSS feed. Unfortunately, it seems that not the all public datasets are in the RSS feed, but only the most recently udpated. Note also that Italian and English give access to less datasets.
<- bfs_get_catalog_data(language = "en")
catalog_data_en
catalog_data_en
## # A tibble: 179 × 5
## title langu…¹ publi…² url_bfs url_px
## <chr> <chr> <chr> <chr> <chr>
## 1 Businesses by difficulties in recruiting staf… en Busine… https:… https…
## 2 Businesses by difficulties in recruiting staf… en Busine… https:… https…
## 3 Businesses by employment prospects and econom… en Busine… https:… https…
## 4 Businesses by employment prospects and major … en Busine… https:… https…
## 5 Job vacancies by economic divisions (selectio… en Job va… https:… https…
## 6 Job vacancies by major region en Job va… https:… https…
## 7 Jobs by economic division, employment rate an… en Jobs b… https:… https…
## 8 Jobs by major region, economic sector, employ… en Jobs b… https:… https…
## 9 Hotel accommodation: arrivals and overnight s… en Hotel … https:… https…
## 10 Hotel accommodation: arrivals and overnight s… en Hotel … https:… https…
## # … with 169 more rows, and abbreviated variable names ¹language, ²published
To find older datasets, you can use the search bar in the official BFS website.
You could use for example dplyr
to search for a given
dataset.
library(dplyr)
<- catalog_data_en %>%
catalog_data_uni filter(title == "University students by year, ISCED field, sex and level of study")
catalog_data_uni
## # A tibble: 1 × 5
## title langu…¹ publi…² url_bfs url_px
## <chr> <chr> <chr> <chr> <chr>
## 1 University students by year, ISCED field, sex … en Univer… https:… https…
## # … with abbreviated variable names ¹language, ²published
To download a BFS dataset, you have two options. You can add the
official BFS URL webpage to the url_bfs
argument to the
bfs_get_data()
. For example, you can use the URL of a given
dataset you found using bfs_get_catalog_data()
.
# https://www.bfs.admin.ch/content/bfs/en/home/statistiken/kataloge-datenbanken/daten.assetdetail.16324907.html
<- bfs_get_data(url_bfs = catalog_data_uni$url_bfs, language = "en") df_uni
## Downloading large query (in 4 batches):
## | | | 0% | |================== | 25% | |=================================== | 50% | |==================================================== | 75% | |======================================================================| 100%
df_uni
## # A tibble: 17,640 × 5
## Year `ISCED Field` Sex `Level of study` Unive…¹
## <chr> <chr> <chr> <chr> <dbl>
## 1 1980/81 Education science Male First university degree or diploma 545
## 2 1980/81 Education science Male Bachelor 0
## 3 1980/81 Education science Male Master 0
## 4 1980/81 Education science Male Doctorate 93
## 5 1980/81 Education science Male Further education, advanced studies… 13
## 6 1980/81 Education science Female First university degree or diploma 946
## 7 1980/81 Education science Female Bachelor 0
## 8 1980/81 Education science Female Master 0
## 9 1980/81 Education science Female Doctorate 70
## 10 1980/81 Education science Female Further education, advanced studies… 52
## # … with 17,630 more rows, and abbreviated variable name ¹`University students`
Note that some datasets are only accessible in German and French.
In case the data is not accessible using
bfs_get_catalog_data()
, you can manually add the BFS number
in the bfs_get_data()
function using the
number_bfs
argument.
# open webpage
browseURL("https://www.bfs.admin.ch/content/bfs/en/home/statistiken/kataloge-datenbanken/daten.assetdetail.16324907.html")
Use again bfs_get_data()
but this time with the
number_bfs
argument.
bfs_get_data(number_bfs = "px-x-1502040100_131", language = "en")
Please privilege the number_bfs
argument of the
bfs_get_data()
if you want more stable and reproducible
code.
You can access additional information about the dataset by running
bfs_get_data_comments()
.
bfs_get_data_comments(number_bfs = "px-x-1502040100_131", language = "en")
## Downloading large query (in 4 batches):
## | | | 0% | |================== | 25% | |=================================== | 50% | |==================================================== | 75% | |======================================================================| 100%
## # A tibble: 1 × 4
## row_no col_no comment_type comment
## <int> <int> <chr> <chr>
## 1 NA 4 column_comment "To ensure that the presentations from cubes con…
You may get an error message if the dataset is too big.
Error in pxweb_advanced_get(url = url, query = query, verbose = verbose) :
Too Many Requests (RFC 6585) (HTTP 429).
One solution is too query only specific elements of the dataset to download less data. Here an example.
First you want to get the variable names, i.e. code
, and
categories, i.e. values
, of your dataset.
# choose a BFS number and language
<- "px-x-1502040100_131"
number_bfs <- "en"
language # create the BFS api url
<- paste0("https://www.pxweb.bfs.admin.ch/api/v1/",
pxweb_api_url "/", number_bfs, "/", number_bfs, ".px")
language, # Get BFS table metadata using {pxweb}
<- pxweb::pxweb_get(pxweb_api_url)
px_meta # list variables items
str(px_meta$variables)
## List of 4
## $ :List of 6
## ..$ code : chr "Jahr"
## ..$ text : chr "Year"
## ..$ values : chr [1:42] "0" "1" "2" "3" ...
## ..$ valueTexts : chr [1:42] "1980/81" "1981/82" "1982/83" "1983/84" ...
## ..$ time : logi TRUE
## ..$ elimination: logi FALSE
## $ :List of 6
## ..$ code : chr "ISCED Fach"
## ..$ text : chr "ISCED Field"
## ..$ values : chr [1:42] "0" "1" "2" "3" ...
## ..$ valueTexts : chr [1:42] "Education science" "Teacher training without subject specialisation" "Teacher training with subject specialisation" "Fine arts" ...
## ..$ elimination: logi TRUE
## ..$ time : logi FALSE
## $ :List of 6
## ..$ code : chr "Geschlecht"
## ..$ text : chr "Sex"
## ..$ values : chr [1:2] "0" "1"
## ..$ valueTexts : chr [1:2] "Male" "Female"
## ..$ elimination: logi TRUE
## ..$ time : logi FALSE
## $ :List of 6
## ..$ code : chr "Studienstufe"
## ..$ text : chr "Level of study"
## ..$ values : chr [1:5] "0" "1" "2" "3" ...
## ..$ valueTexts : chr [1:5] "First university degree or diploma" "Bachelor" "Master" "Doctorate" ...
## ..$ elimination: logi TRUE
## ..$ time : logi FALSE
Then you can manually select the dimensions of the dataset you want to query.
# Manually create BFS query dimensions
# Use `code` and `values` elements in `px_meta$variables`
# Use "*" to select all
<- list(
dimensions "Jahr" = c("40", "41"),
"ISCED Fach" = c("0"),
"Geschlecht" = c("0", "1"),
"Studienstufe" = c("2", "3"))
# Query BFS data with specific dimensions
::bfs_get_data(
BFSnumber_bfs = number_bfs,
language = language,
query = dimensions
)
## # A tibble: 8 × 5
## Year `ISCED Field` Sex `Level of study` `University students`
## <chr> <chr> <chr> <chr> <dbl>
## 1 2020/21 Education science Male Master 151
## 2 2020/21 Education science Male Doctorate 121
## 3 2020/21 Education science Female Master 555
## 4 2020/21 Education science Female Doctorate 306
## 5 2021/22 Education science Male Master 143
## 6 2021/22 Education science Male Doctorate 115
## 7 2021/22 Education science Female Master 599
## 8 2021/22 Education science Female Doctorate 318
A lot of tables are not accessible through the official API, but they
are still present in the official BFS website. You can access the RSS
feed tables catalog using bfs_get_catalog_tables()
.
Most of these tables are Excel or CSV files. Note again that only a part
of all the public tables accessible are in the RSS feed (the most
recently updated datasets).
<- bfs_get_catalog_tables(language = "en")
catalog_tables_en
catalog_tables_en
## # A tibble: 350 × 5
## title langu…¹ publi…² url_bfs url_t…³
## <chr> <chr> <chr> <chr> <chr>
## 1 "Civil aviation – overview" en "Civil… https:… https:…
## 2 "Difficulties in recruiting staff with educa… en "Diffi… https:… https:…
## 3 "Difficulties in recruiting staff with educa… en "Diffi… https:… https:…
## 4 "Difficulties in recruiting staff with highe… en "Diffi… https:… https:…
## 5 "Difficulties in recruiting staff with unive… en "Diffi… https:… https:…
## 6 "Full-time job equivalent per sector" en "Full-… https:… https:…
## 7 "Full-time job per sector and gender" en "Full-… https:… https:…
## 8 "Index of employment evolution prospects per… en "Index… https:… https:…
## 9 "Job vacancy per branch of economic activity… en "Job v… https:… https:…
## 10 "Jobs per sector and gender, gross and seaso… en "Jobs … https:… https:…
## # … with 340 more rows, and abbreviated variable names ¹language, ²published,
## # ³url_table
library(dplyr)
library(openxlsx)
<- catalog_tables_en %>%
index_table_url filter(grepl("index", title)) %>% # search table
slice(1) %>%
pull(url_table)
<- tryCatch(expr = openxlsx::read.xlsx(index_table_url, startRow = 1),
df error = function(e) "Failed reading table") %>%
as_tibble()
df
## # A tibble: 33 × 17
## Sprache./…¹ X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 "Indexwert… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 "Schweizer… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 "Totalinde… Q1 2… Q2 2… Q3 2… Q4 2… Q1 2… Q2 2… Q3 2… Q4 2… Q1 2… Q2 2… Q3 2…
## 4 "Total " 97.6… 98.1… 98.2… 100 99.2… 100.… 100.… 103.… 103.… 105.… 107.…
## 5 "GemeindeT… 95.5… 97.7… 96.9… 100 98.8… 100.… 101.… 102.… 103.… 105.… 108.…
## 6 "GemeindeT… 98.1… 98.7… 100.… 100 99.2… 100.… 101.… 103.… 103.… 104.… 106.…
## 7 "GemeindeT… 97.7… 99.1… 98.0… 100 98.3… 99.8… 100.… 102.… 103.… 106.… 107.…
## 8 "GemeindeT… 99.6… 98.3… 98.7… 100 100.… 100.… 99.9… 102.… 103.… 105.… 107.…
## 9 "GemeindeT… 97.7… 97.1… 97.5… 100 99.1… 99.8… 100.… 103.… 101.… 105.… 107.…
## 10 "EFH" 97.0… 98.2… 98.5… 100 99.4… 100.… 101.… 103.… 103.… 105.… 108.…
## # … with 23 more rows, 5 more variables: X13 <chr>, X14 <chr>, X15 <chr>,
## # X16 <chr>, X17 <chr>, and abbreviated variable name
## # ¹`Sprache./.Langue./.Lingua./.Language`
Under the hood, this package is using extensively the pxweb R package to query the Swiss Federal Statistical Office PXWEB API. PXWEB is an API structure developed by Statistics Sweden and other national statistical institutions (NSI) to disseminate public statistics in a structured way.
The list of available datasets is accessed using the tidyRSS R package to scrap the official BFS RSS feed.
You can clean the column names of the datasets automatically using
janitor::clean_names()
by adding the argument
clean_names = TRUE
in the bfs_get_data()
function.
A blog article showing a concrete example about how to use the BFS package and to visualize the data in a Swiss map.
This package is in no way officially related to or endorsed by the Swiss Federal Statistical Office (BFS).
Any contribution is strongly appreciated. Feel free to report a bug, ask any question or make a pull request for any remaining issue.