The mudata2 package is designed to be used as little as possible. That is, if you need to use data that is currently in mudata format, the functions in this package are designed to let you spend as little time as possible reading, subsetting, and inspecting your data. The steps are generally as follows:
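1. Read in the object using read_mudata()
2. Inspect the object using summary(), print(), distinct_locations(), and distinct_params()
3. Have a look at the embedded documentation using tbl_locations() and tbl_params()
4. Subset parameters using select_params() or filter_params()
5. Subset locations using select_locations() or filter_locations()
6. Extract the data using tbl_data() or tbl_data_wide()
In this vignette we will use the ns_climate dataset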
within the mudata2 package, which is a collection of
monthly climate observations from Nova Scotia
(Canada), sourced from Environment Canada using the
rclimateca
package.
library(mudata2)
data("ns_climate")
ns_climate
## A mudata object aligned along "date"
##   distinct_datasets():  "ecclimate_monthly"
##   distinct_locations(): "ANNAPOLIS ROYAL 6289", "BADDECK 6297" ... and 13 more
##   distinct_params():    "dir_of_max_gust", "extr_max_temp" ... and 9 more
##   src_tbls():           "data", "locations" ... and 3 more
## 
## tbl_data() %>% head():
## # A tibble: 6 × 7
##   dataset           location          param       date       value flag  flag_…¹
##   <chr>             <chr>             <chr>       <date>     <dbl> <chr> <chr>  
## 1 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-01-01    NA M     Missing
## 2 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-02-01    NA M     Missing
## 3 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-03-01    NA M     Missing
## 4 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-04-01    NA M     Missing
## 5 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-05-01    NA M     Missing
## 6 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-06-01    NA M     Missing
## # … with abbreviated variable name ¹flag_text
The ns_climate object is already an object in R, but if
it wasn’t, you would need to use read_mudata() to read it
in. If you’re curious what a mudata object looks like on disk, you could
try using write_mudata() to find out. I tend to prefer
writing to a directory rather than a JSON or ZIP file, but you can take
your pick.
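As a rough sketch of what looking at the object on disk might involve, the snippet below writes it to a temporary directory and lists what was created. The temporary path and the use of list.files() are illustrative choices rather than anything the package requires, and the exact files produced may vary between package versions.

```r
library(mudata2)
data("ns_climate")

# assumption: a unique temporary path keeps the working directory clean;
# like the vignette's directory example, it does not end in .zip or .json
out_dir <- tempfile(fileext = ".mudata")
write_mudata(ns_climate, out_dir)

# list the files that make up the on-disk representation
list.files(out_dir)

# read the directory back in and check that the parameter identifiers match
ns_climate_copy <- read_mudata(out_dir)
setequal(distinct_params(ns_climate_copy), distinct_params(ns_climate))

# clean up the temporary directory
unlink(out_dir, recursive = TRUE)
```

Writing to a directory makes the individual tables easy to open in other tools, which is one reason to prefer it for a quick look.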
# write to directory
write_mudata(ns_climate, "ns_climate.mudata")
# write to ZIP
write_mudata(ns_climate, "ns_climate.mudata.zip")
# write to JSON
write_mudata(ns_climate, "ns_climate.mudata.json")
Then, you can read in the object using
read_mudata():
# read from directory
read_mudata("ns_climate.mudata")
# read from ZIP
read_mudata("ns_climate.mudata.zip")
# read from JSON
read_mudata("ns_climate.mudata.json")
The two main ways to quickly inspect a mudata object are
print() and summary(). The
print() function is what you get when you type the name of
the object at the prompt, and gives a short summary of the object. The
output suggests a couple of other ways to inspect the object, including
distinct_locations(), which returns a character vector of
location identifiers, and distinct_params(), which returns
a character vector of parameter identifiers.
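Because both functions return plain character vectors, ordinary base R tools can be used on the result. The following is a small sketch of that idea; the variable names and the pattern used with grep() are only illustrative.

```r
library(mudata2)
data("ns_climate")

locations <- distinct_locations(ns_climate)
params <- distinct_params(ns_climate)

# both are character vectors, so length() counts the identifiers
length(locations)
length(params)

# check whether a parameter of interest is available
"mean_temp" %in% params

# find location identifiers by pattern
grep("^SABLE", locations, value = TRUE)
```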
print(ns_climate)
## A mudata object aligned along "date"
##   distinct_datasets():  "ecclimate_monthly"
##   distinct_locations(): "ANNAPOLIS ROYAL 6289", "BADDECK 6297" ... and 13 more
##   distinct_params():    "dir_of_max_gust", "extr_max_temp" ... and 9 more
##   src_tbls():           "data", "locations" ... and 3 more
## 
## tbl_data() %>% head():
## # A tibble: 6 × 7
##   dataset           location          param       date       value flag  flag_…¹
##   <chr>             <chr>             <chr>       <date>     <dbl> <chr> <chr>  
## 1 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-01-01    NA M     Missing
## 2 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-02-01    NA M     Missing
## 3 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-03-01    NA M     Missing
## 4 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-04-01    NA M     Missing
## 5 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-05-01    NA M     Missing
## 6 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-06-01    NA M     Missing
## # … with abbreviated variable name ¹flag_text
The summary() function provides some numeric summaries
by dataset, location, and parameter if the value column of
the data table is numeric (if it isn’t, it provides counts
instead).
summary(ns_climate)
## # A tibble: 137 × 7
##    param           location             dataset      mean_…¹ sd_va…²     n  n_NA
##    <chr>           <chr>                <chr>          <dbl>   <dbl> <int> <int>
##  1 dir_of_max_gust SABLE ISLAND 6454    ecclimate_m…    19.8   10.2    299     0
##  2 extr_max_temp   ANNAPOLIS ROYAL 6289 ecclimate_m…    19.9    7.24   995    28
##  3 extr_max_temp   BADDECK 6297         ecclimate_m…    18.9    8.58   901    43
##  4 extr_max_temp   BEAVERBANK 6301      ecclimate_m…    17.2   10.4     24    17
##  5 extr_max_temp   COLLEGEVILLE 6329    ecclimate_m…    20.3    8.54  1061    34
##  6 extr_max_temp   DIGBY 6338           ecclimate_m…    19.0    6.92   624    20
##  7 extr_max_temp   KENTVILLE CDA 6375   ecclimate_m…    21.0    8.27  1002     3
##  8 extr_max_temp   MAHONE BAY 6396      ecclimate_m…    20.8    8.35   108    11
##  9 extr_max_temp   MOUNT UNIACKE 6413   ecclimate_m…    19.7    8.21   972    30
## 10 extr_max_temp   NAPPAN CDA 6414      ecclimate_m…    19.3    8.04  1121    19
## # … with 127 more rows, and abbreviated variable names ¹mean_value, ²sd_value
You can have a look at the embedded documentation using
tbl_params() and tbl_locations(), which
contain any additional information about parameters and locations for
which data are available. The identifiers (i.e., param and
location columns) of these can be used to subset the object
using select_*() functions; the tables themselves can be
used to subset the object using the filter_*()
functions.
# extract the parameters table
ns_climate %>% tbl_params()
## # A tibble: 11 × 4
##    dataset           param              label                      unit    
##    <chr>             <chr>              <chr>                      <chr>   
##  1 ecclimate_monthly mean_max_temp      Mean Max Temp (C)          C       
##  2 ecclimate_monthly mean_min_temp      Mean Min Temp (C)          C       
##  3 ecclimate_monthly mean_temp          Mean Temp (C)              C       
##  4 ecclimate_monthly extr_max_temp      Extr Max Temp (C)          C       
##  5 ecclimate_monthly extr_min_temp      Extr Min Temp (C)          C       
##  6 ecclimate_monthly total_rain         Total Rain (mm)            mm      
##  7 ecclimate_monthly total_snow         Total Snow (cm)            cm      
##  8 ecclimate_monthly total_precip       Total Precip (mm)          mm      
##  9 ecclimate_monthly snow_grnd_last_day Snow Grnd Last Day (cm)    cm      
## 10 ecclimate_monthly dir_of_max_gust    Dir of Max Gust (10's deg) 10's deg
## 11 ecclimate_monthly spd_of_max_gust    Spd of Max Gust (km/h)     km/h
# extract the locations table
ns_climate %>% tbl_locations()
## # A tibble: 15 × 19
##    dataset    locat…¹ name  provi…² clima…³ stati…⁴ wmo_id tc_id latit…⁵ longi…⁶
##    <chr>      <chr>   <chr> <chr>   <chr>     <int>  <int> <chr>   <dbl>   <dbl>
##  1 ecclimate… ANNAPO… ANNA… NOVA S… 8200100    6289     NA ""       44.8   -65.5
##  2 ecclimate… BADDEC… BADD… NOVA S… 8200300    6297     NA ""       46.1   -60.8
##  3 ecclimate… BEAVER… BEAV… NOVA S… 8200550    6301     NA ""       44.9   -63.7
##  4 ecclimate… COLLEG… COLL… NOVA S… 8201000    6329     NA ""       45.5   -62.0
##  5 ecclimate… DIGBY … DIGBY NOVA S… 8201600    6338     NA ""       44.6   -65.8
##  6 ecclimate… KENTVI… KENT… NOVA S… 8202800    6375     NA ""       45.1   -64.5
##  7 ecclimate… MAHONE… MAHO… NOVA S… 8203300    6396     NA ""       44.5   -64.4
##  8 ecclimate… MOUNT … MOUN… NOVA S… 8203600    6413     NA ""       44.9   -63.8
##  9 ecclimate… NAPPAN… NAPP… NOVA S… 8203700    6414     NA ""       45.8   -64.2
## 10 ecclimate… PARRSB… PARR… NOVA S… 8204400    6428     NA ""       45.4   -64.3
## 11 ecclimate… PORT H… PORT… NOVA S… 8204480    6441     NA ""       45.6   -61.4
## 12 ecclimate… SABLE … SABL… NOVA S… 8204700    6454  71600 "ESA"    43.9   -60.0
## 13 ecclimate… ST MAR… ST M… NOVA S… 8204800    6456     NA ""       44.7   -63.9
## 14 ecclimate… SPRING… SPRI… NOVA S… 8205200    6473     NA ""       44.7   -64.8
## 15 ecclimate… UPPER … UPPE… NOVA S… 8206200    6495     NA ""       45.2   -63  
## # … with 9 more variables: elevation <dbl>, first_year <int>, last_year <int>,
## #   hly_first_year <int>, hly_last_year <int>, dly_first_year <int>,
## #   dly_last_year <int>, mly_first_year <int>, mly_last_year <int>, and
## #   abbreviated variable names ¹location, ²province, ³climate_id, ⁴station_id,
## #   ⁵latitude, ⁶longitude
You can subset mudata objects using select_params() and
select_locations(), which use dplyr-like
selection syntax to quickly subset mudata objects using the identifiers
from distinct_params() and
distinct_locations() (respectively).
# find out which parameters are available
ns_climate %>% distinct_params()
##  [1] "dir_of_max_gust"    "extr_max_temp"      "extr_min_temp"     
##  [4] "mean_max_temp"      "mean_min_temp"      "mean_temp"         
##  [7] "snow_grnd_last_day" "spd_of_max_gust"    "total_precip"      
## [10] "total_rain"         "total_snow"
# subset by parameter
ns_climate %>% select_params(mean_temp, total_precip)
## A mudata object aligned along "date"
##   distinct_datasets():  "ecclimate_monthly"
##   distinct_locations(): "ANNAPOLIS ROYAL 6289", "BADDECK 6297" ... and 13 more
##   distinct_params():    "mean_temp", "total_precip"
##   src_tbls():           "data", "locations" ... and 3 more
## 
## tbl_data() %>% head():
## # A tibble: 6 × 7
##   dataset           location          param     date       value flag  flag_text
##   <chr>             <chr>             <chr>     <date>     <dbl> <chr> <chr>    
## 1 ecclimate_monthly SABLE ISLAND 6454 mean_temp 1897-01-01    NA M     Missing  
## 2 ecclimate_monthly SABLE ISLAND 6454 mean_temp 1897-02-01    NA M     Missing  
## 3 ecclimate_monthly SABLE ISLAND 6454 mean_temp 1897-03-01    NA M     Missing  
## 4 ecclimate_monthly SABLE ISLAND 6454 mean_temp 1897-04-01    NA M     Missing  
## 5 ecclimate_monthly SABLE ISLAND 6454 mean_temp 1897-05-01    NA M     Missing  
## 6 ecclimate_monthly SABLE ISLAND 6454 mean_temp 1897-06-01    NA M     Missing  
You can also use the dplyr select helpers to select related params/locations…
ns_climate %>% select_params(contains("temp"))
## A mudata object aligned along "date"
##   distinct_datasets():  "ecclimate_monthly"
##   distinct_locations(): "ANNAPOLIS ROYAL 6289", "BADDECK 6297" ... and 13 more
##   distinct_params():    "extr_max_temp", "extr_min_temp" ... and 3 more
##   src_tbls():           "data", "locations" ... and 3 more
## 
## tbl_data() %>% head():
## # A tibble: 6 × 7
##   dataset           location          param       date       value flag  flag_…¹
##   <chr>             <chr>             <chr>       <date>     <dbl> <chr> <chr>  
## 1 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-01-01    NA M     Missing
## 2 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-02-01    NA M     Missing
## 3 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-03-01    NA M     Missing
## 4 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-04-01    NA M     Missing
## 5 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-05-01    NA M     Missing
## 6 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-06-01    NA M     Missing
## # … with abbreviated variable name ¹flag_text
…and rename params/locations on the fly.
ns_climate %>% select_locations(Kentville = starts_with("KENT"))
## A mudata object aligned along "date"
##   distinct_datasets():  "ecclimate_monthly"
##   distinct_locations(): "Kentville"
##   distinct_params():    "extr_max_temp", "extr_min_temp" ... and 7 more
##   src_tbls():           "data", "locations" ... and 3 more
## 
## tbl_data() %>% head():
## # A tibble: 6 × 7
##   dataset           location  param         date       value flag  flag_text
##   <chr>             <chr>     <chr>         <date>     <dbl> <chr> <chr>    
## 1 ecclimate_monthly Kentville mean_max_temp 1913-01-01  NA   M     Missing  
## 2 ecclimate_monthly Kentville mean_max_temp 1913-02-01  NA   M     Missing  
## 3 ecclimate_monthly Kentville mean_max_temp 1913-03-01  NA   M     Missing  
## 4 ecclimate_monthly Kentville mean_max_temp 1913-04-01   9.7 <NA>  <NA>     
## 5 ecclimate_monthly Kentville mean_max_temp 1913-05-01  12.5 <NA>  <NA>     
## 6 ecclimate_monthly Kentville mean_max_temp 1913-06-01  19.9 <NA>  <NA>     
To select params/locations based on the tbl_params() and
tbl_locations() tables, you can use the
filter_*() functions (note that last_year is a
column in tbl_locations(), and unit is a
column in tbl_params()):
# only use locations whose last data point was after 2000
ns_climate %>%
  filter_locations(last_year > 2000)
## A mudata object aligned along "date"
##   distinct_datasets():  "ecclimate_monthly"
##   distinct_locations(): "ANNAPOLIS ROYAL 6289", "COLLEGEVILLE 6329" ... and 7 more
##   distinct_params():    "dir_of_max_gust", "extr_max_temp" ... and 9 more
##   src_tbls():           "data", "locations" ... and 3 more
## 
## tbl_data() %>% head():
## # A tibble: 6 × 7
##   dataset           location          param       date       value flag  flag_…¹
##   <chr>             <chr>             <chr>       <date>     <dbl> <chr> <chr>  
## 1 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-01-01    NA M     Missing
## 2 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-02-01    NA M     Missing
## 3 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-03-01    NA M     Missing
## 4 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-04-01    NA M     Missing
## 5 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-05-01    NA M     Missing
## 6 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-06-01    NA M     Missing
## # … with abbreviated variable name ¹flag_text
# use only params measured in mm
ns_climate %>%
  filter_params(unit == "mm")
## A mudata object aligned along "date"
##   distinct_datasets():  "ecclimate_monthly"
##   distinct_locations(): "ANNAPOLIS ROYAL 6289", "BADDECK 6297" ... and 13 more
##   distinct_params():    "total_precip", "total_rain"
##   src_tbls():           "data", "locations" ... and 3 more
## 
## tbl_data() %>% head():
## # A tibble: 6 × 7
##   dataset           location          param      date       value flag  flag_t…¹
##   <chr>             <chr>             <chr>      <date>     <dbl> <chr> <chr>   
## 1 ecclimate_monthly SABLE ISLAND 6454 total_rain 1891-01-01  NA   M     Missing 
## 2 ecclimate_monthly SABLE ISLAND 6454 total_rain 1891-02-01  40.4 <NA>  <NA>    
## 3 ecclimate_monthly SABLE ISLAND 6454 total_rain 1891-03-01  32   <NA>  <NA>    
## 4 ecclimate_monthly SABLE ISLAND 6454 total_rain 1891-04-01 132.  <NA>  <NA>    
## 5 ecclimate_monthly SABLE ISLAND 6454 total_rain 1891-05-01  44.7 <NA>  <NA>    
## 6 ecclimate_monthly SABLE ISLAND 6454 total_rain 1891-06-01 106.  <NA>  <NA>    
## # … with abbreviated variable name ¹flag_text
Similarly, we can subset parameters, locations, and the data table
all at once using filter_data().
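A minimal sketch of what "all at once" could look like in practice, assuming filter_data() applies its conditions to the data table and then drops locations and parameters that no longer have any rows. The cutoff date is arbitrary and chosen only for illustration.

```r
library(mudata2)
data("ns_climate")

# keep only non-missing values observed since 2000 (arbitrary cutoff)
recent <- ns_climate %>%
  filter_data(date >= as.Date("2000-01-01"), !is.na(value))

# compare the number of locations before and after filtering
length(distinct_locations(ns_climate))
length(distinct_locations(recent))

# the locations table of the result can be inspected the same way
nrow(tbl_locations(recent))
```

If the counts differ, the locations that stopped reporting before the cutoff have been removed along with their rows in the data table.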
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
# extract only June observations from the data table
ns_climate %>%
  filter_data(month(date) == 6)
## A mudata object aligned along "date"
##   distinct_datasets():  "ecclimate_monthly"
##   distinct_locations(): "ANNAPOLIS ROYAL 6289", "BADDECK 6297" ... and 13 more
##   distinct_params():    "dir_of_max_gust", "extr_max_temp" ... and 9 more
##   src_tbls():           "data", "locations" ... and 3 more
## 
## tbl_data() %>% head():
## # A tibble: 6 × 7
##   dataset           location          param       date       value flag  flag_…¹
##   <chr>             <chr>             <chr>       <date>     <dbl> <chr> <chr>  
## 1 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1897-06-01  NA   M     Missing
## 2 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1898-06-01  13.4 <NA>  <NA>   
## 3 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1899-06-01  14.4 <NA>  <NA>   
## 4 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1900-06-01  14.6 <NA>  <NA>   
## 5 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1901-06-01  15.3 <NA>  <NA>   
## 6 ecclimate_monthly SABLE ISLAND 6454 mean_max_t… 1902-06-01  13.6 <NA>  <NA>   
## # … with abbreviated variable name ¹flag_text
The data is stored in the data table (i.e., tbl_data())
in parameter-long form (that is, one row per measurement rather than one
row per observation). This has the advantage that information about each
measurement can be stored next to the value (e.g., standard deviation,
notes, etc.); however, it is rarely the form required for analysis. To
extract data in parameter-long form, you can use
tbl_data():
ns_climate %>% tbl_data()
## # A tibble: 115,541 × 7
##    dataset           location          param      date       value flag  flag_…¹
##    <chr>             <chr>             <chr>      <date>     <dbl> <chr> <chr>  
##  1 ecclimate_monthly SABLE ISLAND 6454 mean_max_… 1897-01-01  NA   M     Missing
##  2 ecclimate_monthly SABLE ISLAND 6454 mean_max_… 1897-02-01  NA   M     Missing
##  3 ecclimate_monthly SABLE ISLAND 6454 mean_max_… 1897-03-01  NA   M     Missing
##  4 ecclimate_monthly SABLE ISLAND 6454 mean_max_… 1897-04-01  NA   M     Missing
##  5 ecclimate_monthly SABLE ISLAND 6454 mean_max_… 1897-05-01  NA   M     Missing
##  6 ecclimate_monthly SABLE ISLAND 6454 mean_max_… 1897-06-01  NA   M     Missing
##  7 ecclimate_monthly SABLE ISLAND 6454 mean_max_… 1897-07-01  NA   M     Missing
##  8 ecclimate_monthly SABLE ISLAND 6454 mean_max_… 1897-08-01  NA   M     Missing
##  9 ecclimate_monthly SABLE ISLAND 6454 mean_max_… 1897-09-01  NA   M     Missing
## 10 ecclimate_monthly SABLE ISLAND 6454 mean_max_… 1897-10-01  12.2 <NA>  <NA>   
## # … with 115,531 more rows, and abbreviated variable name ¹flag_text
To extract data in a more standard parameter-wide form, you can use
tbl_data_wide():
ns_climate %>% tbl_data_wide()
## # A tibble: 14,311 × 14
##    dataset    locat…¹ date       dir_o…² extr_…³ extr_…⁴ mean_…⁵ mean_…⁶ mean_…⁷
##    <chr>      <chr>   <date>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 ecclimate… ANNAPO… 1914-01-01      NA    NA      NA      NA      NA      NA  
##  2 ecclimate… ANNAPO… 1914-02-01      NA    NA      NA      NA      NA      NA  
##  3 ecclimate… ANNAPO… 1914-03-01      NA    NA      NA      NA      NA      NA  
##  4 ecclimate… ANNAPO… 1914-04-01      NA    19.4   -11.1     8.2    -3.1     2.6
##  5 ecclimate… ANNAPO… 1914-05-01      NA    30      -3.9    15.8     3.8     9.8
##  6 ecclimate… ANNAPO… 1914-06-01      NA    26.7    -1.7    19.8     7.2    13.5
##  7 ecclimate… ANNAPO… 1914-07-01      NA    30       3.9    22.3    10.2    16.3
##  8 ecclimate… ANNAPO… 1914-08-01      NA    NA      NA      NA      NA      NA  
##  9 ecclimate… ANNAPO… 1914-09-01      NA    NA      NA      NA      NA      NA  
## 10 ecclimate… ANNAPO… 1914-10-01      NA    NA      NA      NA      NA      NA  
## # … with 14,301 more rows, 5 more variables: snow_grnd_last_day <dbl>,
## #   spd_of_max_gust <dbl>, total_precip <dbl>, total_rain <dbl>,
## #   total_snow <dbl>, and abbreviated variable names ¹location,
## #   ²dir_of_max_gust, ³extr_max_temp, ⁴extr_min_temp, ⁵mean_max_temp,
## #   ⁶mean_min_temp, ⁷mean_temp
The tbl_data_wide() function isn’t limited to
parameter-wide data; data can be anything-wide (Edzer Pebesma has a great discussion
on this). Using tbl_data_wide() is identical to using
tbl_data() and tidyr::spread(), with
context-specific defaults.
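To make that equivalence concrete, here is a rough sketch of the manual route. It assumes the flag and flag_text columns are dropped before spreading (the default wide output shown earlier contains only identifier and parameter columns), and it uses dplyr::select() for that step; the package's internals may differ in detail.

```r
library(mudata2)
library(dplyr)
library(tidyr)
data("ns_climate")

# work on a two-parameter subset to keep the comparison quick
climate_subset <- ns_climate %>%
  select_params(mean_temp, total_precip)

# the built-in route
wide_builtin <- climate_subset %>% tbl_data_wide()

# the manual route: long table, drop per-measurement columns, then spread
wide_manual <- climate_subset %>%
  tbl_data() %>%
  select(-flag, -flag_text) %>%
  spread(key = param, value = value)

# compare the shape and column names of the two results
dim(wide_builtin)
dim(wide_manual)
setequal(names(wide_builtin), names(wide_manual))
```

The per-measurement columns have to go first because they differ between rows that should collapse into one wide row.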
ns_climate %>%
  select_params(mean_temp) %>%
  filter_data(year(date) == 1960) %>%
  tbl_data_wide(key = location)
## # A tibble: 12 × 16
##    dataset      param date       BADDE…¹ COLLE…² DIGBY…³ KENTV…⁴ MAHON…⁵ MOUNT…⁶
##    <chr>        <chr> <date>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 ecclimate_m… mean… 1960-01-01    -3.8    -6      -2.6    -5.1    -5.7    -6.7
##  2 ecclimate_m… mean… 1960-02-01    -1.2    -2.5     0.3    -1.2    -2.3    -3.1
##  3 ecclimate_m… mean… 1960-03-01    -1.3    -3.1     0      -1.8    -2.3    -3.5
##  4 ecclimate_m… mean… 1960-04-01     3       2.1     6.5     4.8     4.7     3.1
##  5 ecclimate_m… mean… 1960-05-01    11.7    10.9    12.8    13.1    11.5    11.6
##  6 ecclimate_m… mean… 1960-06-01    14.4    14.7    16.4    17.2    16.2    15  
##  7 ecclimate_m… mean… 1960-07-01    17.1    18      18.9    19.4    18.2    17.2
##  8 ecclimate_m… mean… 1960-08-01    NA      18.5    18.6    19.6    18.9    18  
##  9 ecclimate_m… mean… 1960-09-01    15.2    14      14.8    15.1    15      13.1
## 10 ecclimate_m… mean… 1960-10-01     8.7     6.9     9.1     8.1     7.2     6.6
## 11 ecclimate_m… mean… 1960-11-01     4.6     3.2     6.7     4.9     4.1     3  
## 12 ecclimate_m… mean… 1960-12-01    -0.8    -3.5    -0.4    -2.4    NA      -4.3
## # … with 7 more variables: `NAPPAN CDA 6414` <dbl>, `PARRSBORO 6428` <dbl>,
## #   `PORT HASTINGS 6441` <dbl>, `SABLE ISLAND 6454` <dbl>,
## #   `SPRINGFIELD 6473` <dbl>, `ST MARGARET'S BAY 6456` <dbl>,
## #   `UPPER STEWIACKE 6495` <dbl>, and abbreviated variable names
## #   ¹`BADDECK 6297`, ²`COLLEGEVILLE 6329`, ³`DIGBY 6338`,
## #   ⁴`KENTVILLE CDA 6375`, ⁵`MAHONE BAY 6396`, ⁶`MOUNT UNIACKE 6413`
Using the pipe (%>%), we can string all the steps
together concisely:
temp_1960 <- ns_climate %>%
  # pick parameters
  select_params(contains("temp")) %>%
  # pick locations
  select_locations(
    `Sable Island` = starts_with("SABLE"),
    `Kentville` = starts_with("KENT"),
    `Baddeck` = starts_with("BADD")
  ) %>%
  # filter data table
  filter_data(year(date) == 1960) %>%
  # extract data in wide format
  tbl_data_wide()
temp_1960
## # A tibble: 36 × 8
##    dataset           location date       extr_…¹ extr_…² mean_…³ mean_…⁴ mean_…⁵
##    <chr>             <chr>    <date>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 ecclimate_monthly Baddeck  1960-01-01     8.9   -16.7    -0.6    -6.9    -3.8
##  2 ecclimate_monthly Baddeck  1960-02-01     6.1   -13.3     1.7    -4.1    -1.2
##  3 ecclimate_monthly Baddeck  1960-03-01     7.2    -9.4     0.9    -3.4    -1.3
##  4 ecclimate_monthly Baddeck  1960-04-01    16.7    -7.8     6.1    -0.2     3  
##  5 ecclimate_monthly Baddeck  1960-05-01    26.7     2.2    17.2     6.2    11.7
##  6 ecclimate_monthly Baddeck  1960-06-01    30.6     0      19.6     9.2    14.4
##  7 ecclimate_monthly Baddeck  1960-07-01    28.3     8.9    22.6    11.6    17.1
##  8 ecclimate_monthly Baddeck  1960-08-01    33.3     8.9    24.3    NA      NA  
##  9 ecclimate_monthly Baddeck  1960-09-01    25.6     4.4    19.8    10.6    15.2
## 10 ecclimate_monthly Baddeck  1960-10-01    18.3    -0.6    12.3     5       8.7
## # … with 26 more rows, and abbreviated variable names ¹extr_max_temp,
## #   ²extr_min_temp, ³mean_max_temp, ⁴mean_min_temp, ⁵mean_temp
We can then plot this data with ggplot2, which suggests that the three locations, all in the same province, had more or less the same monthly temperature characteristics in 1960.
library(ggplot2)
ggplot(
  temp_1960,
  aes(
    x = date,
    y = mean_temp,
    ymin = extr_min_temp,
    ymax = extr_max_temp,
    col = location,
    fill = location
  )
) +
  geom_ribbon(alpha = 0.2, col = NA) +
  geom_line()
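To put rough numbers on that visual impression, one option is to summarise the wide table by location. This is only a sketch: it rebuilds temp_1960 so that it runs on its own, uses dplyr for the grouping, and the summary column names (annual_mean and so on) are made up for illustration.

```r
library(mudata2)
library(dplyr)
library(lubridate)
data("ns_climate")

# rebuild the 1960 temperature table for three locations
temp_1960 <- ns_climate %>%
  select_params(contains("temp")) %>%
  select_locations(
    `Sable Island` = starts_with("SABLE"),
    Kentville = starts_with("KENT"),
    Baddeck = starts_with("BADD")
  ) %>%
  filter_data(year(date) == 1960) %>%
  tbl_data_wide()

# one row per location; NA months are skipped and counted separately
temp_summary <- temp_1960 %>%
  group_by(location) %>%
  summarise(
    annual_mean = mean(mean_temp, na.rm = TRUE),
    coldest_month = min(mean_temp, na.rm = TRUE),
    warmest_month = max(mean_temp, na.rm = TRUE),
    n_missing = sum(is.na(mean_temp))
  )
temp_summary
```

Comparing the three rows gives a quick check on how similar the locations really were, and n_missing shows how many months each mean is based on.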