The queryLipdverse() function in lipdR
allows users to download LiPD files based on various filter
parameters.
The filtering relies on a query table, which holds metadata for all time series available on the LiPDverse.
This query table is built into lipdR and loaded with it.
library(lipdR)
Let’s get a sense of what the table holds.
dim(queryTable)
#> Error: object 'queryTable' not found
names(queryTable)
#> Error: object 'queryTable' not found
That’s a lot of rows! There’s a row for every time series in the LiPDverse database.
If you have used the LiPD format, you will recognize some of these column names.
Let’s try a simple query.
We’ll set skip.update = TRUE for this demonstration,
but it’s a good idea to check for updates each time you start a new
session.
If you tell queryLipdverse to skip.update
once, it will not prompt you again in that R session.
qt <- queryLipdverse(variable.name = c("δ18O"),
skip.update = TRUE)
#> Based on your query parameters, there are 0 available time series in 0 datasets
Huh, there must be more oxygen-18 time series than that.
Let’s have a look at the unique paleoData_variableName values.
We’ll look at just the first 250, since there are quite a few.
unique(queryTable$paleoData_variableName)[1:250]
#> Error: object 'queryTable' not found
There are quite a few notation styles, but they all have “18o” in common, so let’s try that.
qt <- queryLipdverse(variable.name = c("18o"))
#> Based on your query parameters, there are 1861 available time series in 1208 datasets
Now that is a lot more data.
Some query parameters require considering what all the possible results have in common.
In other cases, we can simply input a vector with several possible options.
For the variable.name parameter, multiple entries are
combined with “OR” logic, so more entries will generally pull more
datasets.
qt <- queryLipdverse(variable.name = c("δ18O", "d18O"))
#> Based on your query parameters, there are 1861 available time series in 1208 datasets
This gets us the same result as the simplified “18o” filter: 1861 time series in 1208 datasets.
Also note that capitalization makes no difference here.
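The matching behaviour described above can be sketched in base R. This is a minimal illustration, not lipdR’s implementation: it assumes variable.name entries are matched as case-insensitive substrings and combined with “OR” logic (consistent with the results above), and both the variable names and the match_any() helper are made up for the example.

```r
# Made-up variable names for illustration (not real LiPDverse content)
var_names <- c("d18O", "D18O", "delta18O", "d13C", "temperature", "MgCa")

# Hypothetical helper: TRUE where a name contains ANY of the patterns,
# ignoring case and treating each pattern as a literal string
match_any <- function(x, patterns) {
  hits <- lapply(patterns, function(p) grepl(tolower(p), tolower(x), fixed = TRUE))
  Reduce(`|`, hits, init = rep(FALSE, length(x)))
}

# A single simplified pattern catches every oxygen-18 notation
match_any(var_names, "18o")

# Several patterns are combined with OR logic
match_any(var_names, c("d18O", "delta18O"))
```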
The archive.type filter works similarly; let’s look at
our options.
unique(queryTable$archiveType)
#> Error: object 'queryTable' not found
Note that there are several distinct lake options: “Lake”, “LakeSediment”, “LakeDeposits”, “LakeDeposit”, and “Lake Sediment”.
We can grab these archive.type names and search for all of
them like this:
qt <- queryLipdverse(archive.type = unique(queryTable$archiveType)[c(2,3,5,6,39)])
#> Error: object 'queryTable' not found
Or we can use the simplification strategy again:
qt <- queryLipdverse(archive.type = c("lake"))
#> Based on your query parameters, there are 46371 available time series in 2375 datasets
As we saw with variable.name, the results are similar.
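Hard-coded positions such as c(2,3,5,6,39) depend on the order of unique(queryTable$archiveType), which may change when the query table is updated. The sketch below selects the lake-related names by pattern instead. It uses a made-up vector in place of the real archiveType column, and the final queryLipdverse() call is left commented out because it needs the query table.

```r
# Made-up archive types standing in for unique(queryTable$archiveType)
archive_types <- c("MarineSediment", "Lake", "LakeSediment", "Wood",
                   "LakeDeposits", "LakeDeposit", "Lake Sediment", "GlacierIce")

# Keep every name that contains "lake", ignoring case
lake_types <- archive_types[grepl("lake", archive_types, ignore.case = TRUE)]
lake_types

# With lipdR loaded, the selected names could then be passed on:
# qt <- queryLipdverse(archive.type = lake_types)
```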
We can also filter based on publication information (author, DOI, title, etc.)
using pub.info:
qt <- queryLipdverse(pub.info = c("10.1016/j.quascirev.2008.09.005"))
#> Based on your query parameters, there are 20 available time series in 7 datasets
This query can be a little slow if you don’t first narrow the results with another parameter.
Let’s narrow our region of interest.
There are four different parameters used for this:
coord, country, continent, and
ocean.
Let’s grab all the North American datasets:
qt <- queryLipdverse(continent = "North America")
#> Based on your query parameters, there are 47165 available time series in 1807 datasets
Let’s grab just those from Mexico:
qt <- queryLipdverse(country = c("Mexico"))
#> Based on your query parameters, there are 524 available time series in 40 datasets
Note that the country and continent filters are not based on LiPD content. Countries and continents are assigned from the coordinates associated with each dataset, so the results can be unreliable near country borders.
Again, we can see all of the options for country and
continent with unique().
We can also use latitude and longitude directly with a bounding box:
qt <- queryLipdverse(coord = c(0,90,-180,-110))
#> Based on your query parameters, there are 13987 available time series in 784 datasets
We can limit this to only marine data by setting ocean
to TRUE:
qt <- queryLipdverse(coord = c(0,90,-180,-110),
ocean = TRUE)
#> Based on your query parameters, there are 1443 available time series in 101 datasets
The ocean parameter works on the same basis as the
country and continent parameters, so it can also be
unreliable in coastal areas.
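Both coord queries above use the box c(0,90,-180,-110). A minimal sketch of the underlying test follows, assuming the four values are ordered as latitude minimum, latitude maximum, longitude minimum, longitude maximum (consistent with a northeast Pacific and western North America box, but check ?queryLipdverse). The site table, its column names, and the in_box() helper are hypothetical.

```r
# Made-up site coordinates (decimal degrees; west longitudes are negative)
sites <- data.frame(
  site = c("siteA", "siteB", "siteC", "siteD"),
  lat  = c(45, -10, 60, 30),
  lon  = c(-125, -120, -150, -100),
  stringsAsFactors = FALSE
)

# Hypothetical helper, assuming coord = c(latMin, latMax, lonMin, lonMax)
in_box <- function(lat, lon, coord) {
  lat >= coord[1] & lat <= coord[2] & lon >= coord[3] & lon <= coord[4]
}

keep <- in_box(sites$lat, sites$lon, coord = c(0, 90, -180, -110))
sites$site[keep]
```

Here siteB falls south of the equator and siteD falls east of -110, so only siteA and siteC remain.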
We can also grab all the data from a compilation, such
as Western North America (wNAm):
qt <- queryLipdverse(compilation = c("wNAm"))
#> Based on your query parameters, there are 389 available time series in 182 datasets
We can also grab multiple compilations at once:
qt <- queryLipdverse(compilation = c("wnam", "temp12k"))
#> Based on your query parameters, there are 1702 available time series in 793 datasets
Like variable.name, the compilation filter uses “OR” logic, so the results include datasets from either compilation.
Let’s try pulling datasets based on their
seasonality.
We can pull summer, commonly defined as June, July, and August in the Northern Hemisphere.
The seasonality input is taken as a list.
qt <- queryLipdverse(continent = "North America",
seasonality = list("June", "July", "August"))
#> Based on your query parameters, there are 0 available time series in 0 datasets
Okay, so this must not be a good choice of format. Let’s see how LiPD authors define their seasons:
unique(queryTable$interpretation1_seasonality)
#> Error: object 'queryTable' not found
It looks like numeric months are most common, but season names, “warm”/“cold”, and series of first-letter abbreviations (e.g., JJA) are also common.
Knowing this, let’s try again.
Items within a single list are treated as linked by “AND”, so an input of list("June", "July", "August") would filter for season data with ALL of these months.
Multiple lists are treated as linked by “OR”, such that list(list("June"), list("July"), list("August")) would filter for season data with ANY of these months.
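The AND/OR rules above can be sketched in base R. This is not lipdR’s code: the matches_all() and matches_any_list() helpers are hypothetical, the season strings are made up, and each term is assumed to match as a case-insensitive substring (so, for example, “1” would also match “12”).

```r
# Made-up seasonality strings for illustration
seasons <- c("6 7 8", "JJA", "Summer", "annual", "12 1 2", "7")

# Hypothetical helper: TRUE where a string contains ALL terms (AND logic)
matches_all <- function(x, terms) {
  hits <- lapply(terms, function(term) grepl(tolower(term), tolower(x), fixed = TRUE))
  Reduce(`&`, hits, init = rep(TRUE, length(x)))
}

# Hypothetical helper: TRUE where a string satisfies ANY term list (OR logic)
matches_any_list <- function(x, term_lists) {
  hits <- lapply(term_lists, function(terms) matches_all(x, terms))
  Reduce(`|`, hits, init = rep(FALSE, length(x)))
}

# One list: June AND July AND August must all appear
matches_all(seasons, list("6", "7", "8"))

# Several lists: June OR July OR August
matches_any_list(seasons, list(list("6"), list("7"), list("8")))

# Mixing notations: numeric months OR "summer" OR "JJA"
matches_any_list(seasons, list(list("6", "7", "8"), list("summer"), list("JJA")))
```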
Let’s try to get all of the summer datasets by entering a few different notations:
qt <- queryLipdverse(continent = "North America",
seasonality = list(list("6", "7", "8"), list("summer"), list("JJA")))
#> Based on your query parameters, there are 186 available time series in 147 datasets
This returns quite a few datasets.
From our look at all the unique seasonality entries, we
can see that this probably includes a lot of annual data too.
Let’s exclude the annual and winter datasets by using
season.not.
The input for season.not works the same as for
seasonality.
qt <- queryLipdverse(continent = "North America",
seasonality = list(list("6", "7", "8"), list("summer"), list("JJA")),
season.not = list(list("annual"), list("December"), list("12","1","2"), list("winter"), list("cold")))
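#> Based on your query parameters, there are 186 available time series in 147 datasets
The counts are unchanged, so none of our summer matches were also tagged as annual or winter, and we can be more confident that these series are summer-specific.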
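Combining the two parameters amounts to “matches seasonality AND NOT season.not”. The self-contained sketch below shows that logic on made-up season strings, using the same hypothetical case-insensitive substring matching; it is an illustration, not lipdR’s implementation.

```r
# Made-up seasonality strings for illustration
seasons <- c("6 7 8", "JJA", "Summer", "annual", "12 1 2", "summer; annual")

# Hypothetical helpers: AND within a term list, OR across term lists
matches_all <- function(x, terms) {
  hits <- lapply(terms, function(term) grepl(tolower(term), tolower(x), fixed = TRUE))
  Reduce(`&`, hits, init = rep(TRUE, length(x)))
}
matches_any_list <- function(x, term_lists) {
  hits <- lapply(term_lists, function(terms) matches_all(x, terms))
  Reduce(`|`, hits, init = rep(FALSE, length(x)))
}

include <- list(list("6", "7", "8"), list("summer"), list("JJA"))
exclude <- list(list("annual"), list("winter"), list("12", "1", "2"))

# Keep series that match an include list and no exclude list
keep <- matches_any_list(seasons, include) & !matches_any_list(seasons, exclude)
seasons[keep]
```

The made-up entry "summer; annual" matches the summer filter but is dropped by the exclusion, which is the kind of series season.not is meant to remove.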
Let’s look at interpretation variables now. These are the climate variables that may serve as targets for the available proxy time series.
These variables are expressed in two different formats: interpretation variable and interpretation detail.
Each of these has four possible interpretation slots.
unique(queryTable$interp_Vars)
#> Error: object 'queryTable' not found
unique(queryTable$interp_Details)
#> Error: object 'queryTable' not found
Let’s look at some marine interp.vars in the northeast
Pacific:
qt <- queryLipdverse(coord = c(0,90,-180,-110),
ocean = TRUE,
interp.vars = c("SST", "upwelling", "SSS"))
#> Based on your query parameters, there are 22 available time series in 7 datasets
That gives us just a few datasets.
Perhaps we’ll have more luck with interp.details, which
is more standardized:
qt <- queryLipdverse(coord = c(0,90,-180,-110),
ocean = TRUE,
interp.details = c("sea@surface", "elNino"))
#> Based on your query parameters, there are 65 available time series in 38 datasets
Let’s see if we grab more using both. These inputs combine with “OR” logic, so we may gather more datasets by using both parameters:
qt <- queryLipdverse(coord = c(0,90,-180,-110),
ocean = TRUE,
interp.details = c("sea@surface", "elNino"),
interp.vars = c("SST", "upwelling", "SSS"))
#> Based on your query parameters, there are 87 available time series in 42 datasets
It looks like we get a few extra datasets with this approach.
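The gain from using both parameters is consistent with a union: a series is kept if it matches interp.vars OR interp.details. Here 65 + 22 = 87 time series, which suggests the two sets did not overlap in this region. A minimal sketch of that union on a made-up table follows; the column names and the match_any() helper are hypothetical and do not reflect the query table’s real columns.

```r
# Made-up series with hypothetical interpretation columns
series <- data.frame(
  id            = c("s1", "s2", "s3", "s4"),
  interp_var    = c("SST", "precipitation", "temperature", "upwelling"),
  interp_detail = c("", "", "sea@surface", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive literal match against ANY pattern
match_any <- function(x, patterns) {
  hits <- lapply(patterns, function(p) grepl(tolower(p), tolower(x), fixed = TRUE))
  Reduce(`|`, hits, init = rep(FALSE, length(x)))
}

hit_vars    <- match_any(series$interp_var, c("SST", "upwelling", "SSS"))
hit_details <- match_any(series$interp_detail, c("sea@surface", "elNino"))

# OR logic: the result is the union of the two sets of matches
series$id[hit_vars | hit_details]
```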
Now that we know how to use our filters, let’s try a strict filter.
We’ll look for marine archives in the northeast Pacific, with interpretations related to marine climate variables in the summer months only:
qt <- queryLipdverse(coord = c(0,90,-180,-110),
archive.type = c("marine", "ocean"),
ocean = TRUE,
interp.details = c("sea@surface", "elNino"),
interp.vars = c("SST", "upwelling", "SSS"),
seasonality = list(list("6", "7", "8"), list("summer"), list("JJA")),
season.not = list(list("annual"), list("December"), list("12","1","2"), list("winter"), list("cold")))
#> Based on your query parameters, there are 10 available time series in 4 datasets
To fine-tune your query, set verbose to TRUE to see how many series remain after each filter.
We’ll narrow the results further by adding author names to search for within the publication info:
qt <- queryLipdverse(coord = c(0,90,-180,-110),
archive.type = c("marine", "ocean"),
ocean = TRUE,
interp.details = c("sea@surface", "elNino"),
interp.vars = c("SST", "upwelling", "SSS"),
seasonality = list(list("6", "7", "8"), list("summer"), list("JJA")),
season.not = list(list("annual"), list("December"), list("12","1","2"), list("winter"), list("cold")),
pub.info = c("mix", "caissie"),
verbose = TRUE)
#> Series available before filtering: 104777
#>
#> Series remaining after coord filter: 13987
#>
#> Series remaining after marine filter: 1443
#>
#> Series remaining after continent filter: 1443
#>
#> Series remaining after country filter: 1443
#>
#> Series remaining after time filter: 1443
#>
#> Series remaining after paleo.proxy filter: 1443
#>
#> Series remaining after paleo.units filter: 1443
#>
#> Series remaining after archive.type filter: 802
#>
#> Series remaining after variable.name filter: 802
#>
#> Series remaining after interp.vars filter: 802
#>
#> Series remaining after interp.details filter: 82
#>
#> Series remaining after compilation filter: 82
#>
#> Series remaining after seasonality filter(s): 10
#>
#> Series remaining after pub.info filter: 0
#>
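#> Based on your query parameters, there are 0 available time series in 0 datasets
In this case, the pub.info filter removes all of the remaining series, so we would need to loosen or drop it before downloading anything.
When you’re satisfied with the query results, simply pass the
filtered query table to the readLipd() function to
download the datasets: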
D <- readLipd(qt)
#> Error in value[[3L]](cond): Error: get_src_or_dst: Error in if (!dir.exists(path)) {: argument is of length zero