The dataRetrieval package was created to simplify the
process of loading hydrologic data into the R environment. It is
designed to retrieve the major data types of U.S. Geological Survey
(USGS) hydrologic data that are available on the Web, as well as data
from the Water Quality Portal (WQP), which currently houses water
quality data from the Environmental Protection Agency (EPA), U.S.
Department of Agriculture (USDA), and USGS. Direct USGS data is obtained
from a service called the National Water Information System (NWIS).
For information on getting started in R and installing the package, see Getting Started. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
A quick workflow for USGS
library(dataRetrieval)
# Choptank River near Greensboro, MD
siteNumber <- "01491000"
ChoptankInfo <- readNWISsite(siteNumber)
parameterCd <- "00060"

# Raw daily data:
rawDailyData <- readNWISdv(
  siteNumber, parameterCd,
  "1980-01-01", "2010-01-01"
)

# Sample data Nitrate:
parameterCd <- "00618"
qwData <- readNWISqw(
  siteNumber, parameterCd,
  "1980-01-01", "2010-01-01"
)

pCode <- readNWISpCode(parameterCd)
USGS data are made available through the National Water Information System (NWIS).
Table 1 describes the functions available in the dataRetrieval package.
|readNWISdata||Data using user-specified queries||opt.||opt.||opt.||service, tz=‘UTC’, …||NWIS|
|readNWISqw||Water quality||req.||req.||req.||expanded=TRUE, tz=‘UTC’||NWIS|
|readNWISrating||Rating table for active streamgage||req.||type=‘base’||NWIS|
|readNWISuse||Water use||stateCd, countyCd, years=‘ALL’, categories=‘ALL’||NWIS|
|readNWISstat||Statistical service||req.||req.||req.||statReportType=‘daily’, statType=‘mean’||NWIS|
|readNWISpCode||Parameter code information||req.||req.||NWIS|
|whatNWISsites||Site search using user-specified queries||req.||…||NWIS|
|whatNWISdata||Data availability||opt.||opt.||service, …||NWIS|
|readWQPqw||Water quality data||req.||req.||req.||WQP|
defaults to request the maximum data.
In this section, examples of National Water Information System (NWIS) retrievals show how to get raw data into R. This data includes site information, measured parameter information, historical daily values, unit values (which include real-time data but can also include other sensor data stored at regular time intervals), water quality data, groundwater level data, peak flow data, rating curve data, surface-water measurement data, water use data, and statistics data. The section Embedded Metadata shows instructions for getting metadata that is attached to each returned data frame.
The USGS organizes hydrologic data in a standard structure.
Streamgages are located throughout the United States, and each
streamgage has a unique ID (referred to in this document and throughout the
dataRetrieval package as siteNumber). Often
(but not always), these IDs are 8 digits for surface-water sites and 15
digits for groundwater sites. The first step to finding data is
discovering this siteNumber. There are many ways to do
this; one is the National Water
Information System: Mapper.
Once the siteNumber is known, the next required input
for USGS data retrievals is the “parameter code”. This is a 5-digit code
that specifies the measured parameter being requested. For example,
parameter code 00631 represents “Nitrate plus nitrite, water, filtered,
milligrams per liter as nitrogen”, with units of “mg/l as N”.
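Because site numbers and parameter codes often begin with zeros, it may help to keep them as character strings rather than numbers. The base R sketch below illustrates why. The helper `looks_like_site_number` is hypothetical (it is not part of dataRetrieval) and only checks the common 8- and 15-digit lengths, so some valid site numbers would fail it.

```r
# Site numbers and parameter codes are identifiers, not quantities.
siteNumber <- "01491000"
parameterCd <- "00631"

# Converting to numeric silently drops the leading zero.
asNumber <- as.numeric(siteNumber)
roundTrip <- format(asNumber, scientific = FALSE) # "1491000"

# Hypothetical helper (not part of dataRetrieval): checks only the common
# 8-digit and 15-digit lengths described in the text.
looks_like_site_number <- function(x) {
  grepl("^[0-9]{8}$", x) | grepl("^[0-9]{15}$", x)
}

siteCheck <- looks_like_site_number(
  c(siteNumber, roundTrip, "434400121275801")
)
siteCheck

# Parameter codes are 5 digits.
codeCheck <- grepl("^[0-9]{5}$", parameterCd)
codeCheck
```

Here the round-tripped value no longer matches the 8-digit pattern, which is the kind of mismatch that can make a request fail or return a different site.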
Not every station will measure all parameters. A short list of commonly measured parameters is shown in Table 2.
|00065||Gage height [ft]|
Two output columns that may not be obvious are “srsname” and “casrn”. Srsname stands for “Substance Registry Services”. More information on the srs name can be found here.
Casrn stands for “Chemical Abstracts Service (CAS) Registry Number”. More information on CAS can be found here.
For unit values data (sensor data measured at regular time intervals
such as 15 minutes or hourly), knowing the parameter code and
siteNumber is enough to make a request for data. For most
variables that are measured on a continuous basis, the USGS also stores
the historical data as daily values. These daily values are statistical
summaries of the continuous data, e.g. maximum, minimum, mean, or
median. The different statistics are specified by a 5-digit statistics
code. Some common codes are shown in Table 3.
Examples for using these site numbers, parameter codes, and statistic codes will be presented in subsequent sections.
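To make the relationship between unit values and daily values concrete, the following base R sketch summarizes a synthetic 15-minute series into a daily mean and a daily maximum. The numbers are made up for illustration, and this is only a rough picture of the idea, not the procedure the USGS uses to compute published daily values.

```r
# Synthetic 15-minute "unit values" covering two full days (made-up numbers).
times <- seq(
  from = as.POSIXct("2012-05-12 00:00:00", tz = "UTC"),
  by = "15 min", length.out = 192
)
unitValues <- data.frame(
  dateTime = times,
  flow = rep(c(10, 20), each = 96)
)
unitValues$flow[50] <- 40 # a short peak on the first day

# Group by calendar day and compute two common statistical summaries.
day <- as.Date(unitValues$dateTime, tz = "UTC")
dailyMean <- tapply(unitValues$flow, day, mean)
dailyMax <- tapply(unitValues$flow, day, max)

dailyMean
dailyMax
```

The daily mean smooths out the short peak that the daily maximum preserves, which is why the choice of statistic matters when requesting daily data.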
There are occasions where NWIS values are not reported as numbers;
instead, there might be text describing a certain event, such as “Ice”.
Any value that cannot be converted to a number will be reported as NA in
this package (not including remark code columns), unless the user sets
the convertType argument to FALSE. In that
case, the data is returned as a data frame made up entirely of character columns.
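The effect of this conversion can be sketched in base R. The values below are made up, and the snippet only mimics the general behavior described above; it is not the package's internal code.

```r
# Raw values as they might arrive in text form (made-up numbers);
# "Ice" marks a period when no numeric value was reported.
raw <- c("12.3", "Ice", "4.5")

# as.numeric() returns NA for anything that is not a number and warns
# about it; suppressWarnings() hides that warning.
values <- suppressWarnings(as.numeric(raw))

missingFlag <- is.na(values)
missingFlag

# Summaries need na.rm = TRUE once NA values are present.
meanValue <- mean(values, na.rm = TRUE)
meanValue
```

Keeping the raw text alongside the converted values can be useful when the reason for a missing value matters to the analysis.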
Use the readNWISsite function to obtain all of the
information available for a particular USGS site (or sites) such as full
station name, drainage area, latitude, and longitude.
readNWISsite can also access information about multiple
sites with a vector input.
siteNumbers <- c("01491000", "01645000")
siteINFO <- readNWISsite(siteNumbers)
Site information is obtained from: https://waterservices.usgs.gov/rest/Site-Test-Tool.html
Information on the returned data can be found with the
comment function as described in the Metadata section.
To discover what data is available for a particular USGS site,
including measured parameters, period of record, and number of samples
(count), use the
whatNWISdata function. It is possible to
limit the retrieval information to a subset of services. The possible
choices for services are: “dv” (daily values), “uv” or “iv” (unit
values), “qw” (water quality), “sv” (site visits), “pk” (peak
measurements), “gw” (groundwater levels), “ad” (sites included in USGS
Annual Water Data Reports), “aw” (sites monitored by the
USGS Active Groundwater Level Network), and “id”
(historical instantaneous values).
In the following example, we limit the retrieved data to only daily
data. The default for “service” is all, which returns all
of the available data for that site. Likewise, there are arguments for
parameter code (parameterCd) and statistic code
(statCd) to filter the results. The default for both is to
return all possible values (all). The returned
count_nu for “uv” data is the count of days with returned
data, not the actual count of returned values.
# Continuing from the previous example:
# This pulls out just the daily, mean data:

dailyDataAvailable <- whatNWISdata(
  siteNumber = siteNumbers,
  service = "dv",
  statCd = "00003"
)
|Site number||Parameter||Start date||End date||Count||Units|
|01491000||Temperature, water||2010-10-01||2012-05-09||529||deg C|
|01491000||Stream flow, mean daily||1948-01-01||2017-05-17||25340||ft3/s|
|01645000||Stream flow, mean daily||1930-09-26||2017-05-17||31646||ft3/s|
|01491000||Specific conductance||2010-10-01||2012-05-09||527||uS/cm @25C|
|01491000||Suspended sediment concentration (SSC)||1980-10-01||1991-09-30||4017||mg/l|
|01491000||Suspended sediment discharge||1980-10-01||1991-09-30||4017||tons/day|
See Creating Tables for instructions on converting an R data frame to a table in Microsoft® software Excel or Word to display a data availability table similar to Table 4. Excel, Microsoft, PowerPoint, Windows, and Word are registered trademarks of Microsoft Corporation in the United States and other countries.
To obtain all of the available information concerning a measured
parameter (or multiple parameters), use the readNWISpCode function.
# Using defaults:
parameterCd <- "00618"
parameterINFO <- readNWISpCode(parameterCd)
To obtain daily records of USGS data, use the readNWISdv
function. The arguments for this function are
siteNumbers, parameterCd, startDate, endDate, and
statCd (defaults to “00003”). If you want to use the
default values, you do not need to list them in the function call. Daily
data is pulled from https://waterservices.usgs.gov/rest/DV-Test-Tool.html.
The dates (start and end) must be in the format “YYYY-MM-DD” (note: the user must include the quotes). Setting the start date to “” (no space) will prompt the program to ask for the earliest date, and setting the end date to “” (no space) will prompt for the latest available date.
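A date argument can be checked before it is sent in a request. The sketch below uses a hypothetical helper, `is_valid_date_arg`, which is not part of dataRetrieval; it accepts the empty string (meaning no limit) or a real calendar date written as “YYYY-MM-DD”.

```r
# Hypothetical helper (not part of dataRetrieval): TRUE when x is "" or a
# valid calendar date written as "YYYY-MM-DD".
is_valid_date_arg <- function(x) {
  if (identical(x, "")) {
    return(TRUE) # empty string means "no limit" on that end of the range
  }
  grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &&
    !is.na(as.Date(x, format = "%Y-%m-%d"))
}

checks <- c(
  iso = is_valid_date_arg("2009-10-01"),    # well-formed
  slashes = is_valid_date_arg("10/01/2009"), # wrong format
  feb30 = is_valid_date_arg("2009-02-30"),   # right format, not a real date
  empty = is_valid_date_arg("")              # allowed: no limit
)
checks
```

Checking the pattern first and the calendar validity second catches both kinds of mistake with a single helper.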
# Choptank River near Greensboro, MD:
siteNumber <- "01491000"
parameterCd <- "00060" # Discharge
startDate <- "2009-10-01"
endDate <- "2012-09-30"

discharge <- readNWISdv(siteNumber, parameterCd, startDate, endDate)
The column “Date” in the returned data frame is automatically imported as a variable of class “Date” in R. Each requested parameter has a value and remark code column. The names of these columns depend on the requested parameter and stat code combinations. USGS daily value qualification codes are often “A” (approved for publication) or “P” (provisional data subject to revision).
Another example would be a request for mean and maximum daily temperature and discharge in early 2012:
siteNumber <- "01491000"
parameterCd <- c("00010", "00060") # Temperature and discharge
statCd <- c("00001", "00003") # Maximum and mean
startDate <- "2012-01-01"
endDate <- "2012-05-01"

temperatureAndFlow <- readNWISdv(siteNumber, parameterCd,
  startDate, endDate,
  statCd = statCd
)
The column names can be shortened and simplified using the
renameNWISColumns function. This is not necessary, but may
streamline subsequent data analysis and presentation. Site information,
daily statistic information, and measured parameter information is
attached to the data frame as attributes. This is discussed further in
the metadata section.
names(temperatureAndFlow)
## [1] "agency_cd"        "site_no"          "Date"
## [4] "X_00010_00001_cd" "X_00010_00001"    "X_00010_00003_cd"
## [7] "X_00010_00003"    "X_00060_00003_cd" "X_00060_00003"
temperatureAndFlow <- renameNWISColumns(temperatureAndFlow)
names(temperatureAndFlow)
## [1] "agency_cd"    "site_no"      "Date"
## [4] "Wtemp_Max_cd" "Wtemp_Max"    "Wtemp_cd"
## [7] "Wtemp"        "Flow_cd"      "Flow"
# Information about the data frame attributes:
names(attributes(temperatureAndFlow))
## [1] "names"         "row.names"     "url"
## [4] "siteInfo"      "variableInfo"  "disclaimer"
## [7] "statisticInfo" "queryTime"     "class"
statInfo <- attr(temperatureAndFlow, "statisticInfo")
variableInfo <- attr(temperatureAndFlow, "variableInfo")
siteInfo <- attr(temperatureAndFlow, "siteInfo")
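The raw column names shown above appear to follow the pattern X_<parameter code>_<statistic code>, with a _cd suffix for remark columns. Assuming that pattern holds, a small base R sketch can split the names into their parts, which may help when deciding how to label columns by hand. This is an illustration only and is not how renameNWISColumns is implemented.

```r
# Raw value-column names copied from the output above.
rawNames <- c("X_00010_00001", "X_00010_00003", "X_00060_00003")

# Split on "_" and pick out the parameter and statistic codes.
# Assumes the X_<parameter>_<statistic> pattern seen in the output.
parts <- strsplit(rawNames, "_", fixed = TRUE)
columnKey <- data.frame(
  column = rawNames,
  parameterCd = vapply(parts, `[`, character(1), 2),
  statCd = vapply(parts, `[`, character(1), 3),
  stringsAsFactors = FALSE
)
columnKey
```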
An example of plotting the above data:
variableInfo <- attr(temperatureAndFlow, "variableInfo")
siteInfo <- attr(temperatureAndFlow, "siteInfo")

par(mar = c(5, 5, 5, 5)) # sets the plot margins

# Maximum water temperature on the left axis:
plot(temperatureAndFlow$Date, temperatureAndFlow$Wtemp_Max,
  ylab = variableInfo$parameter_desc[1],
  xlab = ""
)
par(new = TRUE)
# Mean discharge overlaid as a red line, labeled on the right axis:
plot(temperatureAndFlow$Date,
  temperatureAndFlow$Flow,
  col = "red", type = "l",
  xaxt = "n", yaxt = "n",
  xlab = "", ylab = "",
  axes = FALSE
)
axis(4, col = "red", col.axis = "red")
mtext(variableInfo$parameter_desc[2], side = 4, line = 3, col = "red")
title(paste(siteInfo$station_nm, "2012"))
legend("topleft", variableInfo$param_units,
  col = c("black", "red"), lty = c(NA, 1),
  pch = c(1, NA)
)
Any data collected at regular time intervals (such as 15-minute or
hourly) are known as “unit values”. Many of these are delivered on a
real time basis and very recent data (even less than an hour old in many
cases) are available through the function readNWISuv. Some
of these unit values are available for many years, and some are only
available for a recent time period such as 120 days. Here is an example
of a retrieval of such data.
parameterCd <- "00060" # Discharge
startDate <- "2012-05-12"
endDate <- "2012-05-13"
dischargeUnit <- readNWISuv(siteNumber, parameterCd, startDate, endDate)
dischargeUnit <- renameNWISColumns(dischargeUnit)
The retrieval produces a data frame that contains 96 rows (one for
every 15-minute period in the day). They include all data collected from
the startDate through the endDate (starting
and ending with midnight locally-collected time). The dateTime column is
converted to UTC (Coordinated Universal Time), so midnight EST appears 5
hours later in the dateTime column (05:00 UTC on the same day).
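The relationship between a local midnight and its UTC representation can be checked in base R. The sketch below uses a winter date so that Eastern Standard Time (UTC minus 5 hours) applies; the date itself is arbitrary and chosen only for illustration.

```r
# Midnight Eastern Standard Time on a winter date (no daylight saving).
localMidnight <- ISOdatetime(2012, 1, 15, 0, 0, 0, tz = "America/New_York")

# The same instant, printed in UTC and in local time.
utcText <- format(localMidnight, "%Y-%m-%d %H:%M:%S", tz = "UTC")
localText <- format(localMidnight, "%Y-%m-%d %H:%M:%S",
  tz = "America/New_York"
)

utcText
localText
```

Only the printed representation changes; both strings describe the same moment in time.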
To override the UTC timezone, specify a valid timezone in the tz argument. Default is “”, which will keep the dateTime column in UTC. Other valid timezones are:
America/New_York
America/Chicago
America/Denver
America/Los_Angeles
America/Anchorage
America/Honolulu
America/Jamaica
America/Managua
America/Phoenix
America/Metlakatla
Data are retrieved from https://waterservices.usgs.gov/rest/IV-Test-Tool.html. There are occasions where NWIS values are not reported as numbers; a common example is “Ice”. Any value that cannot be converted to a number will be reported as NA in this package. Site information and measured parameter information are attached to the data frame as attributes. This is discussed further in the metadata section.
Groundwater level measurements can be obtained with the
readNWISgwl function. Information on the returned data can
be found with the
comment function, and attached attributes
as described in the metadata section.
siteNumber <- "434400121275801"
groundWater <- readNWISgwl(siteNumber)
Peak flow data are instantaneous discharge or stage data that record
the maximum values of these variables during a flood event. They include
the annual peak flood event but can also include records of other peaks
that are lower than the annual maximum. Peak discharge measurements can
be obtained with the
readNWISpeak function. Information on
the returned data can be found with the comment function
and attached attributes as described in the metadata section.
siteNumber <- "01594440"
peakData <- readNWISpeak(siteNumber)
Rating curves are the calibration curves that are used to convert
measurements of stage to discharge. Because of changing hydrologic
conditions these rating curves change over time. Information on the
returned data can be found with the
comment function and
attached attributes as described in the metadata section.
Rating curves can be obtained with the readNWISrating function.
ratingData <- readNWISrating(siteNumber, "base")
attr(ratingData, "RATING")
These data are the discrete measurements of discharge that are made
for the purpose of developing or revising the rating curve. Information
on the returned data can be found with the comment function
and attached attributes as described in the metadata section.
Surface-water measurement data can be obtained with the readNWISmeas function.
Retrieves water use data from USGS Water Use Data for the Nation. See https://waterdata.usgs.gov/nwis/wu for more information. All available use categories for the supplied arguments are retrieved.
allegheny <- readNWISuse(
  stateCd = "Pennsylvania",
  countyCd = "Allegheny"
)

national <- readNWISuse(
  stateCd = NULL,
  countyCd = NULL,
  transform = TRUE
)
Retrieves site statistics from the USGS Statistics Web Service beta.
discharge_stats <- readNWISstat(
  siteNumbers = c("02319394"),
  parameterCd = c("00060"),
  statReportType = "annual"
)
Water quality data sets are available from the Water Quality Data Portal.
These data sets can be housed in the STORET database (data from
EPA), the NWIS database (data from USGS), or the STEWARDS database (data from
USDA); additional databases are slated to be included in the future.
Because only USGS uses parameter codes, a “characteristic name” must be
supplied. The readWQPqw function can take either a USGS
parameter code, or a more general characteristic name in the parameterCd
input argument. The Water Quality Data Portal includes data discovery
tools and information on characteristic names. The following example
retrieves specific conductance from a DNR site in Wisconsin.
specificCond <- readWQPqw(
  "WIDNR_WQX-10032762",
  "Specific conductance",
  "2011-05-01", "2011-09-30"
)
The previous examples all took specific input arguments:
siteNumber, parameterCd (or characteristic
name), startDate, endDate, etc. However, the
Web services that supply the data can accept a wide variety of
additional arguments. The function whatNWISsites can be used to discover NWIS
sites based on any query that the NWIS Site Service offers. This is done
by using the
... argument, which allows the user to use any
arbitrary input argument. We can then use the service here
to discover many options for searching for NWIS sites. For example, you
may want to search for sites in a lat/lon bounding box, or only sites
on tidal streams, or sites with water quality samples, sites above a
certain altitude, etc. The results of this site query generate a URL.
For example, the tool provided a search within a specified bounding box,
for sites that have daily discharge (parameter code = 00060) and
temperature (parameter code = 00010). The generated URL is:
The following dataRetrieval code can be used to get
those sites:
sites <- whatNWISsites(
  bBox = c(-83.0, 36.5, -81.0, 38.5),
  parameterCd = c("00010", "00060"),
  hasDataTypeCd = "dv"
)
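To give a feel for how named arguments like these might map onto a Web service request, the sketch below assembles a query string in base R. The helper `build_query` is hypothetical and is not part of dataRetrieval, the endpoint shown is an assumption, and the result is not necessarily the URL that the search tool or the package generates. It also skips URL encoding, which a real request would need for special characters.

```r
# Hypothetical helper (not part of dataRetrieval): collapse each argument
# to a comma-separated value and join the name=value pairs with "&".
build_query <- function(base_url, args) {
  values <- vapply(args, function(v) paste(v, collapse = ","), character(1))
  paste0(base_url, "?", paste0(names(values), "=", values, collapse = "&"))
}

query <- build_query(
  "https://waterservices.usgs.gov/nwis/site/", # assumed endpoint
  list(
    bBox = c("-83.0", "36.5", "-81.0", "38.5"), # kept as text to preserve ".0"
    parameterCd = c("00010", "00060"),
    hasDataTypeCd = "dv"
  )
)
query
```

Values are passed as character strings here so that codes keep their leading zeros and coordinates keep their formatting.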
For NWIS data, the function
readNWISdata can be used.
The arguments listed in the R help file are ... and
service (only for data requests). Table 5 describes the
services that are available.
|measurements||Surface Water Measurements||https://waterdata.usgs.gov/nwis/measurements/|
The ... argument allows the user to create their own
queries based on the instructions found in the web links above. The
links provide instructions on how to create a URL to request data.
Perhaps you want sites only in Wisconsin, with a drainage area less than
50 mi², and the most recent daily discharge data. That
request would be done as follows:
dischargeWI <- readNWISdata(
  service = "dv",
  stateCd = "WI",
  parameterCd = "00060",
  drainAreaMax = "50", # upper limit on drainage area, in square miles
  statCd = "00003"
)

siteInfo <- attr(dischargeWI, "siteInfo")
Just as with NWIS, the Water Quality Portal (WQP) offers a variety of ways to search for sites and request data. The possible Web service arguments for WQP site searches are found here.
To discover available sites in the WQP in New Jersey that have
measured chloride, use the function whatWQPsites:
sitesNJ <- whatWQPsites(
  statecode = "US:34",
  characteristicName = "Chloride"
)
To get data from the WQP using generalized Web service calls, use the
function readWQPdata. For example, to get all the pH data
in Wisconsin:
dataPH <- readWQPdata(
  statecode = "US:55",
  characteristicName = "pH"
)
The function whatWQPdata returns a data frame with
information on the amount of data collected at a site. For example:
type <- "Stream"
sites <- whatWQPdata(countycode = "US:55:025", siteType = type)
This returns a data frame with all of the sites that were measured in
streams in Dane County, WI. Also, in that table, there is a measure of
activityCount (how often the site was sampled), and
resultCount (how many individual results are available).
The function whatWQPsamples returns information on the
individual samples collected at a site. For example:
site <- whatWQPsamples(siteid = "USGS-01594440")
This returns one row for each instance that a sample was collected.
The function whatWQPmetrics provides metric information.
This is only currently available for STORET data:
type <- "Stream"
sites <- whatWQPmetrics(countycode = "US:55:025", siteType = type)
All data frames returned from the Web services have some form of
associated metadata. This information is included as attributes to the
data frame. All data frames will have a
url (returning a
character of the url used to obtain the data), siteInfo
(returning a data frame with information on sites), and
queryTime (returning a POSIXct datetime) attributes. For
example, the url and query time used to obtain the data can be found as
follows:
attr(dischargeWI, "url")
attr(dischargeWI, "queryTime")
siteInfo <- attr(dischargeWI, "siteInfo")
Depending on the format that the data was obtained (RDB, WaterML1, etc), there will be additional information embedded in the data frame as attributes. To discover the available attributes:
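The general mechanism can be sketched with a small, made-up data frame in base R. The attribute names below mirror those described in this section, but the values are placeholders and the data frame is not the result of a real retrieval.

```r
# A made-up data frame standing in for a returned result.
df <- data.frame(
  site_no = "01491000", value = 12.3,
  stringsAsFactors = FALSE
)

# Attach metadata as attributes (placeholder values for illustration).
attr(df, "url") <- "https://example.com/query"
attr(df, "queryTime") <- as.POSIXct("2017-05-17 12:00:00", tz = "UTC")

# List every attribute, then read one back by name.
attributeNames <- names(attributes(df))
attributeNames
attr(df, "url")
```

The same two calls, names(attributes(x)) and attr(x, "name"), work on any data frame, including those returned by the retrieval functions.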
For data obtained from functions such as
readNWISgwl, there are two
attributes that are particularly useful: siteInfo and variableInfo.
siteInfo <- attr(dischargeWI, "siteInfo")
variableInfo <- attr(dischargeWI, "variableInfo")
For data obtained from functions such as readNWISpeak, the
comment attribute is useful.
comment(peakData)
# Which is equivalent to:
attr(peakData, "comment")
This section describes the options for downloading and installing the
dataRetrieval package.
If you are new to R, you will need to first install the latest version of R, which can be found at www.R-project.org.
At any time, you can get information about any function in R by typing a question mark before the function's name. This will open a file (in RStudio, in the Help window) that describes the function, the required arguments, and provides working examples. To see the raw code for a particular function, type the name of the function, without parentheses.
Additionally, many R packages have vignette files attached (such as this paper). To view the vignette:
vignette(topic = "Introduction", package = "dataRetrieval")
This information is preliminary and is subject to revision. It is being provided to meet the need for timely best science. The information is provided on the condition that neither the U.S. Geological Survey nor the U.S. Government may be held liable for any damages resulting from the authorized or unauthorized use of the information.