Automated retrievals
Obtaining USGS Water Data via Automated Methods
- USGS Water Data for the Nation Notification Service
- USGS Water Web Services and how it impacts you
- USGS definition of what is an active site
- Frequently Asked Questions
- Examples
USGS Water Data for the Nation Notification Service
USGS Water Data for the Nation is a highly available system. Like any system, it can experience downtime due to scheduled hardware and software upgrades, as well as unplanned network, equipment, and power failures. Enhancements to the system can also result in changes to output-file formats that can adversely affect automated retrievals. In addition, water data are collected at more than a million sites around the country that are maintained by different USGS Water Science Centers. These science centers at times announce that data may be unavailable for certain periods. Although the USGS posts announcements on our public web site when significant issues arise, users performing automated retrievals often do not view the site with a browser and therefore do not receive these important messages.
If you depend on data collected here using automated systems, it is in your best interest to join the USGS Water Data Notifications Service. We will send emails to subscribers of the list with information on any significant planned outages, unexpected system problems, and any changes to the system that might affect the automated retrieval community. This email list is for announcements only, so you cannot send mail to it.
We provide a simple web-based interface to subscribe to and unsubscribe from the Water Data Notifications Service. It is accessible from any web page on this site. Look for the link "Subscribe for system changes" near the bottom right corner of the page.
USGS Water Web Services and how it impacts you
The USGS has heard from many communities that its water data need to be more highly available and easier to acquire. Most data from the USGS Water Data for the Nation site are currently downloaded as tab-delimited (rdb) data files. While this approach works, it uses 20th-century approaches rather than 21st-century approaches. Extensible Markup Language (XML) and JavaScript Object Notation (JSON) are the 21st century's most common means for sharing data.
In addition, the USGS is being requested to provide its water data in friendlier formats, such as native Microsoft Excel spreadsheets and Keyhole Markup Language (KML) to support integration with products such as Google Maps and Google Earth, and Geographic Information System (GIS) formats.
This transformation is underway but will take many years to complete. It involves creating multiple web services. USGS water web services are at least as highly available as the data services available on this site. In addition, they will strive to be faster and more flexible and to allow more data formats. To the extent practical, the USGS is standardizing on WaterML as a common XML data format for its time series water data. WaterML is a standard by the Open Geospatial Consortium. Each USGS water web service will include other formats as well, including our legacy tab-delimited (RDB) format.
Over a period of years, you should expect a rich set of web services to replace tab-delimited downloads available on this site. When new production USGS water web services are announced, you are encouraged to convert any applications to use them instead. While existing data services on the waterdata.usgs.gov domain will continue for the present, it is possible that some years in the future you will need to use the equivalent water service instead.
The USGS has a water services web site with detailed information on these new services. Currently, six production services are available: a popular instantaneous values web service that can be used to retrieve our current condition data, a daily values web service, a site service that returns information about USGS hydrologic sites, a groundwater levels service, a water quality data service, and a statistics service.
Through our notifications service, we will keep you informed of relevant new or enhanced web services, as well as warnings should any of the current data services become deprecated.
What is an Active Site
Sites may be active or inactive. A site is considered active if:
- it has collected time-series (automated) data within the last 183 days (6 months)
- it has collected discrete (manually collected) data within 397 days (13 months)
If it does not meet these criteria, it is considered inactive.
Some exceptions apply. For example, a site may also be shown as active if it is part of an ongoing occasional data-collection program. If a site is flagged by a USGS water science center as discontinued, it will show as inactive regardless of how recent data may be. A USGS science center can also flag a new site as active even if it has not collected any data. This control allows a user to select a broad category of sites to view and is useful for simplifying a view in areas with a high density of sites. The default selection is Active sites.
In waterservices.usgs.gov, the URL argument name is siteStatus. Example: &siteStatus=active
The site status indicates whether the site is active or inactive.
The NWIS Mapper uses the SiteService Water service to determine if a site is active or not.
Frequently Asked Questions regarding Automated Retrievals
- Can I use FTP to get USGS water data?
- Can I get USGS water data using a web service?
- Can I retrieve USGS water data in XML format on this site?
- What machine-readable data format does this site support?
- I need very specific data. How do I get just the data I need?
- What techniques are used to automate the retrieval of data?
- How do I get help refining my URL query?
- Is there a limit to the amount of data I can retrieve?
- Is there a good time of day to retrieve data?
- Are there best practices for writing programs to retrieve the data?
- If I serve USGS data, do I need to give credit to the USGS?
- How do I ensure I properly convey the correct meaning of the data?
- I need a large amount of data from this system. Can the USGS retrieve the data and send it to me?
- How do I get a copy of your Site file?
- How do I download water quality samples and results data?
- I can't seem to download your data anymore. I am getting HTTP 403 errors. What's going on?
- I am concerned that new releases of your system will cause my retrieval programs to break. What can I do to prevent this?
Can I use FTP to get USGS water data?
No.
Can I get USGS water data using a web service?
Yes. Instantaneous (current condition) data, daily values data, site information, groundwater levels, water quality data, and statistics are now available via web services. See the USGS Water Services web site for full details. All services will be enhanced in the future. Some interim services may not offer the same depth of selection and data attributes available through this site.
Users are encouraged to begin using these services to acquire data where possible as these services are designed to be highly available and in most cases faster than downloading tab-delimited files using this site.
Can I retrieve USGS water data in XML format on this site?
The new Water Services site (waterservices.usgs.gov) allows daily values data to be downloaded in the XML format. Unfortunately, at this time no other data can be retrieved as XML within this site. Water quality data can also be downloaded in XML from a separate site.
What machine-readable data format does this site support?
The principal machine-readable data format supported by this site (waterdata.usgs.gov) is a variant of a tab-delimited ASCII file structure called rdb. General information regarding the rdb file structure can be found here.
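As a rough illustration of reading rdb output in a program, the following Python sketch parses one rdb table into a list of dictionaries. It assumes the layout commonly seen in rdb files (comment lines beginning with "#", then a tab-delimited header row, then a row of column-format codes, then the data rows); that layout is not described on this page, so confirm it against the rdb documentation. The sample text is hypothetical and is not real USGS output.

```python
def parse_rdb(text):
    """Parse a single rdb table into a list of dicts keyed by column name."""
    # Assumption: comment lines start with '#'; blank lines carry no data.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    if len(lines) < 2:
        return []
    header = lines[0].split("\t")
    # Assumption: the second non-comment line holds column-format codes
    # (for example "5s" or "15s"), so it is skipped rather than parsed.
    return [dict(zip(header, ln.split("\t"))) for ln in lines[2:]]


# Hypothetical sample for illustration only.
SAMPLE = (
    "# hypothetical sample, not real USGS output\n"
    "agency_cd\tsite_no\tstation_nm\n"
    "5s\t15s\t50s\n"
    "USGS\t06090800\tExample station name\n"
)

rows = parse_rdb(SAMPLE)
print(rows[0]["site_no"])
```

The sketch handles one table only. A response that repeats the header for each site would need to be split into tables first.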
I need very specific data. How do I get just the data I need?
All web queries of this site are done via the Hypertext Transfer Protocol (HTTP) GET method. This is significant because all information defining the query is contained in the various fields and arguments of the URL string. All data in this site can be retrieved with a URL by providing the correct URL-argument specifications. A number of examples are shown at the end of this document.
To begin the process, interactively navigate the waterdata.usgs.gov site's pages to obtain the tab-delimited data you are interested in and then note the URL syntax. Just remember to select tab-delimited output (or XML if it is available). Once you have the URL that precisely describes the data you want, it can be bookmarked, or run in an automated fashion using various tools that may be available in your programming language or operating system.
This site supports a large number of URL arguments, which allow requests to be fine-tuned and generalized, for example, by station number and specific data parameters of interest. It also supports desired time-periods of the data or the time period of last update. Note that any URL argument with a null value (e.g., &variable=) can be ignored. Arguments with a non-null value (e.g., &variable=flow) are essential.
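A URL copied from an interactive session often carries many null-valued arguments. The Python sketch below shows one possible way to strip them so that only the essential arguments remain. The function name drop_null_arguments is made up for this illustration, and the example URL is hypothetical, although its argument names appear elsewhere on this page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_null_arguments(url):
    """Return the URL with every empty-valued query argument removed."""
    parts = urlsplit(url)
    # keep_blank_values=True so that null arguments are visible and can be filtered.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(name, value) for name, value in pairs if value != ""]
    # safe="," leaves comma-separated site lists readable instead of %2C.
    return urlunsplit(parts._replace(query=urlencode(kept, safe=",")))


# Hypothetical URL as it might be copied from a browser session.
copied = (
    "http://waterdata.usgs.gov/nwis/uv"
    "?format=rdb&period=7&site_no=08313000&begin_date=&end_date="
)
print(drop_null_arguments(copied))
# prints http://waterdata.usgs.gov/nwis/uv?format=rdb&period=7&site_no=08313000
```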
What techniques are used to automate the retrieval of data?
Most users will prefer to use USGS Water Web Services to acquire USGS water data if an appropriate service exists there. However, you can also use this site to retrieve water data. The techniques are the same regardless of which site is hosting the data.
Automated retrievals are made by developing a program or application to submit the appropriate URLs and then parse the results in whatever way is appropriate for the intended use. On this site, some users have developed programs that read the HTML formatted by this site's pages rather than the tab-delimited (rdb) formatted data and scan for the data values amongst the HTML tags. This "screen scraping" approach is generally imprecise, "brittle", processing intensive, and liable to break when the screen format is changed, and thus is not recommended. Even data downloaded as tab-delimited fields can introduce problems when the data format changes. When possible we suggest downloading the appropriate data as XML from the USGS Water Services Web Site rather than this site.
There are numerous ways to automate the downloading of data. Most operating systems come with the ability to automatically perform a task at a regular time. If your computer runs either Linux or some variant of Unix, the cron utility will be of interest. Windows XP has a task scheduler. Windows 7 has a similar scheduling utility. You can use the appropriate utility to run your program.
If your operating system is Linux or some Unix variant, curl and wget are popular utilities for retrieving files over the Internet. Both are also available for Windows. If you are using Windows, it is possible to put the commands in a batch (.bat) file and call it from the Windows task scheduler. In addition, most modern programming languages support functions to retrieve files over HTTP. Check your programming language documentation for more detail.
Regardless of the means, please take care to write your queries carefully and to run the queries only when necessary.
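For readers working in Python, a retrieval can be sketched with the standard library alone. The example below is a minimal sketch, not a recommended implementation: the function name fetch_text, the User-Agent string, and the 60-second timeout are all assumptions chosen for illustration. The opener argument exists only so the function can be exercised without touching the network.

```python
import urllib.request


def fetch_text(url, timeout=60, opener=urllib.request.urlopen):
    """Download one URL over HTTP GET and return the body as text."""
    # The User-Agent value is a placeholder; use something that identifies you.
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-retrieval-script"}
    )
    with opener(request, timeout=timeout) as response:
        # Assumption: the body is UTF-8 compatible; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")
```

A script built on this would typically call fetch_text once with a URL such as http://waterservices.usgs.gov/nwis/iv/?sites=08313000&period=P7D&format=rdb, save the result, and exit, leaving the scheduling to cron or the Windows task scheduler.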
How do I get help refining my URL query?
Please tell us what you want to do by sending an email to gs-w_waterdata_support@usgs.gov.
Is there a limit to the amount of data I can retrieve?
Yes, a single request will not return more than 100,000 site records, a limitation intended to prevent any one data consumer from unduly affecting other users of the system.
Is there a good time of day to retrieve data?
Yes. If possible, we prefer that you retrieve information during "off peak" hours. Midnight to 6 AM Eastern Time is ideal.
Are there best practices for writing programs to retrieve data?
Absolutely. At times (less frequently these days) we have had to shut down users performing automated retrievals from accessing this site in order to keep our system available. We do not like to do it, but the public depends on this system's availability. Here are some tips to help you get data efficiently and reduce the likelihood that you will impact other users of the system:
- First check the USGS Water Web Services Site. If a service there returns the data you need, please use that service following suggested tips. Otherwise, keep reading.
- Refine your query carefully, retrieving data only for the sites, USGS parameters, data filters and time-periods you need.
- Do not keep your retrieval program running in memory. When it is done getting its data, let it end and then call it again as needed. This site has many web servers all serving the same information. By using the hostname waterdata.usgs.gov in your requests you are guaranteed to be given the IP address of a functioning server and requests will be randomly distributed to all available servers. All servers carry the same current condition information. By ending your program when you have the data, you will not use the same server repeatedly. If a single high-volume user uses the same server continually, others accessing that server may experience delays.
- Do not submit your query more often than necessary and thereby redundantly retrieve the same data. While most current condition sites record data onsite every 15 minutes, they only transmit those data to the web database every 1 to 4 hours. Consequently, there is no point in retrieving the same data every five minutes. However, since there is no way for a user to know precisely when any particular site updates, this system provides a mechanism to, in effect, ask: "If any of the sites in this list I am interested in have received new data in the previous N minutes, please return it to me now". This functionality is provided by choosing the "Update time" option in the current condition data site-selection criteria page. The relevant pair of URL arguments to specify this functionality are: &result_md=1&result_md_minutes=15. For details of how to use the functionality to greatly improve efficiency, see the example URLs section below. Also, note that daily value data are by definition only updated once per day for active stations and rarely if ever for inactive stations -- those with no data in the previous year.
- When in doubt, ask us. Send us an email to gs-w_waterdata_support@usgs.gov.
Thank you for your cooperation.
If I serve USGS data, do I need to give credit to the USGS?
In general, there is no requirement to attribute USGS data, since the data are in the public domain. However, the USGS strongly encourages those who serve data from this site to credit the USGS following these guidelines.
Please be aware that the USGS logo is a trademarked symbol. As such the logo can only be used if appropriate policies are followed. The USGS maintains a visual identity web site with more information.
The USGS provides valuable, timely, and scientifically reliable water data to the Nation. By crediting the USGS, and by linking your site to the USGS, you help spread the word about USGS water science data. We would appreciate it if you would take a few moments to let us know how you are using USGS water science data and the communities you are serving.
The link to the USGS web site is:
The proper link for this site is:
http://waterdata.usgs.gov/nwis/
How do I ensure I properly convey the correct meaning of the data?
Because water science can be complex, proper interpretation of the data on this system can be prone to error. A great deal of information is available in this help system. We encourage users to contact us if you have any questions on how to correctly interpret the data.
I need a large amount of data from this system. Can the USGS retrieve the data and send it to me?
Unfortunately, the USGS does not provide National data retrievals for the public. However, this system offers many ways to retrieve and download the data that you need. For your convenience, these data are available 24 hours a day, 7 days a week.
Historical approved streamflow data prior to October 1, 2007 are available for most states and sites at the USGS Instantaneous Data Archive. If you require USGS state or U.S. territory data prior to October 1, 2007 that are not available in this system, in many cases the local USGS Water Science Center can provide them for you, assuming they exist. An easy way to contact a local USGS Water Science Center is to navigate to any state page using this system. Select the data category and the geographic area from the upper right-hand corner and press “GO”. Once you are on the desired state page, select the “Questions about sites/data?” link at the bottom of the page. When you fill out the form, your request will go directly to the local USGS Water Science Center.
How do I get a copy of your Site file?
There are nearly 1.5 million sites defined in the USGS National Water Information System. Unfortunately, the USGS does not maintain a single National file for download that contains all this information; however, it is possible through multiple queries of this system to assemble all the information in the Site file.
It is recommended that you use the USGS Site Web Service to acquire site data. Site data can also be acquired on this site.
Either way of acquiring data has the same problem: there is too much data for all site data to be retrieved in a single call in a multi-user environment. However, data can be acquired slowly over time through repeated calls to different geographical areas. Unfortunately, since there are more than a million sites, you cannot specify a box that covers a large area like the continental United States. One way of accomplishing this is to use the system to retrieve a list of stations by one degree of longitude at a time. On this site, it works with the limitation in the system that restricts any query to a maximum of 100,000 records.
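The strip-by-strip approach can be sketched in Python as a generator of request URLs, one per box that is one degree of longitude wide. This is only a sketch under stated assumptions: it targets the Site Web Service and assumes that service accepts a bBox argument ordered west, south, east, north, which is not documented on this page. The function name strip_urls, the lat_step parameter, and the bounds used in the example are hypothetical. The service may also limit the size of a box, so check its test tool before relying on this.

```python
from urllib.parse import urlencode

SITE_SERVICE = "http://waterservices.usgs.gov/nwis/site/"


def strip_urls(west, east, south, north, lat_step=5):
    """Yield one site-service URL per 1-degree-wide box covering the region."""
    for lon in range(west, east):
        # Each longitude strip is cut into shorter boxes in case the
        # service restricts how large a single box may be (assumption).
        for lat in range(south, north, lat_step):
            top = min(lat + lat_step, north)
            box = f"{lon},{lat},{lon + 1},{top}"  # west,south,east,north
            query = urlencode({"format": "rdb", "bBox": box}, safe=",")
            yield f"{SITE_SERVICE}?{query}"


# Hypothetical whole-degree bounds, for illustration only.
urls = list(strip_urls(west=-110, east=-108, south=31, north=37, lat_step=3))
print(len(urls))
print(urls[0])
```

Issuing these requests one at a time, spread over off-peak hours, keeps each response well under the record limit and avoids placing a heavy load on the system.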
See the example below as a method of acquiring data using this site.
How do I download water quality samples and results data?
Note that there is a web service that allows you to download water quality samples and results. We encourage you to use this service.
I can't seem to download your data anymore. I am getting HTTP 403 errors. What's going on?
This should only occur if for some reason the USGS has blocked your Internet Protocol (IP) address from using the service. This can happen if we judge that your use of the site is so excessive that it is seriously impacting others using the service. To get unblocked, send us the URL you are using along with the public IP of the machine to gs-w_waterdata_support@usgs.gov. We may require changes to your query and frequency of use in order to give you access to the service again and will work with you to find ways to optimize your queries.
I am concerned that new releases of your system will cause my retrieval programs to break. What can I do to prevent this?
You can become a pre-release tester. We typically release three times a year (Winter, Summer and Fall) and try to allocate a week for public testing in a staged environment. If you would like to be one of our pre-release testers, send us an email at gs-w_waterdata_support@usgs.gov and we'll add you to our list.
Examples
Some data are now available as web services. Where a web service exists, each example shows the web service request first ("New Way"), followed by the equivalent legacy request ("Old Way").
When you implement your automated-retrieval script using the old methods, be sure to use host "waterdata.usgs.gov" to ensure the most reliable response. However, if interactively navigating this site to obtain the data you are interested in redirects you to the host "nwis.waterdata.usgs.gov", then use that hostname to access discrete water-quality data, peak streamflow data, and groundwater level data. Remember, any data on this site can be retrieved via automated methods.
Please refer to the codes help page.
Examples of URLs used for automated retrieval of current condition data
Retrievals by Period
New Way
You can now use the instantaneous values web service to retrieve current condition data. The web service can return tab-delimited (rdb) data, and it also supports WaterML (XML) and JavaScript Object Notation (JSON) formats. The examples below use the web service to retrieve the last 7 days of current condition data for all available parameters for site number 08313000 in the rdb and JSON formats. There is a test tool available for the service that helps you understand the various outputs and filters, and lets you create a workable query.
Note: One important difference with the web service is that when specifying a period, the service returns data for the x days counting back from the current time, whereas the old data service includes all values back through midnight (site local time) x days ago.
Tab-delimited (rdb):
http://waterservices.usgs.gov/nwis/iv/?sites=08313000&period=P7D&format=rdb
JSON:
http://waterservices.usgs.gov/nwis/iv/?sites=08313000&period=P7D&format=json
Old Way
This simple URL retrieves the last 7 days of current condition data for all available parameters for site number 08313000 in tab-delimited (rdb) format:
http://waterdata.usgs.gov/nwis/uv?format=rdb&period=7&site_no=08313000
Retrieving changed data only for a parameter
The following URLs represent the most efficient way to maintain a cache of all of today's current condition streamflow data for a list of sites on your local computer. Each URL shown below will retrieve all of today's streamflow data (parameter code 00060) in tab-delimited (rdb) format for any of the five sites shown that have received updated data in the previous 30 minutes. If only one site has received updated data in the previous 30 minutes, only data for that one site will be returned.
New Way
Up to 100 sites can be specified in each request. With tab-delimited (rdb) data, if no data exists for a site, the headers will appear for the site, but no data will follow. Note that WaterML and JSON formats are also supported with the web service.
(URL shown in paragraph format for readability. This would normally appear all in one line.)
http://waterservices.usgs.gov/nwis/iv/
?format=rdb
&sites=06006000,06012500,06016000,06017000,06018500
&period=P1D
&modifiedSince=PT30M
&parameterCd=00060
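When the site list is long, the request above can be assembled in code and split into batches that respect the 100-site limit. The Python sketch below is one way to do this; the function name changed_data_urls and its defaults are illustrative choices, while the argument names (format, sites, period, modifiedSince, parameterCd) are the ones used in the example URL.

```python
from urllib.parse import urlencode

IV_SERVICE = "http://waterservices.usgs.gov/nwis/iv/"
MAX_SITES_PER_REQUEST = 100  # per-request limit stated in the text above


def changed_data_urls(sites, parameter_cd="00060", minutes=30):
    """Build one URL per batch of sites, asking only for recently changed data."""
    urls = []
    for start in range(0, len(sites), MAX_SITES_PER_REQUEST):
        batch = sites[start:start + MAX_SITES_PER_REQUEST]
        query = urlencode(
            {
                "format": "rdb",
                "sites": ",".join(batch),
                "period": "P1D",
                "modifiedSince": f"PT{minutes}M",
                "parameterCd": parameter_cd,
            },
            safe=",",  # keep the comma-separated site list readable
        )
        urls.append(f"{IV_SERVICE}?{query}")
    return urls


five_sites = ["06006000", "06012500", "06016000", "06017000", "06018500"]
print(changed_data_urls(five_sites)[0])
```

For the five sites shown, this produces a single URL equivalent to the one above, written on one line.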
Old Way
If none of the five sites have received updated data in the previous 30 minutes the URL will return the string "No sites/data available for the selection criteria specified". Up to 20 sites can be specified in each request. The URL shown is intended to be reissued every 30 minutes. To return a different parameter code, modify the &index_pmcode_* argument appropriately. To retrieve data for all parameters for the sites, omit the &index_pmcode_* argument entirely.
(URL shown in paragraph format for readability. This would normally appear all in one line.)
http://waterdata.usgs.gov/mt/nwis/uv
?multiple_site_no=06006000,06012500,06016000,06017000,06018500
&result_md=1&result_md_minutes=30
&index_pmcode_00060=1
&period=1
&format=rdb
The RDB output will match the example shown for the new service.
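A program that reissues the legacy URL every 30 minutes needs to tell a real rdb response apart from the "no data" message quoted above. A minimal Python sketch follows; the function name has_new_data is hypothetical, and the check assumes the message appears verbatim somewhere in the response body.

```python
NO_DATA_MESSAGE = "No sites/data available for the selection criteria specified"


def has_new_data(body):
    """Return True unless the response carries the legacy 'no data' message."""
    # Assumption: the message text appears unchanged in the response body.
    return NO_DATA_MESSAGE not in body


print(has_new_data("No sites/data available for the selection criteria specified"))
# prints False
```

Only when has_new_data returns True would the program go on to parse the response and update its local cache.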
Retrieving changed data only for all available parameters
Similarly, the following URLs represent the most efficient way to maintain a cache of only the most current condition data value for each available parameter for a list of sites on your local computer. As above, each URL will only retrieve data for any of the listed sites that have received updated data in the previous 30 minutes. The URL is intended to be reissued every 30 minutes.
New Way
(URL shown in paragraph format for readability. This would normally appear all in one line.)
http://waterservices.usgs.gov/nwis/iv/
?format=rdb
&sites=06006000,06012500,06016000,06017000,06018500
&period=P1D
&modifiedSince=PT30M
Old Way
(URL shown in paragraph format for readability. This would normally appear all in one line.)
http://waterdata.usgs.gov/mt/nwis/current
?multiple_site_no=06006000,06012500,06016000,06017000,06018500
&result_md=1&result_md_minutes=30
&period=1
&format=rdb
Retrieving all current condition sites for a state
The same as above, but for all parameters at all current condition sites in New Mexico that have received updated data in the previous 30 minutes:
New Way
http://waterservices.usgs.gov/nwis/iv?format=rdb&stateCd=NM&modifiedSince=PT30M
Old Way
(URL shown in paragraph format for readability. This would normally appear all in one line.)
http://waterdata.usgs.gov/nm/nwis/current
?result_md=1
&result_md_minutes=30
&format=rdb
Examples of URLs used for automated retrieval of daily value data:
The examples shown will retrieve all daily value streamflow data (parameter code 00060) for site number 06090800 from 2005-01-01 through the present. To obtain the entire period-of-record, use a start date of 1880-01-01, but be careful because you will receive a lot of data.
From a Start Date
New Way
Use the daily values web service. Tab-delimited (rdb) output is also supported using format=rdb.
http://waterservices.usgs.gov/nwis/dv/?format=waterml,1.1&sites=06090800&startDT=2005-01-01
Note that a test tool is available that helps you create a query, as there are many possible outputs and filters with the service.
Old Way
Data are retrieved in a tab-delimited (rdb) format only. Note that an argument of &end_date=YYYY-MM-DD is also supported.
(URL shown in paragraph format for readability. This would normally appear all in one line.)
http://waterdata.usgs.gov/nwis/dv
?site_no=06090800
&cb_00060=on
&begin_date=2005-01-01
&format=rdb
For a Period
New Way
To obtain the most recent 60 days of daily value data for a single site in a WaterML XML format, use the following syntax. Tab-delimited (rdb) format is also supported using format=rdb.
http://waterservices.usgs.gov/nwis/dv/?format=waterml,1.1&sites=06090800&period=P60D
Old Way
To obtain the most recent 60 days of daily value data for a single site in a tab-delimited (rdb) format, use the following syntax:
(URL shown in paragraph format for readability. This would normally appear all in one line.)
http://waterdata.usgs.gov/nwis/dv
?site_no=06090800
&cb_00060=on
&period=60
&format=rdb
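When converting a script from the old way to the new way, note that the old arguments take plain numbers (period=60, result_md_minutes=30) while the web service examples use duration strings (period=P60D, modifiedSince=PT30M). These strings follow the ISO 8601 duration notation. The Python helpers below are a small sketch of that conversion; their names are made up for this illustration.

```python
def days_to_period(days):
    """Convert a whole number of days to a duration string such as 'P60D'."""
    if days <= 0:
        raise ValueError("days must be a positive integer")
    return f"P{int(days)}D"


def minutes_to_duration(minutes):
    """Convert a whole number of minutes to a duration string such as 'PT30M'."""
    if minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    return f"PT{int(minutes)}M"


print(days_to_period(60))       # prints P60D
print(minutes_to_duration(30))  # prints PT30M
```

Keep in mind that the two systems may interpret a period differently, so the same number of days will not necessarily return identical rows.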
Examples of URLs used for automated retrieval of site-description information:
New Way
A site water web service is now available but does not yet support an XML format. However, the tab-delimited (rdb) format is supported and is currently the default. Google Earth and Google Maps formats are supported as well using different format parameter values. The following URL will return basic site-description information for site 06090800 in a tab-delimited (rdb) format:
http://waterservices.usgs.gov/nwis/site/?format=rdb&sites=06090800
Note that a test tool is available that helps you create a query, as there are many possible outputs and filters with the service.
Old Way
This site also supports a site data service that returns data in well-formed XML. The site data are high level but contain most attributes considered useful.
The following URL will return the specified site-description information for site 06090800 in XML format:
(URL shown in paragraph format for readability. This would normally appear all in one line.)
http://waterdata.usgs.gov/nwis/inventory
?search_site_no=06090800
&format=sitefile_output
&sitefile_output_format=xml
&column_name=agency_cd
&column_name=site_no
&column_name=station_nm
&column_name=dec_lat_va
&column_name=dec_long_va
&column_name=alt_va
The site-description fields included in the output can vary. Before setting up the programs that retrieve data, navigate the site's interface to see what site-description field specifications are available. This URL will return the specified site-description information for sites that have daily value streamflow data in the state of New Mexico in tab-delimited (rdb) format.
(URL shown in paragraph format for readability. This would normally appear all in one line.)
http://waterdata.usgs.gov/nm/nwis/inventory
?data_type=discharge
&format=sitefile_output
&sitefile_output_format=rdb
&column_name=agency_cd
&column_name=site_no
&column_name=dec_lat_va
&column_name=dec_long_va
&column_name=state_cd&column_name=alt_va
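The legacy inventory URLs repeat the column_name argument once per requested field. In Python this can be sketched by passing a list of name/value pairs to urlencode rather than a dictionary, since a dictionary cannot hold the same key twice. The function name inventory_url is hypothetical; the argument names are taken from the single-site example URL shown earlier in this section.

```python
from urllib.parse import urlencode

INVENTORY = "http://waterdata.usgs.gov/nwis/inventory"


def inventory_url(site_no, columns, output_format="xml"):
    """Build a legacy inventory URL that requests the given site-file columns."""
    pairs = [
        ("search_site_no", site_no),
        ("format", "sitefile_output"),
        ("sitefile_output_format", output_format),
    ]
    # One column_name argument per requested field, in the order given.
    pairs += [("column_name", name) for name in columns]
    return f"{INVENTORY}?{urlencode(pairs)}"


fields = ["agency_cd", "site_no", "station_nm", "dec_lat_va", "dec_long_va", "alt_va"]
print(inventory_url("06090800", fields))
```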
Examples of URLs used for automated retrieval of groundwater levels:
In the following example, all groundwater levels in Washington County, Rhode Island from January 1, 1980 through December 31, 1999 are retrieved.
New Way
Use our new groundwater levels web service. Note this service returns manually recorded groundwater levels only. If you are looking for groundwater levels recorded with automated equipment, try the instantaneous values web service.
In this example, data are returned in WaterML 1.1 (XML). Tab-delimited is also supported with &format=rdb.
(URL shown in paragraph format for readability. This would normally appear all in one line.)
http://waterservices.usgs.gov/nwis/gwlevels/
?format=waterml
&countyCd=44009
&startDT=1980-01-01
&endDT=1999-12-31
Old Way
If you want to focus on a particular state, select that state from the geographic area drop down control. Groundwater measurements appear in this system as "Field Measurements" and can be selected by selecting the Groundwater button, then the Field Measurements button. If you are looking for groundwater levels recorded with automated equipment, choose the Current Conditions button on the Groundwater page instead of Field Measurements.
In this example, data are retrieved as tab-delimited (RDB) data because XML is not supported.
(URL shown in paragraph format for readability. This would normally appear all in one line.)
http://nwis.waterdata.usgs.gov/ri/nwis/gwlevels/
?county_cd=44009
&sort_key=site_no
&group_key=NONE
&sitefile_output_format=html_table
&column_name=agency_cd
&column_name=site_no
&column_name=station_nm
&begin_date=1980-01-01
&end_date=1999-12-31
&format=rdb
&date_format=YYYY-MM-DD
&rdb_compression=value
&list_of_search_criteria=county_cd