
About tab-delimited output

Overview

The principal machine-readable data format supported by the USGS Water Data for the Nation site is a variant of a tab-delimited ASCII file structure called "rdb". An rdb file begins with a header section containing zero or more comment lines. The header carries important information such as disclaimers, sites, and parameter and location names. Each header comment line starts with a sharp sign (#), followed by a space character, followed by any text desired.

The header is followed by exactly one tab-delimited column-name row, then exactly one tab-delimited column-definition row, and then a data section consisting of any number of rows of tab-delimited data fields. The fields in the column-name row contain the name of each column. The fields in the column-definition row contain the data definition and optional column documentation for each column. Every data row must have exactly the same number of tab-delimited fields as the column-name and column-definition rows. Null data values are allowed.

Example rdb file:

   # -------------------------------------------
   # Documentation lines. These describe and 
   # identify the rdb file contents. 
   # -------------------------------------------
   NAME   COUNT  TYP  AMT   OTHER   RIGHT
   6s     5n     3s   5n    8s      8s
   Bill   44     A    133   Another This 
   John   44          23    One     Is 
   Gary   77          77    Here    On 
   Mar    77     B    244   And     The 
   Greg   77     D    1111  So      Right
   

 

As a general rule, we discourage users from relying on data being in fixed column positions. In our Automated Retrieval FAQ, we suggest that when reading (parsing) NWISWeb rdb files, it is important to first parse the column-name row (the first non-comment row in the file) to determine the column position of each data value, because different sites may return data columns in a different order. The column-name syntax of NWISWeb data files is detailed in the header comments of the file for each data type.
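
The advice above can be sketched in Python. This is a minimal illustration, not an official USGS tool: the function name parse_rdb is hypothetical, and the sketch assumes the text holds a single block (one header, one column-name row, one column-definition row). A file covering several sites repeats those rows and would need to be split first.

```python
def parse_rdb(text):
    """Parse one rdb block into (column_names, column_defs, rows).

    Hypothetical helper: assumes a single header/column-name/column-definition
    block, as in the example rdb file shown earlier.
    """
    # Header comment lines start with '#'; skip them and any empty lines.
    body = [line for line in text.splitlines()
            if line and not line.startswith("#")]
    if len(body) < 2:
        raise ValueError("expected a column-name row and a column-definition row")

    names = body[0].split("\t")   # first non-comment row: column names
    defs = body[1].split("\t")    # second non-comment row: column definitions
    if len(defs) != len(names):
        raise ValueError("column-definition row does not match column-name row")

    rows = []
    for line in body[2:]:
        fields = line.split("\t")
        if len(fields) != len(names):
            raise ValueError("data row has the wrong number of fields: %r" % line)
        # Key each value by column name so the column order does not matter.
        rows.append(dict(zip(names, fields)))
    return names, defs, rows
```

Looking values up by name (for example rows[0]["site_no"]) rather than by position keeps a script working when a site returns its columns in a different order. Null values come back as empty strings.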

 

Water Data for the Nation output-file format

Tab-delimited data files output by the Water Data for the Nation site consist of tab-separated columns of data for one or more sites. The sites are separated from one another by a header section of comments and new column definitions. The following three columns are always included for each site:

   Column        Definition
   ----------    -----------------------------------------
   agency_cd     Agency collecting data or maintaining the site 
   site_no       USGS site-identification number 
   datetime      Date (and time for real-time data) in ISO format

The remaining pairs of columns vary for each site depending on whether real-time or daily data are being output and on which data parameters were selected.

The alphanumeric groupings in the column-definition row define the width of the column and the type of data that it contains. For example, 5s indicates that the column is 5 characters wide and contains data of the type "string". Letter designations are as follows:

d = date
m = month
n = numeric
s = string
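
As an illustration of the width-and-type groupings listed above, the following Python sketch splits a definition such as 5s into its two parts. The name parse_column_def is hypothetical, and the sketch assumes each definition begins with the width digits followed by one of the four letters; any optional column documentation after that is ignored.

```python
import re

# Letter designations as listed in this FAQ.
TYPE_NAMES = {"d": "date", "m": "month", "n": "numeric", "s": "string"}

# Leading width digits followed by a single type letter, e.g. "5s" or "20d".
_DEFINITION = re.compile(r"^(\d+)([dmns])")


def parse_column_def(definition):
    """Return (width, type_name) for a column definition such as '5s'."""
    match = _DEFINITION.match(definition.strip())
    if match is None:
        raise ValueError("unrecognized column definition: %r" % definition)
    return int(match.group(1)), TYPE_NAMES[match.group(2)]
```

For example, parse_column_def("5s") gives a width of 5 and the type "string".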

Further information regarding the USGS use of the rdb format can be found here.

For current-condition data, the data-column pairs use the format 'nn_nnnnn' 'nn_nnnnn_cd', where the first two-digit sequence in each column name uniquely defines the sensor (the 'data descriptor') used to collect the data and the following five-digit sequence defines the 'parameter_cd', which describes the type of data shown in the column. The second column in the pair, 'nn_nnnnn_cd', contains data-value qualification codes pertaining specifically to the preceding column.

For daily data, the data-column pairs use a similar format of 'nn_nnnnn_nnnnn' 'nn_nnnnn_nnnnn_cd', where the first two-digit sequence in each column name uniquely defines the sensor (the 'data descriptor') used to collect the data, the next five-digit sequence defines the 'parameter_cd', which describes the type of data shown in the column, and the final five-digit sequence describes the type of daily statistic used to calculate the daily data value. The second column of the pair, 'nn_nnnnn_nnnnn_cd', contains data-value qualification codes pertaining specifically to the preceding column.
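
The two naming patterns above can be recognized with one regular expression. The Python sketch below is illustrative only: DataColumn and parse_data_column are hypothetical names, and the pattern assumes exactly the digit counts described here (two, five, and an optional five). Column names that do not fit, such as agency_cd, return None.

```python
import re
from typing import NamedTuple, Optional


class DataColumn(NamedTuple):
    data_descriptor: str          # two-digit sensor identifier
    parameter_cd: str             # five-digit parameter code
    statistic_cd: Optional[str]   # five-digit statistic code (daily data only)
    is_qualifier: bool            # True for the '_cd' column of a pair


# nn_nnnnn, nn_nnnnn_cd, nn_nnnnn_nnnnn, or nn_nnnnn_nnnnn_cd
_PATTERN = re.compile(r"^(\d{2})_(\d{5})(?:_(\d{5}))?(_cd)?$")


def parse_data_column(name):
    """Split a data-column name into its parts, or return None if it is not one."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    descriptor, parameter, statistic, suffix = match.groups()
    return DataColumn(descriptor, parameter, statistic, suffix is not None)
```

A column with statistic_cd set to None follows the current-condition pattern; one with a statistic code follows the daily-data pattern.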

A list of specific parameter codes, statistic codes (if daily data), and data-value qualification codes (if present) is included in the header for each site. For example:

   
   # Data for the following station(s) are contained in this file
   # -------------------------------------------------------------
   #  USGS 06041000 Madison River bl Ennis Lake nr McAllister MT
   #
   # Available data at this site--lines with asterisk '*' are included in this output.
   #    DD parameter statistic - Description
   #    --   -----     -----     ------------------------------------
   #   *02   00065     00003   - Gage height, feet (Mean)
   #   *05   00010     00001   - Temperature, water, degrees Celsius (Maximum)
   #   *05   00010     00002   - Temperature, water, degrees Celsius (Minimum)
   #   *05   00010     00003   - Temperature, water, degrees Celsius (Mean)
   #   *06   00060     00003   - Discharge, cubic feet per second (Mean)
   #
   # Data-value qualification codes included in this output: 
   #     A  Approved for publication -- Processing and review completed.  
   #     P  Provisional data subject to revision.  
   #     e  Value has been estimated.
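
A header like the one above can be read programmatically to find out which series a file contains. The following Python sketch is an assumption-laden illustration: parse_available_data is a hypothetical name, and the pattern matches only the daily-data layout shown in this example (DD, parameter, statistic, description). Headers for other data types may be laid out differently.

```python
import re

# Matches lines such as:
#   "#   *02   00065     00003   - Gage height, feet (Mean)"
_ENTRY = re.compile(r"^#\s+(\*?)(\d{2})\s+(\d{5})\s+(\d{5})\s+-\s+(.*\S)\s*$")


def parse_available_data(header_lines):
    """Return one dict per 'available data' line found in the header comments."""
    entries = []
    for line in header_lines:
        match = _ENTRY.match(line.strip())
        if match is None:
            continue  # not an entry line (title, ruler, or other comment)
        star, descriptor, parameter, statistic, description = match.groups()
        entries.append({
            "included": star == "*",   # asterisk marks series in this output
            "dd": descriptor,
            "parameter_cd": parameter,
            "statistic_cd": statistic,
            "description": description,
        })
    return entries
```

Joining the dd, parameter_cd, and statistic_cd values with underscores gives the matching daily data-column name, for example 02_00065_00003.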