Data preparation
6.4. Data preparation
This section walks through how to create a dataset suitable for use with our open-source tool based on pygeoapi, which generates landing pages with embedded JSON-LD.
6.4.1. Adding contextual information
To be most useful to the wider water data community, locations should have both descriptive and contextual information in the data published to geoconnex.us. Some useful descriptive information could include:
identifier
Location geometry (point or polygon latitude/longitude, preferably in WGS84)
short name
long name or description
organization
URLs where observed or modeled data about the location can be accessed. These are of particular interest where available.
Contextual information could include:
administrative geographies it is within (e.g. census tract, municipality, county, state, PLSS section)
watershed boundary it is within (e.g. HUC12)
for groundwater sites, relevant aquifers
a relevant reference location. Many organizations publish data about the same feature, such as a common monitoring location that serves as both a streamgage and a water quality sampling site while also being fixed on a dam or bridge.
for surface water sites, the hydrologic address on the National Hydrography Dataset stream network
6.4.1.1. Using reference cataloging features to add contextual information
Wherever possible, contextual data should be in the form of persistent identifiers (PIDs) for these features. For example, counties are often given as a name, but spelling errors, capitalization or abbreviation differences, and other ambiguities can create barriers to interoperability between datasets that reference counties. In addition, these PIDs are already members of the knowledge graph, which makes adding your data to the knowledge graph simpler and more meaningful. Some sources for PIDs for these contextual features are provided at reference.geoconnex.us/collections. Some common patterns include:
states:
https://geoconnex.us/ref/states/{2-digit FIPS}
e.g. https://geoconnex.us/ref/states/48 for Texas
counties:
https://geoconnex.us/ref/counties/{5-digit FIPS}
e.g. https://geoconnex.us/ref/counties/06037 for Los Angeles County
HUC12:
https://geoconnex.us/nhdplusv2/huc12/{12-digit HUC12 code}
e.g. https://geoconnex.us/nhdplusv2/huc12/030300020607 for the Morgan Creek HUC12
HUC2-10:
https://geoconnex.us/ref/hu{02,04,06,08,10}/{2- to 10-digit HUC2-10 code}
e.g. https://geoconnex.us/ref/hu08/06010105 for the Upper French Broad HUC8
Mainstem River example: https://geoconnex.us/ref/mainstems/2104867 for the Hudson River
Secondary Hydrogeologic Regions example: https://geoconnex.us/ref/sec_hydrg_reg/S50
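The URI patterns above can be assembled from raw codes with a small helper. The following is a minimal sketch, not part of any geoconnex tooling: the function names are hypothetical, and it assumes that a code shorter than its expected width has simply lost its leading zeros (for example, after being read as an integer from a spreadsheet).

```python
BASE = "https://geoconnex.us"  # base taken from the patterns listed above


def _padded(code, width):
    """Return `code` as a digit string left-padded with zeros to `width`."""
    text = str(code).strip().zfill(width)
    if not text.isdigit() or len(text) != width:
        raise ValueError(f"expected a {width}-digit code, got {code!r}")
    return text


def state_uri(fips):
    """State PID from a 2-digit FIPS code."""
    return f"{BASE}/ref/states/{_padded(fips, 2)}"


def county_uri(fips):
    """County PID from a 5-digit FIPS code."""
    return f"{BASE}/ref/counties/{_padded(fips, 5)}"


def huc12_uri(code):
    """HUC12 PID from a 12-digit code."""
    return f"{BASE}/nhdplusv2/huc12/{_padded(code, 12)}"


def hu_uri(code):
    """HUC2 to HUC10 PID; the level is inferred from the code length."""
    text = str(code).strip()
    # Assumption: an odd number of digits means one leading zero was dropped.
    if len(text) % 2 == 1:
        text = "0" + text
    if not text.isdigit() or len(text) not in (2, 4, 6, 8, 10):
        raise ValueError(f"expected a 2- to 10-digit HUC code, got {code!r}")
    return f"{BASE}/ref/hu{len(text):02d}/{text}"
```

For instance, `county_uri(6037)` restores the leading zero and yields the Los Angeles County URI shown above, and `hu_uri("06010105")` yields the Upper French Broad HUC8 URI.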
6.4.1.2. Using reference locations to link to other data about the same location
Since many organizations publish data about the same feature, it is useful for these organizations to link their relevant data to a common identifier for that feature. The geoconnex project currently maintains two sets of reference location identifiers:
Reference gages for all surface stream monitoring locations (whether streamgages in the traditional sense or any water sampling site). These take the form
https://geoconnex.us/ref/gages/{7-digit integer}
e.g. https://geoconnex.us/ref/gages/1000001
Reference dams for all artificial dams impounding water bodies. These take the form
https://geoconnex.us/ref/dams/{7-digit integer}
e.g. https://geoconnex.us/ref/dams/1000001
Note that these identifiers follow somewhat arbitrary schemes that are maintained independently of the identifiers used in common national “authoritative” datasets such as USGS Gages II or the USACE National Inventory of Dams. This allows them to accommodate features that are not (yet) included in those datasets, and to remain persistent when those systems change the identifier for a given real-world feature.
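Before publishing, it can help to check that values in a reference-location column actually follow the forms given above. The sketch below is one possible check, assuming the 7-digit form described in this section holds for every identifier; the function name is hypothetical, and the check only tests the shape of the URI, not whether the identifier exists.

```python
import re

# Pattern derived from the two forms described above (7-digit integer).
REFERENCE_URI = re.compile(r"https://geoconnex\.us/ref/(gages|dams)/(\d{7})")


def parse_reference_uri(uri):
    """Return (kind, identifier) for a well-formed reference URI, else None."""
    match = REFERENCE_URI.fullmatch(uri.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)
```

A value such as `https://geoconnex.us/ref/gages/1000001` parses to `("gages", "1000001")`, while a bare site code or a truncated identifier returns `None` and can be flagged for review.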
6.4.1.3. Using NHDPlus identifiers to represent hydrologic addresses
By using persistent identifiers for NHDPlus features, you can represent your locations’ positions on a specific version of NHDPlus. This eliminates ambiguity about which version of the NHD the address pertains to, and reduces common errors such as dropping leading zeros in reachcodes.
NHDPlusV2 comid example: https://geoconnex.us/nhdplusv2/comid/13293480
NHDPlusV2 reachcode example: https://geoconnex.us/nhdplusv2/reachcode/12040104000071
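The leading-zero problem mentioned above usually appears when reachcodes are stored as numbers rather than text. A minimal sketch of building these URIs while restoring the zeros follows; the function names are hypothetical, and it assumes the standard 14-digit NHD reachcode length, which matches the example above.

```python
BASE = "https://geoconnex.us/nhdplusv2"  # base taken from the examples above

REACHCODE_WIDTH = 14  # assumption: NHD reachcodes are 14 digits long


def reachcode_uri(reachcode):
    """Build a reachcode URI, restoring any dropped leading zeros."""
    text = str(reachcode).strip().zfill(REACHCODE_WIDTH)
    if not text.isdigit() or len(text) != REACHCODE_WIDTH:
        raise ValueError(f"not a valid reachcode: {reachcode!r}")
    return f"{BASE}/reachcode/{text}"


def comid_uri(comid):
    """Build a comid URI; comids are plain integers with no padding."""
    return f"{BASE}/comid/{int(comid)}"
```

Storing reachcodes as text in the source table avoids the issue entirely; the padding here is only a safeguard for data that has already passed through a numeric column.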
6.4.2. Example
Below is an example table based on streamgages with data published at the California Data Exchange Center (CDEC). The table is also available for download as a CSV here. Note the inclusion of descriptive information, links to various reference features, and the data_url column linking to the CDEC data system entry point for each site.
| uri | id | name | organization | data_url | latitude | longitude | reachcode_nhdpv2 | measure_nhdpv2 | mainstem_river | reference_gage |
|---|---|---|---|---|---|---|---|---|---|---|
| | AMC | Arcade Creek at Winding Way | California Department of Water Resources | | 38.645447 | -121.347407 | | 0 | | |
| | CSW | Kings River Below Crescent Weir | California Department of Water Resources | | 36.3863018 | -119.875615 | | 0 | | |
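A table with these columns can be produced with Python's standard `csv` module. The sketch below is illustrative only: the helper names are hypothetical, and `https://example.org/gages/` is a placeholder base URI, not the one used for the CDEC sites. Columns that are not supplied are left blank rather than guessed.

```python
import csv
import io

# Column names taken from the example table above.
COLUMNS = [
    "uri", "id", "name", "organization", "data_url", "latitude", "longitude",
    "reachcode_nhdpv2", "measure_nhdpv2", "mainstem_river", "reference_gage",
]


def make_row(site_id, name, organization, latitude, longitude, uri_base, **context):
    """Build one table row; `context` holds any of the remaining columns."""
    unknown = set(context) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    row = dict.fromkeys(COLUMNS, "")  # unspecified columns stay empty
    row.update(
        uri=f"{uri_base}{site_id}",
        id=site_id,
        name=name,
        organization=organization,
        latitude=latitude,
        longitude=longitude,
    )
    row.update(context)
    return row


def to_csv(rows):
    """Serialize rows to CSV text with the header in COLUMNS order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    make_row(
        "AMC", "Arcade Creek at Winding Way",
        "California Department of Water Resources",
        "38.645447", "-121.347407",
        uri_base="https://example.org/gages/",  # placeholder, not a real base
    ),
]
csv_text = to_csv(rows)
```

Coordinates are passed as strings here so that their precision is written out exactly as supplied.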