# 6.4. Data preparation

This section walks through how to create a dataset suitable for use with our open-source tool based on pygeoapi, which generates landing pages with embedded JSON-LD.

To be most useful to the wider water data community, locations should have both descriptive and contextual information in the data published to geoconnex.us. Some useful descriptive information could include:

1. identifier

2. location geometry (point or polygon latitude/longitude, preferably in WGS84)

3. short name

4. long name or description

5. organization

6. URLs where observed or modeled data about the location can be accessed. These are of particular interest where available.

Contextual information could include:

1. administrative geographies it is within (e.g. census tract, municipality, county, state, PLSS section)

2. watershed boundary it is within (e.g. HUC12)

3. for groundwater sites, relevant aquifers

4. a relevant reference location. Many organizations publish data about the same feature, such as a single monitoring location that serves as both a streamgage and a water quality sampling site and is fixed to a dam or bridge.

5. for surface water sites, the hydrologic address on the National Hydrography Dataset stream network
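Before publishing, it can help to check that each record actually carries the descriptive fields listed above. The following is a minimal sketch, not part of the geoconnex tooling: the column names are borrowed from the example table later in this section, and treating all of them as required is an assumption made for illustration.

```python
# Column names follow the example table later in this section; treating all of
# them as required is an assumption made for this sketch.
DESCRIPTIVE_FIELDS = ["id", "name", "organization", "data_url", "latitude", "longitude"]


def missing_descriptive_fields(record):
    """Return the descriptive fields that are absent or blank in a record (a dict)."""
    missing = []
    for field in DESCRIPTIVE_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def has_valid_wgs84_point(record):
    """Check that latitude and longitude parse as numbers within WGS84 bounds."""
    try:
        lat = float(record["latitude"])
        lon = float(record["longitude"])
    except (KeyError, TypeError, ValueError):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```

A record that returns an empty list from `missing_descriptive_fields` and `True` from `has_valid_wgs84_point` has at least the basic descriptive information in place; contextual fields would need their own checks.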

Wherever possible, contextual data should be in the form of persistent identifiers (PIDs) for these features. For example, counties are often given as a name, but spelling errors, differences in capitalization or abbreviation, and other ambiguities can create barriers to interoperability between datasets that reference counties. In addition, these PIDs are already members of the knowledge graph, which makes adding your data to the knowledge graph simpler and more meaningful. Some sources of PIDs for these contextual features are provided at reference.geoconnex.us/collections. Some common patterns include:

### 6.4.1.3. Using NHDPlus identifiers to represent hydrologic addresses

By using persistent identifiers for NHDPlus features, you can represent your locations' positions on the NHDPlus network in a way that eliminates ambiguity about which version of the NHD the address pertains to, and that reduces common errors such as omitting leading zeros in reachcodes.
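Leading zeros are typically lost when a reachcode is stored as a number in a spreadsheet or database. As a rough sketch, the helper below restores the padding and builds a PID following the URI pattern used in the example table in this section. The function name is hypothetical, and the fixed width of 14 digits reflects the standard NHD reachcode length.

```python
# URI pattern taken from the example table in this section.
GEOCONNEX_REACHCODE_BASE = "https://geoconnex.us/nhdpv2/reachcode/"
REACHCODE_LENGTH = 14  # NHD reachcodes are 14-digit codes


def reachcode_to_pid(reachcode):
    """Build a reachcode PID, restoring any leading zeros that were dropped.

    Hypothetical helper for illustration; accepts a string or an integer.
    """
    text = str(reachcode).strip()
    if not text.isdigit() or len(text) > REACHCODE_LENGTH:
        raise ValueError(f"not a valid reachcode: {reachcode!r}")
    return GEOCONNEX_REACHCODE_BASE + text.zfill(REACHCODE_LENGTH)


# A reachcode already stored as text is passed through unchanged.
print(reachcode_to_pid("18020111000048"))
# A made-up reachcode that lost its leading zero when stored as an integer.
print(reachcode_to_pid(1020001000123))
```

Storing reachcodes as text from the start avoids the problem entirely; the padding step is only a repair for data that has already been converted to numbers.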

## 6.4.2. Example:

Below is an example table based on streamgages with data published at the California Data Exchange Center (CDEC). The table is also available for download as a CSV file. Note the inclusion of descriptive information, links to various reference features, and the data_url column linking to the CDEC data system entry point for each site.

Table 6.1 Example monitoring location tabular data for geoconnex

| uri | id | name | organization | data_url | latitude | longitude | reachcode_nhdpv2 | measure_nhdpv2 | mainstem_river | reference_gage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| https://geoconnex.us/ca-gage-assessment/gages/AMC | AMC | | California Department of Water Resources | http://cdec.water.ca.gov/dynamicapp/staMeta?station_id=AMC | 38.645447 | -121.347407 | https://geoconnex.us/nhdpv2/reachcode/18020111000048 | 0 | https://geoconnex.us/ref/mainstems/5147 | https://geoconnex.us/ref/gages/1185578 |
| https://geoconnex.us/ca-gage-assessment/gages/CSW | CSW | Kings River Below Crescent Weir | California Department of Water Resources | http://cdec.water.ca.gov/dynamicapp/staMeta?station_id=CSW | 36.3863018 | -119.875615 | https://geoconnex.us/nhdpv2/reachcode/18030012009243 | 0 | https://geoconnex.us/ref/mainstems/1796720 | https://geoconnex.us/ref/gages/1185619 |
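A table like this one can be turned into geospatial features with a few lines of code. The sketch below shows one possible route, assuming GeoJSON is a convenient intermediate format; this section does not say which input format the pygeoapi-based tool expects, so treat the conversion as illustrative. It uses only the Python standard library and the CSW row from the table above, and the function name is hypothetical.

```python
import csv
import io
import json


def rows_to_feature_collection(csv_text):
    """Convert CSV text with latitude/longitude columns to a GeoJSON FeatureCollection.

    Hypothetical helper for illustration. All values are read as strings, so
    identifiers such as reachcodes keep their leading zeros.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    features = []
    for row in reader:
        # GeoJSON orders coordinates as [longitude, latitude].
        lon = float(row.pop("longitude"))
        lat = float(row.pop("latitude"))
        features.append({
            "type": "Feature",
            "id": row["id"],
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": row,  # remaining columns become feature properties
        })
    return {"type": "FeatureCollection", "features": features}


# One row copied from the example table above.
SAMPLE_CSV = (
    "uri,id,name,organization,data_url,latitude,longitude,"
    "reachcode_nhdpv2,measure_nhdpv2,mainstem_river,reference_gage\n"
    "https://geoconnex.us/ca-gage-assessment/gages/CSW,CSW,"
    "Kings River Below Crescent Weir,California Department of Water Resources,"
    "http://cdec.water.ca.gov/dynamicapp/staMeta?station_id=CSW,"
    "36.3863018,-119.875615,"
    "https://geoconnex.us/nhdpv2/reachcode/18030012009243,0,"
    "https://geoconnex.us/ref/mainstems/1796720,"
    "https://geoconnex.us/ref/gages/1185619\n"
)

collection = rows_to_feature_collection(SAMPLE_CSV)
print(json.dumps(collection, indent=2))
```

Keeping the reference-feature columns as full URIs in the feature properties preserves the links to the knowledge graph through the conversion.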