Finding datasets related to hydrologic features
The most important use case of the Geoconnex system for data users is discovering datasets related to a given hydrologic feature. This functionality is the reason for implementing a Knowledge Graph approach. Users can currently discover datasets in two ways:
- Convenience references to "datasets" within Reference Feature Collections (currently implemented only for mainstems)
- Using the full SPARQL query API for the graph database at https://graph.geoconnex.us
Convenience "dataset" reference in API
Each mainstem's GeoJSON response includes a nested array within the "datasets" attribute which can be extracted. In the example below, 58 datasets are available about the Animas River. These can be filtered further to get datasets on specific variables like temperature or turbidity.
- R
- Python
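# Load packages for HTTP requests, JSON parsing, and interactive tables
library(httr)
library(jsonlite)
library(DT)
# Animas River mainstem URI
animas_uri <- "https://geoconnex.us/ref/mainstems/35394"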
# Fetch the Animas River data
response <- GET(animas_uri, query = list(f = "json"))
animas_data <- content(response, "text") |> fromJSON()
# Extract datasets
datasets <- animas_data$properties$datasets
datatable(datasets,
options = list(pageLength = 5, scrollX = TRUE,
search = list(regex=TRUE, caseInsensitive = TRUE, search = 'temperature')
)
)
import requests
import pandas as pd
# Animas River mainstem URI
animas_uri = "https://geoconnex.us/ref/mainstems/35394"
# Fetch the Animas River data
response = requests.get(animas_uri, params={'f': 'json'})
animas_data = response.json()
# Extract datasets
datasets = animas_data['properties']['datasets']
# Create a DataFrame
df_datasets = pd.DataFrame(datasets)
# Display the DataFrame with options similar to datatable in R
df_datasets.head(5) # Display the first 5 rows
| monitoringLocation | siteName | datasetDescription | type | url | variableMeasured | variableUnit | measurementTechnique | temporalCoverage | distributionName | distributionURL | distributionFormat | wkt |
|----------------------------------------------------------|---------------|------------------------------------------------------------|--------|------------------------------------------------------------|-------------------------------------------------|--------------|----------------------|--------------------------------------------------|--------------------------------------|-----------------------------------------------------------|-------------------------------------|---------------------------------------------------|
| https://sta.geoconnex.dev/collections/USGS/Thi... | USGS-09363500 | [Turbidity, FNU / USGS-09363500-1f7260408f1d4c... | Stream | https://waterdata.usgs.gov/monitoring-location... | Turbidity | _FNU | observation | 2024-08-30T09:15:00Z/2024-09-09T18:00:00Z | USGS Instantaneous Values Service | https://waterservices.usgs.gov/nwis/iv/?sites=... | text/tab-separated-values | POINT (-107.87533333333333 37.036569444444446) |
| https://sta.geoconnex.dev/collections/USGS/Thi... | USGS-09363500 | [Turbidity, FNU / USGS-09363500-1f7260408f1d4c... | Stream | https://waterdata.usgs.gov/monitoring-location... | Turbidity | _FNU | observation | 2024-08-30T09:15:00Z/2024-09-09T18:00:00Z | USGS SensorThings API | https://labs.waterdata.usgs.gov/sta/v1.1/Datas... | application/json | POINT (-107.87533333333333 37.036569444444446) |
| https://sta.geoconnex.dev/collections/USGS/Thi... | USGS-09363500 | Specific cond at 25C / USGS-09363500-252c7ef5e... | Stream | https://waterdata.usgs.gov/monitoring-location... | Specific cond at 25C / USGS-09363500-252c7ef5e... | uS/cm | observation | 2024-08-30T09:15:00Z/2024-09-09T18:00:00Z | USGS Instantaneous Values Service | https://waterservices.usgs.gov/nwis/iv/?sites=... | text/tab-separated-values | POINT (-107.87533333333333 37.036569444444446) |
| https://sta.geoconnex.dev/collections/USGS/Thi... | USGS-09363500 | Specific cond at 25C / USGS-09363500-252c7ef5e... | Stream | https://waterdata.usgs.gov/monitoring-location... | Specific cond at 25C / USGS-09363500-252c7ef5e... | uS/cm | observation | 2024-08-30T09:15:00Z/2024-09-09T18:00:00Z | USGS SensorThings API | https://labs.waterdata.usgs.gov/sta/v1.1/Datas... | application/json | POINT (-107.87533333333333 37.036569444444446) |
| https://sta.geoconnex.dev/collections/USGS/Thi... | USGS-09363500 | None / USGS-09363500-3177e767c0af430b9efae7195... | Stream | https://waterdata.usgs.gov/monitoring-location... | None / USGS-09363500-3177e767c0af430b9efae7195... | in | observation | 2024-08-30T09:15:00Z/2024-09-09T18:00:00Z | USGS Instantaneous Values Service | https://waterservices.usgs.gov/nwis/iv/?sites=... | text/tab-separated-values | POINT (-107.87533333333333 37.036569444444446) |
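The extracted records can be narrowed to a single variable with a case-insensitive match, similar to the regex search passed to `datatable` in the R example. The following is a minimal sketch that assumes each record carries a `variableMeasured` field, as in the table above. The helper name `filter_datasets` and the sample records are illustrative and not part of the Geoconnex API.

```python
def filter_datasets(datasets, keyword, field="variableMeasured"):
    """Keep records whose `field` contains `keyword`, ignoring case.

    `datasets` is a list of dicts such as animas_data['properties']['datasets'].
    """
    needle = keyword.lower()
    return [
        record for record in datasets
        # `or ""` and str() guard against missing (None) or non-string values
        if needle in str(record.get(field) or "").lower()
    ]


# Illustrative records only; real records come from the mainstem response
sample = [
    {"siteName": "USGS-09363500", "variableMeasured": "Turbidity"},
    {"siteName": "USGS-09363500", "variableMeasured": "Temperature, water"},
    {"siteName": "USGS-09363500", "variableMeasured": None},
]

temperature = filter_datasets(sample, "temperature")
print(temperature)
```

The filtered list can be passed to `pd.DataFrame(...)` in the same way as the unfiltered `datasets` list.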
SPARQL
For those who wish to make SPARQL queries directly, the endpoint is https://graph.geoconnex.us.
For example, one can query the graph for all datasets schema:about monitoring locations that are on (hyf:referencedPosition/hyf:HY_IndirectPosition/hyf:linearElement) the Animas River and that have a schema:variableMeasured whose description includes the string "temperature".
- R
- Python
- Shell
- Graphically
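# Load packages: httr (HTTP), jsonlite (JSON), dplyr (mutate, %>%), DT (tables)
library(httr)
library(jsonlite)
library(dplyr)
library(DT)
# SPARQL endpoint
endpoint <- "https://graph.geoconnex.us"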
# Revised SPARQL query
query <- 'PREFIX schema: <https://schema.org/>
PREFIX gsp: <http://www.opengis.net/ont/geosparql#>
PREFIX hyf: <https://www.opengis.net/def/schema/hy_features/hyf/>
SELECT DISTINCT ?monitoringLocation ?siteName ?datasetDescription ?type ?url
?variableMeasured ?variableUnit ?measurementTechnique ?temporalCoverage
?distributionName ?distributionURL ?distributionFormat ?wkt
WHERE {
VALUES ?mainstem { <https://geoconnex.us/ref/mainstems/35394> }
?monitoringLocation hyf:referencedPosition/hyf:HY_IndirectPosition/hyf:linearElement ?mainstem ;
schema:subjectOf ?item ;
hyf:HydroLocationType ?type ;
gsp:hasGeometry/gsp:asWKT ?wkt .
?item schema:name ?siteName ;
schema:temporalCoverage ?temporalCoverage ;
schema:url ?url ;
schema:variableMeasured ?variableMeasured .
?variableMeasured schema:description ?datasetDescription ;
schema:name ?variableMeasuredName ;
schema:unitText ?variableUnit ;
schema:measurementTechnique ?measurementTechnique .
OPTIONAL {
?item schema:distribution ?distribution .
?distribution schema:name ?distributionName ;
schema:contentUrl ?distributionURL ;
schema:encodingFormat ?distributionFormat .
}
# Filter datasets by the desired variable description
FILTER(REGEX(?datasetDescription, "temperature", "i"))
}
ORDER BY ?siteName
'
# Execute the SPARQL query
response <- GET(
url = endpoint,
query = list(query = query),
accept("application/sparql-results+json")
)
# Parse the JSON response
result <- content(response, "text", encoding = "UTF-8") %>% fromJSON()
# Extract the results
datasets <- as.data.frame(result$results$bindings) |>
mutate(across(everything(), ~.$value))
datatable(datasets)
import requests
import pandas as pd
# SPARQL endpoint
endpoint = "https://graph.geoconnex.us"
# Revised SPARQL query
query = """
PREFIX schema: <https://schema.org/>
PREFIX gsp: <http://www.opengis.net/ont/geosparql#>
PREFIX hyf: <https://www.opengis.net/def/schema/hy_features/hyf/>
SELECT DISTINCT ?monitoringLocation ?siteName ?datasetDescription ?type ?url
?variableMeasured ?variableUnit ?measurementTechnique ?temporalCoverage
?distributionName ?distributionURL ?distributionFormat ?wkt
WHERE {
VALUES ?mainstem { <https://geoconnex.us/ref/mainstems/35394> }
?monitoringLocation hyf:referencedPosition/hyf:HY_IndirectPosition/hyf:linearElement ?mainstem ;
schema:subjectOf ?item ;
hyf:HydroLocationType ?type ;
gsp:hasGeometry/gsp:asWKT ?wkt .
?item schema:name ?siteName ;
schema:temporalCoverage ?temporalCoverage ;
schema:url ?url ;
schema:variableMeasured ?variableMeasured .
?variableMeasured schema:description ?datasetDescription ;
schema:name ?variableMeasuredName ;
schema:unitText ?variableUnit ;
schema:measurementTechnique ?measurementTechnique .
OPTIONAL {
?item schema:distribution ?distribution .
?distribution schema:name ?distributionName ;
schema:contentUrl ?distributionURL ;
schema:encodingFormat ?distributionFormat .
}
# Filter datasets by the desired variable description
FILTER(REGEX(?datasetDescription, "temperature", "i"))
}
ORDER BY ?siteName
"""
# Execute the SPARQL query
response = requests.get(
endpoint,
params={'query': query},
headers={'Accept': 'application/sparql-results+json'}
)
# Parse the JSON response
result = response.json()
# Extract the results
bindings = result['results']['bindings']
datasets = pd.json_normalize(bindings)
datasets
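Note that `pd.json_normalize(bindings)` keeps the SPARQL JSON results structure, so each variable appears as separate columns such as `siteName.type` and `siteName.value`. The sketch below shows one way to keep only the values, comparable to the `mutate(across(...))` step in the R example. The helper name `flatten_bindings` and the sample bindings are illustrative assumptions.

```python
def flatten_bindings(bindings):
    """Turn SPARQL JSON bindings into one plain dict per result row.

    Each binding maps a variable name to {"type": ..., "value": ...}.
    Variables left unbound (for example inside OPTIONAL) are absent
    from a row, so they are filled with None here.
    """
    # Collect variable names in first-seen order across all rows
    variables = []
    for row in bindings:
        for name in row:
            if name not in variables:
                variables.append(name)
    return [
        {name: (row[name]["value"] if name in row else None) for name in variables}
        for row in bindings
    ]


# Illustrative bindings only; real ones come from result['results']['bindings']
sample_bindings = [
    {
        "siteName": {"type": "literal", "value": "USGS-09363500"},
        "distributionFormat": {"type": "literal", "value": "application/json"},
    },
    {"siteName": {"type": "literal", "value": "USGS-09363500"}},
]

rows = flatten_bindings(sample_bindings)
print(rows)
```

Passing `rows` to `pd.DataFrame(rows)` then gives one column per query variable.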
#!/bin/bash
# Define the SPARQL query
read -r -d '' SPARQL_QUERY << EOM
PREFIX schema: <https://schema.org/>
PREFIX gsp: <http://www.opengis.net/ont/geosparql#>
PREFIX hyf: <https://www.opengis.net/def/schema/hy_features/hyf/>
SELECT DISTINCT ?monitoringLocation ?siteName ?datasetDescription ?type ?url
?variableMeasured ?variableUnit ?measurementTechnique ?temporalCoverage
?distributionName ?distributionURL ?distributionFormat ?wkt
WHERE {
VALUES ?mainstem { <https://geoconnex.us/ref/mainstems/35394> }
?monitoringLocation hyf:referencedPosition/hyf:HY_IndirectPosition/hyf:linearElement ?mainstem ;
schema:subjectOf ?item ;
hyf:HydroLocationType ?type ;
gsp:hasGeometry/gsp:asWKT ?wkt .
?item schema:name ?siteName ;
schema:temporalCoverage ?temporalCoverage ;
schema:url ?url ;
schema:variableMeasured ?variableMeasured .
?variableMeasured schema:description ?datasetDescription ;
schema:name ?variableMeasuredName ;
schema:unitText ?variableUnit ;
schema:measurementTechnique ?measurementTechnique .
OPTIONAL {
?item schema:distribution ?distribution .
?distribution schema:name ?distributionName ;
schema:contentUrl ?distributionURL ;
schema:encodingFormat ?distributionFormat .
}
FILTER(REGEX(?datasetDescription, "temperature", "i"))
}
ORDER BY ?siteName
EOM
# URL encode the query so we can send it as a URL parameter
# (the query is passed as an argument so quotes inside it cannot break the Python snippet)
ENCODED_QUERY=$(python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))" "$SPARQL_QUERY")
# Define the endpoint URL
ENDPOINT="https://graph.geoconnex.us"
# Send the query as a GET request by using URL parameters and encoding the query
curl -X GET --header 'Accept: application/sparql-results+json' "${ENDPOINT}?query=${ENCODED_QUERY}"
# Send the query as a POST request by using the query in the body (no query parameter in the URL)
curl -X POST --header 'Accept: application/sparql-results+json' --header 'Content-Type: application/sparql-query' --data "$SPARQL_QUERY" "$ENDPOINT"
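The two curl calls above differ only in where the query travels: percent-encoded in the URL for GET, or as the raw request body for POST. The following sketch builds both requests with Python's standard library without sending them. The helper names and the placeholder query are illustrative assumptions. The headers mirror those used in the shell example.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://graph.geoconnex.us"
ACCEPT = "application/sparql-results+json"


def build_get_request(query, endpoint=ENDPOINT):
    """GET: the query is percent-encoded into the `query` URL parameter."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": ACCEPT}, method="GET")


def build_post_request(query, endpoint=ENDPOINT):
    """POST: the raw query is the body; the URL carries no query parameter."""
    return Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={"Accept": ACCEPT, "Content-Type": "application/sparql-query"},
        method="POST",
    )


# Placeholder query for illustration; substitute the full query shown earlier
query = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
get_req = build_get_request(query)
post_req = build_post_request(query)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url)
```

Either request object can be sent with `urllib.request.urlopen(...)`. POST is generally the safer choice for long queries, since some servers and proxies limit URL length.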
Since graph.geoconnex.us is a public SPARQL endpoint, it can be used with a variety of SPARQL editors. One such example is included in the playground section of the docs, titled SPARQL Query Helper.