Dataset-oriented

The purpose of dataset-oriented JSON-LD output is to describe the available data, and the area, locations, or features it relates to, in enough detail that a water data user can quickly decide whether and how to download the data.

Much of the guidance is the same as for location-oriented web resources, so this page focuses on the differences.

note

HydroShare automatically embeds JSON-LD. The examples below depart from HydroShare's default content to illustrate optional elements that would be useful for Geoconnex but are not currently implemented in HydroShare.

1. Identifiers, provenance, license, and distribution.

The first part of a dataset-oriented JSON-LD document is the metadata about the dataset.

For basic identifying and descriptive information, science-on-schema.org has appropriate guidance. In this case, note that a specific file download URL has been provided rather than an API endpoint, and that dc:conformsTo points to a data dictionary that is supplied at the same web resource.

{
"@context": {
"@vocab": "https://schema.org/",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"dc": "http://purl.org/dc/terms/",
"qudt": "http://qudt.org/schema/qudt/",
"qudt-units": "http://qudt.org/vocab/unit/",
"qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
"gsp": "http://www.opengis.net/ont/geosparql#",
"locType": "http://vocabulary.odm2.org/sitetype/",
"odm2var": "http://vocabulary.odm2.org/variablename/",
"odm2varType": "http://vocabulary.odm2.org/variabletype/",
"hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
"skos": "http://www.w3.org/2004/02/skos/core#",
"ssn": "http://www.w3.org/ns/ssn/",
"ssn-system": "http://www.w3.org/ns/ssn/systems/"
},
"@type": "Dataset",
"@id": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
"provider": {
"url": "https://hydroshare.org",
"@type": "ResearchOrganization",
"name": "HydroShare"
},
"creator": {
"@type": "Person",
"affiliation": {
"@type": "Organization",
"name": "Internet of Water;Center for Geospatial Solutions;Lincoln Institute of Land Policy"
},
"email": "konda@lincolninst.edu",
"name": "Kyle Onda",
"url": "https://www.hydroshare.org/user/4850/"
},
"identifier": "doi:hs.4cf2a4298eca418f980201c1c5505299",
"name": "Geoconnex Dataset Example: Data for Triangle Water Supply Dashboard",
"description": "This is a dataset meant as an example of dataset-level schema.org markup for https://geoconnex.us. It uses as an example data from the NC Triangle Water Supply Dashboard, which collects and visualizes finished water deliveries by water utilities to their service areas in the North Carolina Research Triangle area",
"url": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299",
"keywords": ["water demand", "water supply", "geoconnex"],
"license": "https://creativecommons.org/licenses/by/4.0/",
"isAccessibleForFree": true,
"distribution": [
{
"@type": "DataDownload",
"name": "HydroShare file URL",
"contentUrl": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/demand_over_time.csv",
"encodingFormat": ["text/csv"],
"dc:conformsTo": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/dataDictionary.xlsx"
}
],
// ...
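
A distribution node like the one above can also be assembled programmatically. The following Python sketch is illustrative only: the helper name make_distribution is hypothetical (it is not part of HydroShare or Geoconnex tooling), and it simply mirrors the keys used in the example.

```python
import json

def make_distribution(content_url, media_type, data_dictionary_url=None, name=None):
    """Build a schema.org DataDownload node shaped like the example above.

    Hypothetical helper, not part of HydroShare or Geoconnex tooling.
    """
    if not content_url.startswith(("http://", "https://")):
        raise ValueError("contentUrl should be an absolute HTTP(S) URL")
    node = {
        "@type": "DataDownload",
        "contentUrl": content_url,          # direct file download, not an API endpoint
        "encodingFormat": [media_type],
    }
    if name:
        node["name"] = name
    if data_dictionary_url:
        # dc:conformsTo points at the data dictionary that documents the file
        node["dc:conformsTo"] = data_dictionary_url
    return node

base = "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/"
distribution = make_distribution(
    base + "demand_over_time.csv",
    "text/csv",
    data_dictionary_url=base + "dataDictionary.xlsx",
    name="HydroShare file URL",
)
print(json.dumps({"distribution": [distribution]}, indent=2))
```

The dc:conformsTo key only resolves if the dc prefix is declared in @context, as it is in the example.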

2. Variables and Methods

The second section of a dataset-oriented JSON-LD document specifies which variables are measured in the dataset. In the example below, two variableMeasured nodes are given in an array. Other differences to point out:

  • The unit "million gallons per day" is not available from the QUDT units vocabulary. It is in the ODM2 units codelist, so we populate unitCode with the URL listed there.
  • The two variables are simply different aggregation statistics of the same quantity. Their measurementMethod has no web resource or identifier that describes the specific method, so description is used to clarify it.

// ABBREVIATED FOR BREVITY
// ...,
"variableMeasured": [
{
"@type": "PropertyValue",
"name": "water demand",
"description": "treated water delivered to distribution system",
"propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"unitText": "million gallons per day",
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
"unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
"measurementTechnique": "observation",
"measurementMethod": {
"name":"water meter",
"description": "metered bulk value, accumulated over one month",
"url": "https://www.wikidata.org/wiki/Q268503"
}
},
{
"@type": "PropertyValue",
"name": "water demand (monthly average)",
"description": "average monthly treated water delivered to distribution system",
"propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"unitText": "million gallons per day",
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
"unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
"measurementTechnique": "observation",
"measurementMethod": {
"name":"water meter",
"description": "metered bulk value, average accumulated over each month for multiple years",
"url": "https://www.wikidata.org/wiki/Q268503"
}
}
],
"temporalCoverage": "2002-01-01/2020-12-31",
"ssn-system:frequency": {
"value": "1",
"unitCode": "qudt-units:Month"
},
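
As a rough illustration of how this section might be generated and checked, the Python sketch below builds the two variableMeasured nodes from their shared fields and verifies that temporalCoverage is a well-formed ISO 8601 start/end interval. The function names make_variable and parse_temporal_coverage are hypothetical; the field values are copied from the example above.

```python
from datetime import date

VARIABLE_URL = "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/"
UNIT_CODE = "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048"

def make_variable(name, description, method_description):
    """Build one PropertyValue node; only the aggregation-specific text varies."""
    return {
        "@type": "PropertyValue",
        "name": name,
        "description": description,
        "propertyID": VARIABLE_URL,
        "url": VARIABLE_URL,
        "unitText": "million gallons per day",
        "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
        "unitCode": UNIT_CODE,
        "measurementTechnique": "observation",
        "measurementMethod": {"name": "water meter", "description": method_description},
    }

def parse_temporal_coverage(value):
    """Split an ISO 8601 'start/end' date interval into two date objects."""
    start_text, end_text = value.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if end < start:
        raise ValueError("temporalCoverage ends before it starts")
    return start, end

variables = [
    make_variable(
        "water demand",
        "treated water delivered to distribution system",
        "metered bulk value, accumulated over one month",
    ),
    make_variable(
        "water demand (monthly average)",
        "average monthly treated water delivered to distribution system",
        "metered bulk value, average accumulated over each month for multiple years",
    ),
]
start, end = parse_temporal_coverage("2002-01-01/2020-12-31")
print(len(variables), start.isoformat(), end.isoformat())
```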

3. Spatial Coverage

The third and final section specifies the spatial coverage of the dataset.

note

Unlike the location-oriented example, where a location is explicitly the subjectOf a dataset, here the dataset must be described as being about certain features. If the dataset is not explicitly about any discrete features, as with raster datasets, specify a spatialCoverage instead.

With the about construction, you can supply a single Geoconnex URI or an array of several. The example below uses several. Note that each URI is nested as its own node within the array, with an @id keyword and @type Place. The URIs in this example come from the Geoconnex reference feature set for Public Water Systems.

// ABBREVIATED FOR BREVITY
// ...,
"about": [
{
"@id": "https://geoconnex.us/ref/pws/NC0332010",
"@type": "Place"
},

{
"@id": "https://geoconnex.us/ref/pws/NC0368010",
"@type": "Place"
},

{
"@id": "https://geoconnex.us/ref/pws/NC0392010",
"@type": "Place"
},

{
"@id": "https://geoconnex.us/ref/pws/NC0392020",
"@type": "Place"
},

{
"@id": "https://geoconnex.us/ref/pws/NC0392045",
"@type": "Place"
}
],
// ...
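
When the feature identifiers are already known, the about array can be generated rather than typed by hand. This is a minimal Python sketch; the helper name about_from_pws_ids is hypothetical, and the base URI is taken from the example above.

```python
def about_from_pws_ids(pws_ids, base="https://geoconnex.us/ref/pws/"):
    """Wrap each public water system ID in a node with @id and @type Place."""
    return [{"@id": base + pws_id, "@type": "Place"} for pws_id in pws_ids]

about = about_from_pws_ids(
    ["NC0332010", "NC0368010", "NC0392010", "NC0392020", "NC0392045"]
)
print(about[0])
```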

To help find reference features, https://reference.geoconnex.us supports queries that follow the OGC API - Features standard and the Common Query Language (CQL) standard.

For example, to find the Geoconnex URI for the Raleigh public water system (PWS), we can construct the URL:
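
The exact query depends on the collection and its property names. As a minimal sketch, assume the public water system collection is called pws (inferred from the https://geoconnex.us/ref/pws/ URIs above) and that it exposes a name property called pws_name (a hypothetical name; check the collection's own properties before relying on it). A CQL text filter could then be URL-encoded in Python like this, using the filter query parameter from OGC API - Features Part 3:

```python
from urllib.parse import urlencode, quote

def reference_feature_query(collection, cql_filter,
                            base="https://reference.geoconnex.us"):
    """Build an OGC API - Features items URL with a percent-encoded CQL filter."""
    query = urlencode({"filter": cql_filter}, quote_via=quote)
    return f"{base}/collections/{collection}/items?{query}"

# "pws" and "pws_name" are assumptions made for illustration only
url = reference_feature_query("pws", "pws_name ILIKE '%Raleigh%'")
print(url)
```

The matching features returned by such a query carry the Geoconnex URI to use as @id in the about array.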

Sometimes feature URIs cannot be used because the relevant features are not available from https://reference.geoconnex.us/collections. If so, feel free to submit an issue to the geoconnex.us GitHub repository requesting a reference feature set.

Sometimes it is impractical to list all applicable reference features, whether or not they are in https://reference.geoconnex.us or another source. This is common for comprehensive datasets that cover an entire reference dataset or another dataset such as a hydrofabric, for example datasets summarizing values to U.S. counties, or the National Water Model generating values for all NHDPlusV2 COMID flowlines. In this case it is best to declare that the Dataset isBasedOn the source geospatial fabric. For example, if the example dataset were about all public water systems instead of just the five listed, then instead of about we should specify an identifier, name, description, and any URLs for other resources that describe the source fabric and how to interpret it:

// ABBREVIATED FOR BREVITY
// ...,
"isBasedOn": {
"@id": "https://www.hydroshare.org/resource/9ebc0a0b43b843b9835830ffffdd971e/",
"name": "U.S. Community Water Systems Service Boundaries, v4.0.0",
"description": "This is a layer of water service boundaries for 45,973 community water systems that deliver tap water to 307.7 million people in the US.",
"url": "https://github.com/SimpleLab-Inc/wsb"
},
// ...

Sometimes there are no particular features that a dataset is explicitly about. This is common with remote sensing raster data. In this case, it is best to specify a spatialCoverage polygon using WKT-encoded geometry:

"spatialCoverage": {
"@type": "Place",
"gsp:hasGeometry": {
"@type": "http://www.opengis.net/ont/sf#MultiPolygon",
"gsp:asWKT": {
"@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
"@value": "MULTIPOLYGON (((-85.67957299999999 32.799514, -85.679637 32.822002999999995, -85.67199699999999 32.822063, -85.66421 32.821711, -85.647989 32.82224, -85.627966 32.822331, -85.627781 32.800716, -85.627496 32.778602, -85.635931 32.778656999999995, -85.645034 32.778146, -85.653352 32.778481, -85.67933699999999 32.778239, -85.67936399999999 32.784064, -85.679808 32.792068, -85.67957299999999 32.799514)))"
}
}
}
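
A spatialCoverage node of this shape can be produced from a list of coordinates. The Python sketch below is an illustration under stated assumptions: the helper names and the bounding-box coordinates are hypothetical, and it handles only a single outer ring with no holes. WKT lists each position as longitude then latitude, and the ring must end on its starting point.

```python
def multipolygon_wkt(ring):
    """Encode one outer ring of (lon, lat) pairs as a single-part MULTIPOLYGON."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings must be closed
    if len(ring) < 4:
        raise ValueError("a polygon ring needs at least three distinct points")
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"MULTIPOLYGON ((({coords})))"

def spatial_coverage(ring):
    """Wrap the WKT in the GeoSPARQL structure used in the example above."""
    return {
        "@type": "Place",
        "gsp:hasGeometry": {
            "@type": "http://www.opengis.net/ont/sf#MultiPolygon",
            "gsp:asWKT": {
                "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
                "@value": multipolygon_wkt(ring),
            },
        },
    }

# Hypothetical bounding box, for illustration only
box = [(-85.68, 32.78), (-85.63, 32.78), (-85.63, 32.82), (-85.68, 32.82)]
coverage = spatial_coverage(box)
print(coverage["gsp:hasGeometry"]["gsp:asWKT"]["@value"])
```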

Full Example

Full Dataset-oriented JSON-LD output (Identifiers, Variables, and Spatial Coverage)

note

For more information about the underlying concepts, see the general JSON-LD reference page.

{
// Identifiers, provenance, and context

"@context": {
"@vocab": "https://schema.org/",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"dc": "http://purl.org/dc/terms/",
"dcat": "http://www.w3.org/ns/dcat#",
"freq": "http://purl.org/cld/freq/",
"qudt": "http://qudt.org/schema/qudt/",
"qudt-units": "http://qudt.org/vocab/unit/",
"qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
"gsp": "http://www.opengis.net/ont/geosparql#",
"locType": "http://vocabulary.odm2.org/sitetype/",
"odm2var": "http://vocabulary.odm2.org/variablename/",
"odm2varType": "http://vocabulary.odm2.org/variabletype/",
"hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
"skos": "http://www.w3.org/2004/02/skos/core#",
"ssn": "http://www.w3.org/ns/ssn/",
"ssn-system": "http://www.w3.org/ns/ssn/systems/"
},
"@type": "Dataset",
"@id": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
"identifier": "doi:hs.4cf2a4298eca418f980201c1c5505299",
"name": "Geoconnex Dataset Example: Data for Triangle Water Supply Dashboard",
"description": "This is a dataset meant as an example of dataset-level schema.org markup for https://geoconnex.us. It uses as an example data from the NC Triangle Water Supply Dashboard, which collects and visualizes finished water deliveries by water utilities to their service areas in the North Carolina Research Triangle area",
"url": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299",
"provider": {
"url": "https://hydroshare.org",
"@type": "ResearchOrganization",
"name": "HydroShare"
},
"creator": {
"@type": "Person",
"affiliation": {
"@type": "Organization",
"name": "Internet of Water;Center for Geospatial Solutions;Lincoln Institute of Land Policy"
},
"email": "konda@lincolninst.edu",
"name": "Kyle Onda",
"url": "https://www.hydroshare.org/user/4850/"
},
"keywords": [
"water demand",
"water supply",
"geoconnex"
],
"license": "https://creativecommons.org/licenses/by/4.0/",
"isAccessibleForFree": true,
"distribution": {
"@type": "DataDownload",
"name": "HydroShare file URL",
"contentUrl": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/demand_over_time.csv",
"encodingFormat": [
"text/csv"
],
"dc:conformsTo": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/dataDictionary.xlsx"
},

// Variables and Methods

"variableMeasured": [
{
"@type": "PropertyValue",
"name": "water demand",
"description": "treated water delivered to distribution system",
"propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"unitText": "million gallons per day",
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
"unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
"measurementTechnique": "observation",
"measurementMethod": {
"name": "water meter",
"description": "metered bulk value, accumulated over one month",
"url": "https://www.wikidata.org/wiki/Q268503"
}
},
{
"@type": "PropertyValue",
"name": "water demand (monthly average)",
"description": "average monthly treated water delivered to distribution system",
"propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"unitText": "million gallons per day",
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
"unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
"measurementTechnique": "observation",
"measurementMethod": {
"name": "water meter",
"description": "metered bulk value, average accumulated over each month for multiple years",
"url": "https://www.wikidata.org/wiki/Q268503"
}
}
],
"temporalCoverage": "2002-01-01/2020-12-31",
"dc:accrualPeriodicity": "freq:monthly",
"dcat:temporalResolution": {"@value": "P1M", "@type": "xsd:duration"},

// Feature Links

"about": [
{
"@id": "https://geoconnex.us/ref/pws/NC0332010",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0368010",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0392010",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0392020",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0392045",
"@type": "Place"
}
]
}
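
The full example uses // lines as section markers, which strict JSON parsers reject. Before embedding or validating the document, those lines need to be removed. The Python sketch below shows one way to do that, and also rejects repeated keys, which a standard parser would otherwise silently overwrite. The function names are hypothetical, and the sample is a shortened stand-in for the full document.

```python
import json

def strip_comment_lines(text):
    """Drop lines that start with //; URLs containing // mid-line are untouched."""
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("//")]
    return "\n".join(kept)

def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails on repeated keys instead of keeping the last."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key}")
        result[key] = value
    return result

def load_annotated_jsonld(text):
    """Parse a JSON-LD example that contains // section-marker lines."""
    return json.loads(strip_comment_lines(text), object_pairs_hook=reject_duplicate_keys)

# Shortened stand-in for the full example above
sample = """{
// Identifiers, provenance, and context
"@context": {"@vocab": "https://schema.org/"},
"@type": "Dataset",
"@id": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
// Feature Links
"about": [{"@id": "https://geoconnex.us/ref/pws/NC0332010", "@type": "Place"}]
}"""
doc = load_annotated_jsonld(sample)
print(doc["@type"], len(doc["about"]))
```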