Design decision 8 - Dual implementation of structured data with JSON-LD and RDFa Lite in a single web page
Summary
- Expert: @duboisp
- Status: Approved
- Type: Best Practice
- Last updated: 2019-10-23
- History:
- (2019-10-23) Published
- (2019-10-23) Approved at the WET roadmap meeting
- (2019-09-22) Clarified the practice solution and added more examples
- (2019-09-20) Changed #json-ld to #wb-script to reference the alternative structured data version
- (2019-09-20) Presented at the roadmap meeting
- (2019-06-27) Initial draft
Issue
Provide guidance on how to include JSON-LD structured data along with other structured data marked up with RDFa or RDFa Lite in a single web page.
Background
Structured data has been implemented in the wet-boew theme since the first version of WET 4.0. Schema.org is the preferred vocabulary and RDFa Lite is the preferred notation for marking up web documents. A minimal, generic implementation was included in the wet-boew template and the GCWeb template to identify the WebPage thing and its properties, such as breadcrumb, mainContentOfPage, name, and date modified. Recently, the GCWeb theme was enhanced to include structured data that specifies the publisher and explicitly defines the typeof of mainContentOfPage.
Content providers have since expressed the need to add more structured data, without being limited, to support richer search results and to provide meaningful information to voice assistants.
Some major search engines, such as Google, recommend JSON-LD for structured data over other notations such as RDFa or Microdata, which are also supported.
It is important to note that, according to the W3C specifications, it is possible and valid to create structured data using various notations, such as JSON-LD, RDFa, Turtle, and Microdata, which can then be translated into Resource Description Framework (RDF) statements or an RDF graph. There are also many ways to link data together. How a data relationship is established is defined by the notation and by the vocabulary used.
Challenges
Combining multiple notations increases the risk of data integrity issues. The structured information should not contradict itself, and should not contradict other metadata that uses an alternative representation outside the scope of the structured data.
Some structured data is easier to codify in RDFa Lite, and other structured data is easier to codify in JSON-LD. Both notations have their advantages.
Interlinking the JSON-LD structured data with the RDFa structured data. Both notations have a way to create an in-page reference, and they can be interlinked by using some schema.org properties.
Delimiting the structured data that belongs to the template container from the structured data for the page content.
Other structured data notations
An author should be able to use the structured data notation that best fits the context of the published content. For example:
- JavaScript, JSON files -> JSON-LD
- HTML files -> Microdata, RDFa Lite, RDFa, JSON-LD, Turtle
- XML files -> RDFa Lite, RDFa
- Annotated tables, CSV files -> JSON metadata file, JSON metadata script
- Structured data in plain text -> Turtle
Guidance on structured data
- Provide structured information that conforms to the specification and vocabulary used.
- Provide complementary information and avoid contradictory information.
- Provide up-to-date information.
- Provide original content that you or your users have generated.
- Don’t mark up content that is not visible to readers of the page.
- Don’t mark up irrelevant or misleading content, such as fake reviews or content unrelated to the focus of a page.
- Don’t use structured data to deceive or mislead users.
- Don’t create broken references.
- Don’t duplicate structured content; use links instead.
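To illustrate the guidance on avoiding contradictory information, the following Python sketch compares the properties stated in two notations for the same page and reports those whose values differ. It assumes the properties were already extracted into plain dictionaries. The function name find_contradictions and the sample values are hypothetical and are not part of WET-BOEW.

```python
def find_contradictions(rdfa_props, jsonld_props):
    """Return the properties stated in both notations with different values."""
    conflicts = {}
    # Only properties present in both notations can contradict each other.
    for name in rdfa_props.keys() & jsonld_props.keys():
        if rdfa_props[name] != jsonld_props[name]:
            conflicts[name] = (rdfa_props[name], jsonld_props[name])
    return conflicts


# Hypothetical values, as if already extracted from a single page.
rdfa_props = {"name": "Page title", "dateModified": "2019-10-23"}
jsonld_props = {"dateModified": "2019-09-20", "author": "Example author"}

conflicts = find_contradictions(rdfa_props, jsonld_props)
print(conflicts)  # {'dateModified': ('2019-10-23', '2019-09-20')}
```

Properties stated in only one notation are treated as complementary and are not reported.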
Solution
Usually, an HTML file for a given URL identifies one thing. The author uses either JSON-LD, RDFa Lite, or Microdata to provide structured data. These days, the notation used often depends on the targeted services and on the technical limitations the author faces when publishing web content.
This solution creates three (3) distinct resources instead of one (1).
- Web resource (the whole page including the site template) - Identified by the web resource URL
- Main content of the page - Identified by the id #wb-main
- Alternate or complementary structured data information that supports the main content of the page - Identified by the id #wb-script
The WET-BOEW template uses RDFa Lite as the structured data notation for the web resource and for the main content of the page.
This technique uses resource ids to avoid confusion between resources that use different structured data notations. How a resource id is created depends on the structured data notation. For example, in RDFa the id is specified via the resource attribute, in JSON-LD it is specified via the @id keyword, and in Turtle it is the IRI of the resource itself.
Each resource can then use those ids for cross-referencing. Cross-referencing can be done via a term defined in a vocabulary or via a feature of the structured data notation: for example, the term “sameAs” in the schema.org vocabulary, the term “sameAs” in the Web Ontology Language (OWL) vocabulary, the “about” attribute in RDFa, or an IRI value in JSON-LD.
Technical guidance on implementation
- This applies to HTML pages that contain structured data written in two or more notations.
- Each set of structured data must have a distinct and unique resource identifier.
- Cross-referencing is optional, but a soft cross-reference from the alternate/complementary version to the main content of the page is recommended.
- Data structures must not contradict each other. Duplicating information increases the risk of data integrity issues and could produce unexpected results.
- Only add alternative or complementary structured data to your page if needed.
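The identifier rules above can be checked mechanically. The following Python sketch uses only the standard library to collect RDFa resource ids and JSON-LD @id values from a page, then reports duplicate ids and same-page sameAs references that point at no known id. It is a simplified illustration, not a full RDFa or JSON-LD processor: it assumes each JSON-LD script holds a single object, and the names StructuredDataCollector and check_page are hypothetical.

```python
import json
from html.parser import HTMLParser


class StructuredDataCollector(HTMLParser):
    """Collect RDFa resource ids and JSON-LD objects from an HTML page."""

    def __init__(self):
        super().__init__()
        self.rdfa_ids = []
        self.jsonld_blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("resource"):
            self.rdfa_ids.append(attrs["resource"])
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            # Assumption: the script body is a single JSON object.
            self.jsonld_blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def check_page(html):
    """Return a list of problems; an empty list means none were found."""
    collector = StructuredDataCollector()
    collector.feed(html)
    collector.close()
    jsonld_ids = [b["@id"] for b in collector.jsonld_blocks if "@id" in b]
    all_ids = collector.rdfa_ids + jsonld_ids
    problems = []
    for ident in sorted(set(all_ids)):
        if all_ids.count(ident) > 1:
            problems.append("duplicate id: " + ident)
    for block in collector.jsonld_blocks:
        target = block.get("sameAs")
        if isinstance(target, dict):
            target = target.get("@id")
        # Only same-page fragment references are checked in this sketch.
        if target and target.startswith("#") and target not in all_ids:
            problems.append("broken reference: " + target)
    return problems


page = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@id": "#wb-script", "sameAs": "#wb-main"}
</script>
</head>
<body vocab="http://schema.org/" typeof="WebPage">
<main property="mainContentOfPage" typeof="WebPageElement" resource="#wb-main"></main>
</body></html>"""

print(check_page(page))  # []
```

A check like this only covers identifiers and same-page references; it does not detect contradictory property values.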
Example
This example assumes your web page is served at the URL “http://example.com/page.html”. It shows a minimal implementation that uses the RDFa Lite and JSON-LD structured data notations along with the schema.org vocabulary. The cross-reference is one way, from the complementary/alternate structured data information, using the term “sameAs” of the schema.org vocabulary.
<!DOCTYPE html>
<html class="no-js" lang="en" dir="ltr">
<head>
…
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@id": "#wb-script",
"sameAs": "#wb-main"
}
</script>
…
</head>
<body vocab="http://schema.org/" typeof="WebPage">
…
<main property="mainContentOfPage" typeof="WebPageElement" resource="#wb-main" class="container">
…
</main>
…
</body>
</html>
This will produce the following resources:
- http://example.com/page.html - The WebPage schema.org thing notated in RDFa Lite
- http://example.com/page.html#wb-script - The complementary resource notated in JSON-LD
- http://example.com/page.html#wb-main - The HTML main element resource notated in RDFa Lite
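The resource list above follows from resolving each relative id against the page URL. A minimal Python sketch, assuming the page is served at the example URL and using the standard library urljoin as a stand-in for the resolution a structured data processor performs:

```python
from urllib.parse import urljoin

# Assumption: the page is served at the URL used in the example.
page_url = "http://example.com/page.html"

# An empty reference stands for the page itself (the WebPage resource).
resource_ids = ["", "#wb-script", "#wb-main"]
resolved = [urljoin(page_url, resource_id) for resource_id in resource_ids]

for iri in resolved:
    print(iri)
# http://example.com/page.html
# http://example.com/page.html#wb-script
# http://example.com/page.html#wb-main
```

An id that is already an absolute IRI is left unchanged by the same resolution, so relative and absolute forms of the same id refer to the same resource.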
The same example in Turtle notation
<!DOCTYPE html>
<html class="no-js" lang="en" dir="ltr">
<head>
…
<script type="text/turtle">
@prefix schema: <http://schema.org/> .
<#wb-script>
schema:sameAs <#wb-main> .
</script>
…
</head>
<body vocab="http://schema.org/" typeof="WebPage">
…
<main property="mainContentOfPage" typeof="WebPageElement" resource="#wb-main" class="container">
…
</main>
…
</body>
</html>
This will produce the following resources:
- http://example.com/page.html - The WebPage schema.org thing notated in RDFa Lite
- http://example.com/page.html#wb-script - The complementary resource notated in Turtle
- http://example.com/page.html#wb-main - The HTML main element resource notated in RDFa Lite
Cross-referencing
RDFa Lite or RDFa (Schema.org vocabulary)
<main property="mainContentOfPage" typeof="WebPageElement" resource="#wb-main" class="container">
<meta property="sameAs" content="#wb-script" />
…
</main>
JSON-LD (Schema.org vocabulary)
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@id": "#wb-script",
"sameAs": "#wb-main"
}
</script>
RDFa Lite or RDFa (OWL vocabulary)
<main property="mainContentOfPage" typeof="WebPageElement" resource="#wb-main" class="container">
<meta property="owl:sameAs" content="#wb-script" />
…
</main>
JSON-LD (OWL vocabulary)
<script type="application/ld+json">
{
"@context": {
"schema": "http://schema.org/",
"owl": "http://www.w3.org/2002/07/owl#"
},
"@id": "#wb-script",
"owl:sameAs": "#wb-main"
}
</script>
RDFa (the property “about”)
<main property="mainContentOfPage" typeof="WebPageElement" resource="#wb-main" about="#wb-script" class="container">
…
</main>
JSON-LD (IRI reference with Schema.org vocabulary)
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@id": "#wb-script",
"sameAs": {
"@id": "#wb-main"
}
}
</script>
(Informative) Alternative equivalent JSON-LD script notations
The following code samples are each intended to produce the same result. Please refer to the JSON-LD specification for more details.
<script type="application/ld+json">
{
"@id": "#wb-script",
"http://schema.org/sameAs": "#wb-main"
}
</script>
<script type="application/ld+json">
{
"@id": "#wb-script",
"schema:sameAs": "#wb-main"
}
</script>
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@id": "#wb-script",
"sameAs": "http://example.com/page.html#wb-main"
}
</script>
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@id": "http://example.com/page.html#wb-script",
"sameAs": "#wb-main"
}
</script>
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@id": "http://example.com/page.html#wb-script",
"sameAs": "http://example.com/page.html#wb-main"
}
</script>
<script type="application/ld+json">
{
"@id": "http://example.com/page.html#wb-script",
"http://schema.org/sameAs": "http://example.com/page.html#wb-main"
}
</script>
<script type="application/ld+json">
{
"@id": "http://example.com/page.html#wb-script",
"schema:sameAs": "#wb-main"
}
</script>
<script type="application/ld+json">
{
"@context": {
"schema": "http://schema.org/",
"foaf": "http://xmlns.com/foaf/0.1/"
},
"@id": "http://example.com/page.html#wb-script",
"schema:sameAs": "#wb-main"
}
</script>