On Friday, March 31, Chris and I, along with the delightful Jane Malliett from Winston-Salem State, went to the North Carolina Serials Conference in Chapel Hill. The conference was good, and there were several interesting sessions, but I’m going to focus on only one, because it was kind of complicated.
Jacob Shelby of NC State Libraries gave a presentation called “Exploring Linked Data Through the Lens of Technical Services” that did a very good job of giving a basic introduction to the complex topic of linked data. I will likely do a far inferior job of explaining his presentation, but here goes:
Shelby began by discussing the basic principles of linked data as described by Tim Berners-Lee (the guy who basically invented the World Wide Web). They are:
- Use URIs (Uniform Resource Identifiers) as names of things
- Use HTTP URIs so that people can look up things
- When someone looks up a URI, provide useful information, using the standards used on the Web (such as RDF and SPARQL)
- Include links to other URIs, so people can discover more things
The key technologies in providing linked data are: Resource Description Framework (or RDF), serializations, ontologies, triplestores, and SPARQL.
RDF is the standard model for data interchange on the Web. It is very simple, and uses URIs to name the relationship between things. It does this through the use of triples, which follow the basic structure of a sentence. In a triple, you have a Subject, a Predicate, and an Object. An example of a triple would be: Person (subject) Has The Name (predicate) John Smith (object).
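To make the triple idea a bit more concrete, here is a minimal Python sketch (an illustration, not something from the presentation). The person URI under `example.org` is a made-up placeholder; the predicate is the `name` property from the FOAF vocabulary. The object here is a plain text value rather than a URI.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URI for the person (example.org is reserved for illustrations).
person = "http://example.org/people/john-smith"
# The "name" property from the FOAF vocabulary.
has_name = "http://xmlns.com/foaf/0.1/name"

# "Person (subject) Has The Name (predicate) John Smith (object)"
triple = Triple(subject=person, predicate=has_name, obj="John Smith")
print(triple.subject, triple.predicate, triple.obj)
```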
Serializations are computer languages or syntaxes that provide a way to write down the three parts of a triple (the subject, predicate and object, each usually a URI, although the object can also be a plain value such as a name) in an intelligible way. Examples of serializations are N-Triples, Turtle, JSON-LD, and RDF/XML.
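As a rough illustration of what a serialization does, the sketch below writes one triple as a line of N-Triples, which is probably the simplest of these syntaxes: URIs go in angle brackets, text values go in double quotes, and each statement ends with a period. The function name `to_ntriples` and the `example.org` URI are assumptions made up for this example, and the escaping is deliberately simplified.

```python
def to_ntriples(subject: str, predicate: str, obj: str, obj_is_uri: bool = False) -> str:
    """Format one triple as a single N-Triples line (simplified sketch)."""
    if obj_is_uri:
        obj_term = f"<{obj}>"
    else:
        # Escape backslashes, double quotes and newlines inside a text value.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        obj_term = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {obj_term} ."


line = to_ntriples(
    "http://example.org/people/john-smith",  # hypothetical subject URI
    "http://xmlns.com/foaf/0.1/name",        # FOAF "name" property
    "John Smith",
)
print(line)
# <http://example.org/people/john-smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .
```

The same triple could be written in Turtle, JSON-LD, or RDF/XML; the data stays the same and only the syntax changes.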
Ontologies are vocabularies used to describe the relationships between different things in RDF. An ontology formally represents knowledge as a set of concepts within a domain, and the relationships between those concepts. That is, an ontology provides the structural framework for organizing information. It’s about adding meaning to your data so it can be understood and reused by others. Examples of ontologies include RDF Schema (RDFS), FOAF (Friend-of-a-Friend), Dublin Core, SKOS, Schema.org, and BIBFRAME.
A triplestore is a database built to store and retrieve RDF triples.
SPARQL is the query language for retrieving data that is organized as RDF.
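To give a feel for how a triplestore and a query fit together, here is a toy in-memory version in Python. It is only a sketch: real triplestores are full database systems, and SPARQL is far richer than this. The class `TinyTripleStore`, its `match` method, and the `example.org` URIs are all invented for the illustration. The core idea it borrows from SPARQL is pattern matching: you fill in the parts of the triple you know and leave the rest as wildcards.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
JOHN = "http://example.org/people/john-smith"  # hypothetical URI
JANE = "http://example.org/people/jane-doe"    # hypothetical URI


class TinyTripleStore:
    """A toy triplestore: a set of (subject, predicate, object) tuples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TinyTripleStore()
store.add(JOHN, FOAF_NAME, "John Smith")
store.add(JANE, FOAF_NAME, "Jane Doe")
store.add(JOHN, FOAF_KNOWS, JANE)

# Roughly what this SPARQL query asks:
#   SELECT ?name WHERE { ?person <http://xmlns.com/foaf/0.1/name> ?name }
names = [o for _, _, o in store.match(p=FOAF_NAME)]
print(names)  # ['Jane Doe', 'John Smith']
```

Because John's `knows` triple points at Jane's URI, a second lookup on that URI finds her name. That hop from one URI to another is the "linked" part of linked data.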
So, to recap, if I’ve got this right (and I’m not entirely sure I do): RDF provides a model for describing things using URIs based on triples (subject, predicate, object). You use a serialization (syntax) to organize the URIs in an intelligible and usable way. You then use an ontology to create a framework to relate all of your serialized data in a way that is meaningful and can be understood and reused by others. All of this stuff is kept in a triplestore, and you can query and access the data using the SPARQL language. Linked data is about having a whole bunch of triplestores connected and sharing this rigorously structured data across the Web instantaneously.
Shelby concluded his presentation by saying that linked data helps:
- make data more meaningful.
- make data more interoperable.
- make data more connectable.
He cautioned, though, that linked data does not solve all the world’s metadata problems.