An Intro To The Semantic Web: Why You Need To Know About It Sooner Than Later

You say you want a webolution?

Well you know, the web is evolving. From a web characterized by many readers and few authors, we are now in the midst of the read-write web. User-generated content abounds, and the rapid growth of social networking is the new norm.

And, just as we’ve begun to master this “social web”, Web 3.0 (the Semantic Web) is going to up the ante. Soon people, companies and billions of non-human entities (including our appliances, cars, and houses) will be generating meaningful data. Software “reasoners” will use the connections between this data to make inferences about it. Machines and people will work together more than ever before – now isn’t that progress?

It may be a tough pill to swallow. The semantic web is a very large and complex subject. Many believe it could even lead to a new form of artificial intelligence.

To help you get started and learn more about it, we have collaborated with Richard Howlett, the CTO of Information Mapping Canada, a firm that specializes in simplifying complex information and structured authoring.

It’s Webolution, Baby!

Webolution can be viewed in three stages: Web 1.0, Web 2.0 and Web 3.0. Each stage of the web builds upon (and melts into) the previous one, kind of like a layer cake (or perhaps more like a lasagna).

Mmmm… I love Web lasagna

Web 1.0 (the “Information Web”) was characterized by:

  • The development of HTTP and HTML by Sir Tim Berners-Lee in 1990
  • E-mail
  • Web directories and search engines (notably Yahoo and Google)
  • The rapid proliferation of websites
  • Many readers, but relatively few authors or content creators.

Web 2.0 (the “Social Web”) is characterized by:

  • Privately owned social networking websites such as Facebook (launched in 2004) and Twitter (launched in 2006)
  • Web services, which allow developers to combine (or “mash-up”) content from various websites (e.g. Historypin: a site where users can upload photos from the past and pin them on Google Maps)
  • Cloud computing, which virtualizes computing resources, making them affordable and ubiquitous
  • Browser-based applications, which eliminate the need for managing desktop applications
  • Web syndication, which enables content to be subscribed to and reused by having it identified, collected and combined (“aggregated”) using RSS and, more recently, the Atom syndication format.

Web 3.0 (the “Semantic Web”) is emerging, and is characterized by:

  • Linked data or hyperdata, where data objects are linked to other data objects (similar to how web pages are linked today)
  • Large hyperdata datasets such as DBpedia (a community effort to extract structured information from Wikipedia and make the information available on the Web)
  • A query language for hyperdata capable of treating the entire web as a single datastore, called SPARQL
  • The so-called “Internet of Things” where billions of non-human entities (including houses, cars and appliances) generate and publish their own hyperdata.
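To make the idea of querying hyperdata a little more concrete, here is a minimal Python sketch of the kind of pattern matching a query language such as SPARQL performs. It is not SPARQL itself: the `match` function, the sample statements and the use of `None` as a wildcard are all assumptions made for illustration.

```python
# A toy in-memory "datastore" of (subject, predicate, object) statements.
# The data and names below are illustrative assumptions, not real hyperdata.
TRIPLES = [
    ("Jack", "knows", "Jill"),
    ("Jill", "knows", "Humpty"),
    ("Jack", "owns", "a pail"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every statement that fits the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# "Whom does Jack know?" is roughly the question a SPARQL SELECT would ask.
print(match(TRIPLES, subject="Jack", predicate="knows"))
```

A real SPARQL engine does far more (joins across patterns, filters, remote endpoints), but the core idea of filling in the blanks of a statement pattern is the same.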

If semantics is the study of meaning, think of Web 3.0 as the Meaningful Web. Very broadly, things on the Internet will be described with descriptor languages so that computers can “understand” what they are. Computers will be able to make use of the data residing inside web pages. When you search for something (a person, a restaurant, a hotel), the machine goes into its vast network of meaningful linked data, creates connections for you, and suggests useful links that your human mind could never have come up with. At warp speed!

The Web lasagna ends up looking more like this:

Image source: Frederic Martin

Why You Should Know About The Semantic Web

While the semantic web will affect web developers most, it will also make surfing Web 3.0 a richer and more relevant experience. The semantic web will help computers make better use of the data residing on the web. It will allow them to reason about the data by making inferences using controlled vocabularies (ontologies).

So what might this mean for you?

Benefiting From Your Own Data

You leave behind a trail of personal data while you conduct your daily life. Purchases you make, websites you visit, who your friends are – it’s all being recorded.

Your personal data:

  • provides value for the companies that know how to use it – for their ends
  • represents a privacy risk to you – since you often don’t control it.

The semantic web can change this. David Siegel, author of the book Pull: The Power of the Semantic Web to Transform Your Business, envisions a “personal data locker” that will allow you to grant requesters selective access to your information.

The amount of information you provide will vary depending on how well you know and trust the requester. You will only need to enter and update your personal information in a single place – in your personal data locker.

Pulling Your New Address

Say you move and have a change of address. No need to notify your friends and business associates, since they will request or “pull” it from your personal data locker when they need it. Why should they store and manage your data after all?
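The “pull” idea can be sketched in a few lines of Python. This is only a rough illustration: the `DataLocker` class, its trust levels and the sample addresses are hypothetical and are not taken from Siegel’s book.

```python
from dataclasses import dataclass, field

# Hypothetical trust levels, from least to most trusted.
TRUST_LEVELS = {"stranger": 0, "business": 1, "friend": 2}


@dataclass
class DataLocker:
    """Holds personal data in one place and releases it selectively."""

    # Each entry maps a name to (value, minimum trust level needed to read it).
    fields: dict = field(default_factory=dict)

    def store(self, name, value, min_trust):
        """Enter or update a piece of personal data."""
        self.fields[name] = (value, TRUST_LEVELS[min_trust])

    def pull(self, name, requester_trust):
        """Return the current value if the requester is trusted enough, else None."""
        value, required = self.fields[name]
        return value if TRUST_LEVELS[requester_trust] >= required else None


locker = DataLocker()
locker.store("address", "12 Old Street", min_trust="business")
# You move: update the address once, in one place.
locker.store("address", "34 New Avenue", min_trust="business")
```

Anyone who pulls the address after the move gets the new one automatically, while a requester with too little trust gets nothing.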

Using Your Personal Data To Be More Efficient

With control over your personal data, you can treat it like the asset it is. To make the best use of your personal data history, you may use a personal software agent – software that can serve as an interface between your personal data locker and the rest of the world.

Using Your Software Agent To Order Takeout

For example, say you are busy, working late and hungry. You may instruct your software agent to quickly help you find a suitable take-out dinner.

Based on your purchasing history, your software agent might find that you often order pizza early in the week, but generally have seafood on Fridays. Since it is Friday, it might search the web to find local seafood restaurants (even if you are away from home, it knows your current location).

It may then present you with a web page that lists candidate restaurants for you. The restaurants could be ranked from most-to-least popular, according to the purchasing histories of people within your social networks. Your agent has limited access to their personal data lockers as well, after all…
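Here is one way the agent’s reasoning in this scenario might look in Python. Everything in it is an assumption made for illustration: the purchase history, the restaurant names, the `friend_orders` counts and both function names are invented.

```python
from collections import Counter

# Hypothetical purchase history: (weekday, cuisine) pairs from your data locker.
HISTORY = [
    ("Mon", "pizza"),
    ("Tue", "pizza"),
    ("Fri", "seafood"),
    ("Fri", "seafood"),
    ("Fri", "pizza"),
]

# Hypothetical local restaurants, with the number of orders your friends placed.
RESTAURANTS = [
    {"name": "Harbour Grill", "cuisine": "seafood", "friend_orders": 4},
    {"name": "Slice House", "cuisine": "pizza", "friend_orders": 9},
    {"name": "The Net", "cuisine": "seafood", "friend_orders": 7},
]


def usual_cuisine(history, weekday):
    """Most frequently ordered cuisine on the given weekday, or None."""
    counts = Counter(cuisine for (day, cuisine) in history if day == weekday)
    return counts.most_common(1)[0][0] if counts else None


def suggest(history, restaurants, weekday):
    """Restaurants serving the usual cuisine, most popular among friends first."""
    cuisine = usual_cuisine(history, weekday)
    candidates = [r for r in restaurants if r["cuisine"] == cuisine]
    return sorted(candidates, key=lambda r: r["friend_orders"], reverse=True)


for restaurant in suggest(HISTORY, RESTAURANTS, "Fri"):
    print(restaurant["name"])
```

A real agent would pull the history and the friends’ data from personal data lockers rather than from hard-coded lists.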

How The Semantic Web Works

Image source: Lifeboat

Here, Mr. Howlett walks us through the more technical aspects of how the semantic web works.

Marking Up Web Content

To get to know the semantic web, you first need to know how web content is marked up.

The following technologies are used to mark up web content:

| This technology … | is used to … |
|---|---|
| Hypertext Markup Language (HTML) | structure web content (i.e. hypertext documents) by denoting structural elements such as headings, tables, links and so on. |
| Cascading Style Sheets (CSS) | add style to web content with visual characteristics such as font type and size, text color, margins and so on. |
| JavaScript | add behavior to web content, ranging from functions that perform simple client-side validation all the way to browser-based applications. |
| Resource Description Framework (RDF) | add meaning to web content, so that the data inside can be identified and reasoned about using computers. |

Resource Description Framework

Resources on the semantic web are described using the Resource Description Framework (RDF). RDF is a W3C standard for describing web resources. It helps to ensure that the meaning of a web resource is interpreted as the author/publisher intended. So what exactly is a web resource?

Web Resources

A web resource is simply any identifiable information on the web. The resource itself is conceptual, while its representation is actual. When a web resource is requested, an appropriate representation of its current state is provided.

This approach to software architecture (known as REST) provides many benefits. For example, it helps to prevent broken links on the web, by eliminating the need to change a link every time a representation of the resource is changed.
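The difference between a resource and its representations can be sketched in Python. This is a simplified illustration, not how any particular web server works: the `/news` identifier, the news items and the `represent` function are hypothetical.

```python
import json

# A conceptual resource: its state can change while its identifier stays the same.
# The identifier and the news items are hypothetical.
RESOURCES = {"/news": ["Item one", "Item two"]}


def represent(uri, media_type):
    """Return a representation of the resource's current state in the requested format."""
    state = RESOURCES[uri]
    if media_type == "application/json":
        return json.dumps(state)
    if media_type == "text/plain":
        return "\n".join(state)
    raise ValueError(f"unsupported media type: {media_type}")


# The same resource, requested in two different representations.
print(represent("/news", "text/plain"))
print(represent("/news", "application/json"))
```

Links point at the identifier, so adding a news item or a new format changes what is returned without breaking any link.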

Uniform Resource Identifiers (URIs) uniquely identify resources of any kind – not just information resources that exist on the web. When a URI also locates a resource on the web, it is known as a Uniform Resource Locator (URL).

Example URL:

This web resource represents a collection of news items/resources.

Triples: Making A Statement Using RDF

Resources on the web are described using RDF statements. RDF statements are simple sentences that use the “active voice”. The verb, or “predicate”, shows the subject of the sentence acting on an object. RDF statements may be visualized as a directed labeled graph, where the subject of the sentence points to its object. The labeled arrow is the predicate, as shown below.

Three sample RDF statements, with the third statement illustrated as a directed graph:

| Subject (actor) | Predicate (action) | Object (receiver) |
|---|---|---|
| Google | publishes | news summaries. |
| The ball | is | round. |
| Jack | knows | Jill |
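In code, a triple is naturally a three-part tuple, and a set of triples is naturally a graph. The following Python sketch stores the three sample statements and turns them into a directed labeled graph; the `to_graph` helper is a hypothetical name, and real RDF tools use URIs rather than plain strings.

```python
from collections import defaultdict

# The three sample statements as (subject, predicate, object) tuples.
STATEMENTS = [
    ("Google", "publishes", "news summaries"),
    ("The ball", "is", "round"),
    ("Jack", "knows", "Jill"),
]


def to_graph(statements):
    """Build a directed labeled graph: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = to_graph(STATEMENTS)
# Each edge leaving "Jack" is an arrow labeled with its predicate.
print(graph["Jack"])
```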

Statements can be linked, and the object of one statement may be the subject of another.

| Subject (actor) | Predicate (action) | Object (receiver) |
|---|---|---|
| Jack | knows | Jill |
| Jill | fell-down | the hill |
| Jack | fell-down | the hill |
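Because the object of one statement can be the subject of another, software can hop from statement to statement. Here is a small Python sketch of that idea using the statements above; the `reachable` function is a hypothetical helper, not part of any RDF standard.

```python
# The linked statements as (subject, predicate, object) tuples.
LINKED = [
    ("Jack", "knows", "Jill"),
    ("Jill", "fell-down", "the hill"),
    ("Jack", "fell-down", "the hill"),
]


def reachable(statements, start):
    """Follow statements from `start`, treating each object as a possible new subject."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for subject, _predicate, obj in statements:
            if subject == node and obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return seen


# Everything connected to Jack, directly or through Jill.
print(sorted(reachable(LINKED, "Jack")))
```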

Linked Data

In order to link data, the subject, predicate and object of a statement are themselves identified by URIs (an object may also be a plain value, such as a name or a number). To save typing and reduce errors, these URIs can be associated with user-defined prefixes.

This prefix… expands to this URI…

Linked data enables data to be shared in the same way HTML documents are shared today – using hyperlinks. However since the granularity of data is much finer than documents, linked data offers greater potential to be recombined, reasoned about and reused.
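Prefix expansion is simple enough to sketch in Python. In the example below, the `foaf` entry uses the published FOAF namespace, while the `person` prefix, its example.org URI and the `expand` function are hypothetical placeholders chosen for illustration.

```python
# Prefix table. The FOAF namespace is the published one; the "person" prefix
# and its example.org URI are hypothetical placeholders.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "person": "http://example.org/person/",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:knows' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {name!r}")
    return prefixes[prefix] + local


# A whole statement written with prefixes, then expanded to full URIs.
triple = tuple(expand(term) for term in ("person:Jack", "foaf:knows", "person:Jill"))
print(triple)
```

Once every part of a statement is a full URI, anyone on the web can link to exactly the same Jack, the same Jill and the same meaning of “knows”.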

Say What You Mean By Controlling Your Vocabulary

While there may be millions of “Jacks” in the world, you can see by examining its URI that person:Jack refers to the person named “Jack” from “”. You can also see that the meaning of foaf:knows is defined in a vocabulary called “foaf”.

FOAF is an acronym that stands for Friend Of A Friend. FOAF is a controlled vocabulary (or “ontology”). Controlled vocabularies such as FOAF provide standardized, defined terms for expressing concepts (and their inter-relationships) within a subject domain.
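Defined inter-relationships are what let a software “reasoner” draw conclusions nobody stated explicitly. The Python sketch below uses one relationship that FOAF does define (every foaf:Person is also a foaf:Agent). The statement about Jack, the `infer` function and the dictionary layout are assumptions made for illustration, and real reasoners are far more capable.

```python
# One relationship that FOAF defines: every foaf:Person is also a foaf:Agent.
SUBCLASS_OF = {"foaf:Person": "foaf:Agent"}

# A hypothetical statement about Jack.
FACTS = {("person:Jack", "rdf:type", "foaf:Person")}


def infer(facts, subclass_of):
    """Repeatedly apply the subclass rule until no new statements appear."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for subject, predicate, obj in list(inferred):
            broader = subclass_of.get(obj)
            new_fact = (subject, predicate, broader)
            if predicate == "rdf:type" and broader and new_fact not in inferred:
                inferred.add(new_fact)
                changed = True
    return inferred


# Nobody said Jack is an agent, but the vocabulary lets the machine conclude it.
print(sorted(infer(FACTS, SUBCLASS_OF)))
```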

This table shows some controlled vocabularies used to describe general information.

| This vocabulary … | describes information about … |
|---|---|
| Friend Of A Friend (FOAF) | people, including contact details, and basic relationships, such as other people that a person knows. |
| GoodRelations | products, prices and company data. |
| Semantically-Interlinked Online Communities (SIOC) | online communities. |
| Simple Knowledge Organization System (SKOS) | the subject matter of content. |
| Upper Mapping and Binding Exchange Layer (UMBEL) | datasets and ontologies. |

Learning More About The Semantic Web

To help you on your journey to learning more about the semantic web, here are some useful resources:

| Resource | Description |
|---|---|
| Linked Data (and the Web of Data) | This 4-minute video from Ireland’s Digital Enterprise Research Institute (DERI) provides a great introduction to Linked Data. |
| Semantic Search Explained | Also from DERI, this 4-minute video explains what semantic search is and why you need it. |
| A Short Introduction to Semantic Web-based E-Commerce: The GoodRelations Vocabulary | A 15-minute sound slide presentation of GoodRelations by its founder Martin Hepp. Anyone interested in buying or selling anything on the web should watch this! |
| Web 3.0 | A 26-minute video that includes interviews with luminaries Tim Berners-Lee and Clay Shirky. |
| Introduction to the Semantic Web | A 46-minute lecture by professor Jim Hendler, who co-authored with Tim Berners-Lee and Ora Lassila the seminal Scientific American article from 2001, simply titled “The Semantic Web”. |
| Pull: The Power of the Semantic Web to Transform Your Business | David Siegel’s must-read book for anyone interested in understanding how the semantic web may help flip the economy from a push to a pull model. |