Semantic Web Gets a Boost

technologyreview.com

Via Technology Review:

Google, Microsoft, and Yahoo have teamed up to encourage Web page operators to make the meaning of their pages understandable to search engines.

The move may finally encourage widespread use of technology that makes online information as comprehensible to computers as it is to humans. If the effort works, the result will be not only better search results, but also a wave of other intelligent apps and services able to understand online information almost as well as we do.

The three big Web companies launched the initiative, known as Schema.org, last week. It defines an interconnected vocabulary of terms that can be added to the HTML markup of a Web page to communicate the meaning of concepts on the page. A location referred to in text could be defined as a courthouse, which Schema.org understands as being a specific type of government building. People and events can also be defined, as can attributes like distance, mass, or duration. This data will allow search engines to better understand how useful a page may be for a given search query—for example, by making it clear that a page is about the headquarters of the U.S. Department of Defense, not five-sided regular shapes.
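The "specific type of" relationship described above can be sketched as a small parent lookup. This is only an illustration: the table below is a hand-copied, single-parent subset of the Schema.org hierarchy, and the helper names `ancestors` and `is_a` are made up for this sketch.

```python
# Hand-copied subset of the Schema.org type hierarchy (single parent per type,
# which is a simplification made for this sketch).
PARENT = {
    "Courthouse": "GovernmentBuilding",
    "GovernmentBuilding": "CivicStructure",
    "CivicStructure": "Place",
    "Place": "Thing",
    "Thing": None,
}


def ancestors(type_name):
    """Return the chain from type_name up to the root type, inclusive."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def is_a(type_name, candidate):
    """True if type_name is candidate or one of its more specific subtypes."""
    return candidate in ancestors(type_name)


print(" > ".join(reversed(ancestors("Courthouse"))))
print(is_a("Courthouse", "GovernmentBuilding"))
```

A consumer that only knows about places can still make use of a page marked up as a courthouse, because the more specific type implies the general one.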

The article goes on to note that Schema.org supports microdata* rather than RDFa, which is supported and promoted by the international Web standards body W3C.

Still, if it can gain traction, it’s a big step forward for machine understanding of all this content we’re throwing at the Web which, in turn, means a whole new class of applications using such data might be in our near future.

*Hat tip to Aaron Bradley (@aaranged) on Twitter for pointing out that it’s microdata, not microformats, that Google, Microsoft and Yahoo are supporting.

Vienna data and a Drupal module


The City of Vienna announced today (http://data.wien.gv.at/apps/drupal-modul.html) an initiative by the Austrian Drupal developer community (http://www.drupal-austria.at/).

It is a Drupal module for Vienna's open data.

The project, which started during the second DrupalCamp held in Vienna, displays some of the city's geodata (bike-sharing stations, hospitals and universities) on a map that uses OpenStreetMap as its background.

At first glance it may look simple, but a closer look shows that every record comes with a definition of its content type.
That type is mapped to the Schema.org vocabularies (http://www.schema.org).
This makes it possible to publish the data as RDFa (and therefore as Linked Open Data), complete with a SPARQL endpoint service: http://austria.drupaldata.com/sparql
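As a rough sketch of how such an endpoint is usually addressed, the snippet below builds a SPARQL-protocol GET URL without sending it. The query itself is an assumption: the post does not say which classes or properties the endpoint exposes, so `schema:Hospital` and `schema:name` are illustrative guesses, and `build_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Endpoint URL as given in the post.
ENDPOINT = "http://austria.drupaldata.com/sparql"

# Assumed query: the classes and properties actually exposed are not stated.
QUERY = """
PREFIX schema: <http://schema.org/>
SELECT ?place ?name WHERE {
  ?place a schema:Hospital ;
         schema:name ?name .
}
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Put the query text in the `query` parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The URL is only printed here; fetching it would depend on the service still being online.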

A demo of the Drupal module is available at http://austria.drupaldata.com, which also gives access to the catalogue of data in RDFa: http://austria.drupaldata.com/vienna/datasources
Besides the Vienna data, data for Linz (another Austrian city that has opened its data, http://data.linz.gv.at/daten) is also available.

The module, although still under development, is available at http://drupal.org/project/odv

Done playing with HTML5 microdata RDFa xmlns schema opengraphprotocol … will continue later this week. I want to try something else…

Using VIE for server-side templating

In our Palsu collaborative meeting tool we’re using VIE for server-side page generation. This effectively means RDFa is our templating language. The CoffeeScript looks like the following:

# Serve the list of meetings
server.get '/dashboard', (request, response) ->
    # Read our HTML template file
    return fs.readFile "#{process.cwd()}/templates/index.html", "utf-8", (err, data) ->
        
        # Prepare a JSDOM window for the template
        document = jsdom.jsdom data
        window = document.createWindow()
        jQ = jQuery.create window

        # Find RDFa entities and load them
        VIE.RDFaEntities.getInstances jQ "*"
        # Get the Calendar object
        calendar = VIE.EntityManager.getBySubject 'urn:uuid:e1191010-5bb1-11e0-80e3-0800200c9a66'

        # Query for events that have the calendar as component
        events = calendar.get 'rdfcal:has_component'
        events.predicate = "rdfcal:component"
        events.object = calendar.id
        return events.fetch
            success: (eventCollection) ->
                VIE.cleanup()
                return response.send window.document.innerHTML
            error: (collection, error) ->
                VIE.cleanup()
                return response.send window.document.innerHTML

While this isn’t the most elegant example of page generation with Express, the obvious benefit of RDFa as a templating language is there: don’t repeat yourself. The same mark-up serves as the template on the server side and the client side, and as structured data for SEO and integration purposes.

The Semantic War, er, Web

Ever since usage of the web exploded, as a community we’ve wondered: ‘what next’? What’s the next evolution of this technology that has rocked our world? Sir Tim Berners-Lee presented a vision of the web’s future back in 2000, entitled the Semantic Web.

HTML enabled mere humans to describe information to a computer, for consumption by other humans. By embedding some structured information, we can clue computers into that information, too. That turns out to be a pretty powerful idea.

At the time, there were two core pieces that needed to develop for this Semantic Web business to get off the ground: a markup method, and a common vocabulary.

Really smart people labored for years on the markup. The academic camp focused on a technology called RDF. Sadly, it seemed overcomplicated, as if the most convoluted parts of XML were made even less people-friendly. Instead of the comparatively friendly world of human-edited HTML, RDF seemed to require the help of machines just to author. A sample:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Superman</foaf:name>
    <foaf:mbox rdf:resource="mailto:cjkent@dailyplanet.com"/>
  </foaf:Person>
</rdf:RDF>

This is the XML representation of RDF, using a simple example. It was stand-alone, and seemed more suited to a bygone world where XML was the only format going. And it still seems like much, too much.

Microformats were a grassroots effort to accomplish much the same thing, but far more intuitively. Their Big Idea™ was to put the data inline with the text on your webpage. You needn’t make structural changes to your HTML, and the metadata you sought to provide could be added quietly. An extra span here, an extra span there, and voilà. Based in an HTML4-friendly world, it used the class attribute meant for stylesheets. To wit:

<div class="vcard">
  <span class="fn">Superman</span>
  <br />
  <a href="mailto:cjkent@dailyplanet.com" class="email">
    cjkent@dailyplanet.com
  </a>
</div>

To pedants like myself, it wasn’t ideal to overload a construct meant for style. Then again, the limitations of the browser market over the last decade made us compromise many of our principles. This one didn’t seem quite so bad.

The vocabularies (ontologies, in this case) were another matter. While people started to formalize how to describe them with OWL, that effort was akin to describing the binding of a dictionary. It didn’t address the content inside, or how to deal with the differences in knowledge domains (e.g. medicine or engineering). Many new ontologies sprang up, the most successful probably being Dublin Core. That’s a peculiar name for a way to describe basic documents. At some point, Google set up data-vocabulary.org, but never seemed to compellingly evangelize it.

The News: Schema.org

Just a few days ago, Google, Yahoo! and Microsoft (Bing) joined forces and announced Schema.org. It’s their collective plan to encourage use of Semantic Web-like technologies. Well, the parts that are most relevant to their businesses, namely search. Schema.org’s ambition isn’t nearly as comprehensive as that of Mr. Berners-Lee, but it’s intentionally simpler.

As the web itself has shown, simplicity in implementation often succeeds where complex but complete solutions fail.

Schema.org addresses both the markup and vocabulary issues, together, for once. This alone makes the effort the most complete, practical implementation of Semantic Web-like principles to date. This is good news. Right?

They’ve got a vocabulary that’s appropriate for most web pages. It’s pretty good, for its purpose.

They’ve also got a format: Microdata. Huh?

Microdata started as a proposal for HTML5. It seeks to embed semantic data, in a manner somewhat similar to Microformats. To my eyes, its syntax is preferable to that of Microformats, as it is less beholden to the legacy issues of older browsers and markup that constrained the Microformats style. Microdata is the new kid on the scene, an upstart with some very heavy backing.

It looks like this:

<div itemscope itemtype="http://schema.org/CreativeWork">
  <img itemprop="image" src="videogame.jpg" />
  <span itemprop="name">Resistance 3: Fall of Man</span>
  by <span itemprop="author">Sony</span>,
  Platform: Playstation 3
  Rated:<span itemprop="contentRating">Mature</span>
</div>

*Sample above borrowed from schema.org.
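To give a feel for what a consumer does with this markup, here is a minimal sketch that pulls the `itemprop` pairs out of the sample using only Python's standard library. It is not a conforming microdata parser: the `ItempropCollector` class is invented for this illustration, it assumes a single flat item with no nesting, and it reads values only from text content or from `src`/`href`.

```python
from html.parser import HTMLParser

SAMPLE = """
<div itemscope itemtype="http://schema.org/CreativeWork">
  <img itemprop="image" src="videogame.jpg" />
  <span itemprop="name">Resistance 3: Fall of Man</span>
  by <span itemprop="author">Sony</span>,
  Platform: Playstation 3
  Rated:<span itemprop="contentRating">Mature</span>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collect itemprop name/value pairs from one flat microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._open_prop = None  # itemprop whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "src" in attrs or "href" in attrs:
            # URL-valued property: take the value from the attribute.
            self.properties[prop] = attrs.get("src") or attrs.get("href")
        else:
            # Text-valued property: accumulate the element's text content.
            self._open_prop = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._open_prop is not None:
            self.properties[self._open_prop] += data

    def handle_endtag(self, tag):
        # Flat-item assumption: any closing tag ends the current property.
        self._open_prop = None


collector = ItempropCollector()
collector.feed(SAMPLE)
print(collector.item_type)
print(collector.properties)
```

Text outside the marked-up spans ("by", "Platform: Playstation 3") is ignored, which is the point: only what the author explicitly labelled becomes data.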

Nice, eh? Microdata is modern, it has a useful vocabulary now, we’re ready to launch this rocket ship and experience The Future, right?

Meet the new boss / Same as the old boss

Remember RDF, the specification that only a machine’s mother could love? To these schema.org guys it’s old news, but to my eyes it’s merely matured, and has seriously been hitting the gym.

The new(er) RDFa (the ‘a’ stands for ‘in attributes’, i.e. inline) isn’t your father’s RDF. It’s a bit more understanding, more easy-going. If you use it pedantically, it’ll accommodate with aplomb. But it can just as easily be as concise and simple as microdata.

In fact, it can even use schema.org’s vocabulary. That’s sly and slick.

<div vocab="http://schema.org/" typeof="CreativeWork">
  <span rel="image"><img src="videogame.jpg" /></span>
  <span property="name">Resistance 3: Fall of Man</span>
  by <span property="author">Sony</span>,
  Platform: Playstation 3
  Rated:<span property="contentRating">Mature</span>
</div>

*The above Microdata-to-RDFa conversion was borrowed from Manu Sporny.
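The attribute-level correspondence behind that conversion can be sketched roughly as follows. This is a deliberately simplified mapping, not the procedure Manu Sporny used: `microdata_attrs_to_rdfa` is a hypothetical helper, it assumes the type URL ends in `/TypeName`, and it maps every `itemprop` to `property`, ignoring the `rel` wrapper the example uses for the image.

```python
def microdata_attrs_to_rdfa(attrs):
    """Map one element's microdata attributes to rough RDFa equivalents.

    `attrs` is a dict of attribute name -> value (None for bare attributes).
    """
    converted = {}
    for name, value in attrs.items():
        if name == "itemscope":
            # RDFa has no separate scope marker; typeof opens the new item.
            continue
        if name == "itemtype":
            # Split "http://schema.org/CreativeWork" into vocab + type name.
            base, _, type_name = value.rpartition("/")
            converted["vocab"] = base + "/"
            converted["typeof"] = type_name
        elif name == "itemprop":
            converted["property"] = value
        else:
            # Unrelated attributes (class, src, ...) pass through unchanged.
            converted[name] = value
    return converted


print(microdata_attrs_to_rdfa(
    {"itemscope": None, "itemtype": "http://schema.org/CreativeWork"}
))
print(microdata_attrs_to_rdfa({"itemprop": "name"}))
```

The mapping is nearly one-to-one for the simple case, which is why the two samples look so alike.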

So we’ve got a concise, readable inline metadata format in RDFa. We’ve got a solid vocabulary that describes most of the content that people search for on Google. That’s great, but why is your author so hot for this over Microdata?

The case for RDFa

Extensibility

Got data that doesn’t fit in schema.org’s vocabulary? No problem. Choose another one that works for you. There are plenty to choose from, if choice is the issue.

Maturity

Want to use tools to validate your code? RDF(a) has ‘em, Microdata doesn’t. In fact, as of this writing schema.org’s own microdata markup wasn’t valid microdata. Oof.

No toe-stepping: Namespaces

Namespaces are a way to make sure you’re talking about the right thing, when ambiguity is a concern. If you’ve ever had a mistaken communication due to a homonym, you’ll appreciate these clues for contextual clarity. Your conversation partners might be able to get what you’re saying but computers are still really, really dumb. If they were clever enough to understand homonymic collisions, they wouldn’t need our help with structured data markup, either.

If you’ve ever worked with XML, you’ve run into namespaces. You’re either passionately for or against them, and there’s little I can do in this article to change a naysayer’s mind. Except perhaps this: Don’t use ‘em, if you don’t want to. The schema.org vocabulary, which looks to be a definite winner, probably covers the uses you need. Double-check the RDFa example above, and note that there are no namespaces to be seen. If you do end up needing them, you can use them, and they’re really not that bad. Make the simple things simple, and the hard things possible.
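A small sketch of what namespaces buy you: the same short word, `name`, resolves to different full identifiers depending on the vocabulary in scope. The `expand` helper below is hypothetical and far simpler than real RDFa term and prefix resolution; the only prefix it knows is the FOAF one from the RDF/XML sample earlier.

```python
# Prefix table for this sketch; the FOAF IRI is the one used in the RDF/XML
# sample earlier in the article.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(term, vocab=None, prefixes=PREFIXES):
    """Expand a bare term or a prefix:name pair to a full IRI."""
    if ":" in term:
        prefix, _, local = term.partition(":")
        if prefix in prefixes:
            return prefixes[prefix] + local
        # Unknown prefix: assume the term is already a full IRI.
        return term
    if vocab is None:
        raise ValueError(f"no vocabulary in scope for bare term {term!r}")
    return vocab + term


# Same word, two unambiguous identifiers.
print(expand("name", vocab="http://schema.org/"))
print(expand("foaf:name"))
```

With a default vocabulary in scope, bare terms are enough for the common case, and prefixes only appear when a second vocabulary is mixed in.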

SEO support

Okay, so this is proof of parity, not superiority, but it’s the main issue of contention for those who just want solid SEO support and don’t care about all the hardcore nerdery. Google, Bing and Yahoo! all support RDFa as rich snippets, just like they support Microdata. Really.

There’s little chance this support will go away, as search engines are built for the real-world. They must support the web as it is, not as they wish it to be. (That’s not to say that they can’t influence things, as they’re trying to here.) But you can be sure that they won’t deprecate RDFa, or even Microformats, until they observe either materially missing from the market. Even schema.org says so.

Not just HTML

Both RDFa and Microdata came from W3C members, openly using community-driven processes. Excellent. Both were created to support the idea of getting rich semantic data inline, on the web. One came with a legacy (for good and bad), and the other did not (also for good and bad). For this specific use case, they’re roughly equivalent. But beyond that…

RDF has support everywhere, in a bazillion languages and formats. If you do more than publish, this can be immensely valuable to you. Microdata is a narrow solution to a big problem in a very specific medium. Broaden your horizons.

Adoption

With the Big Three in search now backing this Microdata thing, it must be a huge deal, and everyone except you is now using it, right? Wrong. There are no meaningful numbers to back this up. On the other hand, one recent analysis shows that RDFa usage is up 510% in the last year. Not bad.

Aesthetics

Okay, so all the options are ugly. But as a personal note of preference, I really dislike this ‘itemprop’ and ‘itemscope’ business in Microdata. It reminds me of all the horrible compound product names we’ve been inundated with. Ugh.

That’s not my primary issue (see above), but it does flavor the debate in my head.

What we got

Considering these benefits of RDFa, why didn’t we see its adoption in schema.org’s proposal? Even one of Yahoo!’s guys admits to a preference for RDFa.

As Peter Mika (the aforementioned Yahoo! guy) says:

In my personal view, one of the key problems of the Semantic Web design of the W3C has been that it considered only technical issues, and not the need for a social process that would lead to bootstrap the system with data and schemas. … Finding stable and mature schemas with sufficient adoption has eventually become a major pain point. In the search domain, the situation improved somewhat when search providers preselected some schemas for publishers to use, and started providing specific documentation, with examples and a way to validate webpages. However, as illustrated above, the efforts have been still too fragmented until yesterday.

It’s the issue of format (Microdata, RDFa, etc.) plus a solid vocabulary. Schema.org is the first to try to merge these in a big way.

Now, I doubt it’s a case of any kind of dishonesty, though it may not seem entirely obvious:

schema.org covers the core interests of search providers

That is to say, not necessarily you. But perhaps your interests align completely, and if so, good for you both.

Now what?

I want to give serious credit to the schema.org guys. They tackled a tough problem, found consensus between three behemoth competitors, and brought a solution worthy of consideration to market. Good job. That’s challenging stuff.

That said, it needn’t be the end of the story. The web doesn’t take kindly to decisions by decree, and it shouldn’t do so in this matter, either. Consider the options, and implement the best solution for you and your organization.

My advice: use the schema.org vocabulary with RDFa.

It’s easy. This news has lit a fire under the ass of RDFa advocates, and they’re already doing lots of heavy lifting for you.

Many thanks to all the smart folks who are passionately addressing this issue. Special thanks to Manu Sporny, whose own article about this issue inspired this one. Yes, he’s the chair of the W3C group that’s behind RDFa. (No, I don’t know him.) That makes him not only slightly biased, but well-informed. I have no idea if he has a financial stake in this, either personally or through his employer, but I consider the merits of the technical and implementation concerns above all else.

Dating Articles With RDFa

One of my greatest disappointments with the web is that technical articles feel like they are never dated! Yes, bloggers, I would like to know how old your article is… RDFa, please fix this! Thanks :)

RDFa: Linked Data in HTML

rdfa.info

RDFa is an extension to HTML5 that helps you markup things like People, Places, Events, Recipes and Reviews. Search Engines and Web Services use this markup to generate better search listings and give you better visibility on the Web, so that people can find your website more easily.

Why not #Microformats or #RDFa for schema.org ?

google.com

“… Instead of having webmasters decide between competing formats, we’ve decided to focus on just one format for schema.org”

Google's Rich Snippets Testing Tool

Finally, a big win for the Semantic Web. Google recognises the value of Microformats and RDFa for search results, especially for business contacts, personal information, product details and reviews.

Rich Snippet allows you to enhance your Google search results by marking up web pages with Microformats or RDFa.

Well, it had been known for a while, it’s just that now they provide their own testing tool, which is pretty cool, right?

Mythical Differences: RDFa Lite vs. Microdata

manu.sporny.org

An argument by the chair of the W3C RDFa committee that RDFa Lite is a more effective markup standard for implementing Linked Data than microdata.

This microdata vs. RDFa debate is really exciting!

http://inkdroid.org/journal/2012/07/06/straw/

inkdroid.org

By now I imagine you’ve heard the announcement that OCLC has started to make WorldCat bibliographic data available as openly licensed Linked Data. The availability of microdata and RDFa metadata in WorldCat pages coupled with the ODC-BY license and the availability of sitemaps for crawlers is a huge win for the library community. Similar announcements about Dewey Decimal Classification and the Virtual International Authority File are further evidence that there is a big paradigm shift going on at OCLC.