Long Live the Semantic Data Warehouse!

semanticweb.com

This article is crammed full of little quotes that I really should remember when I’m trying to explain what’s fundamentally wrong with storing your data in traditional databases. Anyone who’s ever developed a system for a customer (whether in-house or externally) knows the pain of requirements shifting not only throughout the development process, but after go-live too.

At first I would roll my eyes and tsk at the non-techies who hadn’t thought through how their work was structured (so that I could design a system that fitted it), but that makes no sense. Of course the requirements change as time goes on; it’s completely natural for a business model to shift as it needs to. You’d bloody hope it would! If it doesn’t, you’re working for one of those companies that refuse to do anything differently because “that’s how they did it in 1997”, or something.

But to get back to the original point, and to quote the article:

With a Semantic Data Warehouse, there is a fundamental assumption that the schema is never finished.  It evolves.

You don’t have to map out the entire organisation before you build a system for it - you just build the bits you know, and if the model changes, then you adjust your own data model accordingly and move on. You can do this because you’ve not set out your schema in concrete, with painful reconstruction needed with each change - instead you simply adjust the relationships between your objects (nodes/topics, whatever you want to call them). Simples.
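To make that concrete, here is a minimal sketch of the idea in Python. It assumes facts are kept as plain (subject, predicate, object) triples; the names (`alice`, `works_in`, `reports_to` and so on) are made up for illustration and do not come from the article.

```python
# Hypothetical data: every fact is a (subject, predicate, object) triple.
triples = {
    ("alice", "works_in", "sales"),
    ("bob", "works_in", "support"),
}

def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# The business changes: people now report to a person as well as a team.
# No table is altered; a new kind of relationship simply appears.
triples.add(("bob", "reports_to", "alice"))

# A team is renamed: drop the old edge and add the new one.
triples.discard(("bob", "works_in", "support"))
triples.add(("bob", "works_in", "customer_success"))

print(objects("bob", "works_in"))    # {'customer_success'}
print(objects("bob", "reports_to"))  # {'alice'}
```

The point is only that a new kind of relationship is one more row of data, not a schema migration. A real triple store adds indexing, typing and a query language on top.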

It’s not about the Warehouse or Federation.  It’s about a dated, inflexible model.  That is why semantic technologies matter.  With that flexibility you can do more, faster.

http://semanticweb.com/down-with-the-data-warehouse-long-live-the-semantic-data-warehouse_b23245

WWW inventor: HTML5 will make Minority Report look like child’s play | Silicon Republic

Open data heralds the internet’s next exciting phase

Considered one of the pioneering fathers of the internet, Berners-Lee believes we are only at the dawn of an even more exciting era - the era of open data and the semantic web, where almost every feasible physical device or piece of data will be interlinked online.

“The semantic web vision has taken a long time to come to fruition because the web is so exciting in many other ways,” says Berners-Lee, who has been driving new metadata labelling formats to make everything linkable.

This brings us on to the next big revolution - open data - and he says governments and businesses are at the forefront of opening up datasets for individuals, citizens and other businesses to make more informed decisions. The future web we are about to see will be one in which data and devices everywhere will be interlinked and metadata is central to this - effectively who owns A or B in the same way individuals own the deeds to their homes but the difference is allowing this data to be usable and open.

He cites the corner boxes you see on Wikipedia, for example, as a case of how databases and datasets can be globally linked.

(read more on Silicon Republic)

Redis based triple database

The Meshin application relies on a back-end triple store to hold the person semantic index and to process front-end queries. This back-end store is built on top of the open source in-memory key-value database Redis. Before getting into the details of how triples are represented and queried, I will briefly introduce the essential Redis features. Feel free to skip the following paragraph if you are familiar with Redis.

Redis is a key-value store where keys are binary strings and values can be either simple binary strings or higher-order data structures. These data structures include ordered lists, unordered sets of unique members, second-level hash tables (hsets) and weight-sorted sets (zsets). Redis exposes its functionality through a simple text-based protocol. The protocol defines a number of commands and their corresponding replies. Commands are either general to all kinds of key-values or specialized for a value type. For example, SET A X associates the binary string value X with key A.
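The excerpt stops before showing how triples are actually laid out, so the following is only a guess at the general technique, not Meshin's design. It sketches one common way to index triples with sets: for each pair of known terms, keep a set holding the missing third term. `TinySetStore` is a hypothetical in-memory stand-in that imitates the Redis set commands SADD, SMEMBERS and SINTER so the example runs without a server, and the `sp:`/`po:`/`so:` key layout is an assumption.

```python
from collections import defaultdict

class TinySetStore:
    """In-memory stand-in for the Redis set commands SADD, SMEMBERS and SINTER."""

    def __init__(self):
        self._sets = defaultdict(set)

    def sadd(self, key, *members):
        before = len(self._sets[key])
        self._sets[key].update(members)
        return len(self._sets[key]) - before  # number of newly added members

    def smembers(self, key):
        return set(self._sets.get(key, set()))

    def sinter(self, *keys):
        sets = [self.smembers(k) for k in keys]
        return set.intersection(*sets) if sets else set()

def add_triple(store, s, p, o):
    # Hypothetical key layout: one set per known pair, holding the missing term.
    store.sadd(f"sp:{s}:{p}", o)  # subject + predicate -> objects
    store.sadd(f"po:{p}:{o}", s)  # predicate + object  -> subjects
    store.sadd(f"so:{s}:{o}", p)  # subject + object    -> predicates

store = TinySetStore()
add_triple(store, "alice", "knows", "bob")
add_triple(store, "alice", "knows", "carol")
add_triple(store, "dave", "knows", "bob")
add_triple(store, "dave", "works_at", "acme")
add_triple(store, "alice", "works_at", "acme")

# Whom does alice know?
print(sorted(store.smembers("sp:alice:knows")))                   # ['bob', 'carol']
# Who knows bob and also works at acme?
print(sorted(store.sinter("po:knows:bob", "po:works_at:acme")))   # ['alice', 'dave']
```

Writing each triple three times trades memory for lookups that are a single set read, and a two-pattern query becomes a set intersection. The sketch assumes terms never contain a colon; a real layout would need to escape or encode them.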

Web 3.0 is Silky?

Before I begin, allow me to acknowledge that I DO recognize how asinine it is to attempt to organize technology and its evolutions in a neat system of points (1.0 was the age of portals, 2.0 the age of Social, etc.). Clearly, the discussion/landscape is much more nuanced.

That being said, many predict that the future Web (“Web 3.0”) is a semantic one, driven by a tremendous amount of Data. An age where the lines between tech and human awareness are blurred. An age where our machines often know us better than our own mothers (a bit unsettling, I know). An age where our technological warlords force us to do their bidding (okay, that one’s a Sci-Fi fantasy…could happen, though).

We’ve seen manifestations of this future through behaviorally targeted ads that follow us across our browsing experience. Search engines that can predict our queries before we complete our first word, let alone sentence. Social applications that know an astonishing amount about us and our closest friends (or frienemies in some cases).

Our e-commerce experience has also been shaped by this “subtle” form of technological stalking. For many of us, sites such as Amazon are at the centerpiece of our digital shopping universe. We go there to buy music, movies, electronics, clothing, toiletries…..what doesn’t Jeff Bezos freaking sell?!

Beyond pushing merchandise of all flavors, the foundation of Amazon is, of course, their powerful suggestion engine. As we browse items, add to our carts, make purchases- Amazon is tracking us all the way. The data collected is used to develop robust, personal user profiles, which allow Amazon to suggest very relevant items, offers, etc. for US (based on MY interests, and the interests of others like ME).

From a practical standpoint, this helps Amazon drive incremental sales and revenue. In essence, they are able to bring to the surface products that a user would be interested in, but may have otherwise missed (pushing items vs. relying exclusively on pull). Moreover, the amount of data that they have generated on individual users like myself allows Amazon to “get us” in ways that very few platforms/companies can.

Amazon Gets Silky

Recently, Amazon announced their version of an Android tablet, the Kindle Fire. This moment was compelling news on many different fronts:

  1. Amazon was diving head-first into the Android pool; not simply dabbling in the shallow Software-end, but fearlessly treading into the deep-end of offering their own device solution.
  2. The Fire would be priced aggressively, making it the “potential iPad killer” of the month (still not buying that one).
  3. The device would be chock-full of all of Amazon’s proprietary content goodness: Kindle books, Amazon MP3s, etc.

And then there was Silk, Amazon’s own browser, created specifically for their tablet. Beyond the shock of the company moving into the browser game, Silk promised a few very revolutionary (or potentially revolutionary), Mobile browsing features.

The majority of media outlets/curious end users focused on the performance elements. Unlike other browser offerings, Silk would render pages in dual-fashion. Part of the heavy lifting would occur via the Cloud, with certain elements of web pages being delivered courtesy of Amazon’s own servers. Part would come from local rendering on the Fire itself. This tandem effort would allegedly increase speed of loading, making our Mobile browsing experience that much better (and all of us happy, Mobile campers).

I, myself, gravitated towards a small, but potentially monumental revelation: “Silk will also predict your browsing habits”. Now, this feature certainly is tied to the performance element as much as anything. By anticipating the pages a user is likely to visit next, the browser is able to pre-load; again increasing the speed/quality of experience.

However, a more potentially powerful use/reality exists. 

If I were to visit ESPN.com, Silk would theoretically be able to determine that I am a football fan first and foremost, and that my favorite team is the Denver Broncos (Tebow Time!). As a result, it could make the accurate presumption that I would be looking for news/articles specific to those interests, and preemptively serve me the appropriate pages/content.

If I were visiting a restaurant’s site, Silk may recognize based on previous browsing behaviors (foodie blog visits, other restaurant sites, etc.), that I have a weakness for a great burger. It could then highlight the menu section/food descriptions that would best satisfy this culinary preference.

Certainly this capability already exists at the individual site level. Our web experiences are often customized based on cookies or user profiles that are activated by the sign-in. However, there is no underlying thread that allows for ubiquitous/consistent personalization as we move through disparate properties. Though Facebook has very much attempted this (and succeeded to a certain degree via Open Graph), the most seamless/all-encompassing unification would likely occur at the browser level (as it is the foundation/constant of our Web experience).

Aside from the browser that Amazon now has, it has also:

  1. Accumulated a significant amount of data and behavioral insights on its own site property (as we discussed earlier). 
  2. Developed a sophisticated engine/algorithm to make use of said data/insights (as also discussed earlier).

Combine all of these ingredients, and Amazon has all of a sudden put itself in a position to not only compete in “Web 3.0”, but potentially lead the charge. Historically, the tendency has been to view the company as a digital provider of tangible goods- I, for one, believe that it is time to drastically alter that opinion, Folks.

In a World of Devices, the User is Central - semanticweb.com

semanticweb.com

Roger Kay of Forbes recently opined that the world is becoming more and more user-centric. He explains, “User-centric computing is a theme we can expect to hear articulated in many ways next week at the Consumer Electronics Show (CES) in Las Vegas. The simple view of the shift from device-centric to user-centric computing goes like this: when all we had was one device — a PC, first to do our work and later to connect to the Internet — we adapted to the device.  We learned how to wrestle it into more or less obeying our will.  We became skilled at the arcane keystrokes of DOS commands and Lotus 1-2-3 in order to do productive work.  We went to the machine.”

He goes on, “Now, the machine is starting to come to us.  As soon as you have more than one device in your life, they must necessarily point to you.  Cloud services increasingly coordinate devices for us so that our ‘state’ — the exact condition of all our stuff at any one moment — migrates seamlessly from one device to another. A good example would be reading an eBook.  If you stop reading on a certain page on your laptop, you should be able to open your eReader and be on the same page.  Ditto for your phone.  This idea that ‘state’ follows you around puts you at the center of your own universe.  The devices are all around you.”

Read more here.

Are You Ready for the New Peer-to-Peer Economy?

gigaom.com

Interesting trend that’s going on, though given that the devices used to connect to the internet are still mass produced, I don’t see this as nearly as revolutionary as it’s made out to be. I did enjoy the mention of Kickstarter. I don’t know if I would necessarily say this eliminates the middleman so much as creates a new, more hands-off one. Could this be considered an extension of the semantic web outside of just linked data?

“The internet can solve many of the difficulties in “knowing your farmer.” Much of the information asymmetry in food exists because a cost effective way to make information travel with the food hasn’t been applied. It is starting to happen in silos, for flour, for chocolate, and for wool, but soon the full power of the semantic web will be applied to food. Mobile pervasiveness is eliminating the boundary between offline and online, enabling seamless access to information. Open participation by all interested parties—whether consumers, producers or distributors—can democratize the sourcing, verifying and sharing of information.”

Anthony Nicalo on hacking the food system and how the semantic web could eliminate information asymmetry in the “know your farmer” problem  (via)

Core Ontology Pattern and Visualization Index

Semantic Web portal dedicated to ontology design patterns (ODPs).

http://ontologydesignpatterns.org/wiki/Main_Page

—————-

NeOn Toolkit: ontology engineering environment 
http://neon-toolkit.org/wiki/Main_Page

The Stanford parser: a statistical parser (version 1.6)
Stanford University
http://nlp.stanford.edu/software/lexparser.shtml

The FDG parser: a statistical parser (version 3.7)
http://3d2f.com/download/11-349-parser-generator-freedownload.shtml

The Charniak parser: a statistical parser (version 1.0) 
http://www-tsujii.is.s.utokyo.ac.jp/~tsuruoka/chunkparser.

Penn Tree bank Project
http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

Graphviz
http://www.graphviz.org/Download.php

(2010)

—————-

Visualizing Domain Ontology using Enhanced Anaphora Resolution Algorithm

http://arxiv.org/pdf/1109.2321

Semantic Web Gets a Boost

technologyreview.com

Via Technology Review:

Google, Microsoft, and Yahoo have teamed up to encourage Web page operators to make the meaning of their pages understandable to search engines.

The move may finally encourage widespread use of technology that makes online information as comprehensible to computers as it is to humans. If the effort works, the result will be not only better search results, but also a wave of other intelligent apps and services able to understand online information almost as well as we do.

The three big Web companies launched the initiative, known as Schema.org, last week. It defines an interconnected vocabulary of terms that can be added to the HTML markup of a Web page to communicate the meaning of concepts on the page. A location referred to in text could be defined as a courthouse, which Schema.org understands as being a specific type of government building. People and events can also be defined, as can attributes like distance, mass, or duration. This data will allow search engines to better understand how useful a page may be for a given search query—for example, by making it clear that a page is about the headquarters of the U.S. Department of Defense, not five-sided regular shapes.

The article goes on to note that Schema.org standards support microdata* rather than RDFa, which is supported and promoted by the international Web standards body W3C.
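To give a feel for what such markup looks like to a machine, here is a rough sketch using only Python's standard `html.parser`. The page fragment, the courthouse name and the phone number are invented for illustration, and the reader handles just one flat item; it ignores nested items and values carried in attributes such as `content` or `href`, which a real microdata consumer would need to handle.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the name and phone number are made up.
PAGE = """
<div itemscope itemtype="http://schema.org/Courthouse">
  <span itemprop="name">Example County Courthouse</span>
  <span itemprop="telephone">555-0100</span>
</div>
"""

class MicrodataReader(HTMLParser):
    """Collects the itemtype and the text of each itemprop in one flat item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        if "itemprop" in attrs:
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        # Store the first non-blank text found inside an itemprop element.
        if self._current_prop and data.strip():
            self.properties[self._current_prop] = data.strip()
            self._current_prop = None

    def handle_endtag(self, tag):
        self._current_prop = None

reader = MicrodataReader()
reader.feed(PAGE)
print(reader.item_type)   # http://schema.org/Courthouse
print(reader.properties)  # {'name': 'Example County Courthouse', 'telephone': '555-0100'}
```

A human sees a line of text about a building; the extra attributes let a program see a typed thing with named properties.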

Still, if it can gain traction, it’s a big step forward for machine understanding of all this content we’re throwing at the Web which, in turn, means a whole new class of applications using such data might be in our near future.

*Hat tip to Aaron Bradley (@aaranged) on Twitter for pointing out that it’s microdata, not microformats, that Google, Microsoft and Yahoo are supporting.

Semantic Web and Enterprise Architecture

MIT Technology Review, 29 October 2007 in an article entitled, “The Semantic Web Goes Mainstream,” reports that a new free web-based tool called Twine (by Radar Networks) will change the way people organize information.

Semantic Web—“a concept, long discussed in research circles, that can be described as a sort of smart network of information in which data is tagged, sorted, and searchable.”

Clay Shirky, a professor in the Interactive Telecommunications Program at New York University, says: “At its most basic, the Semantic Web is a campaign to tag information with extra metadata that makes it easier to search.” At the upper limit, he says, it is about waiting for machines to become devastatingly intelligent.

Twine—“Twine is a website where people can dump information that’s important to them, from strings of e-mails to YouTube videos. Or, if a user prefers, Twine can automatically collect all the web pages she visited, e-mails she sent and received, and so on. Once Twine has some information, it starts to analyze it and automatically sort it into categories that include the people involved, concepts discussed, and places, organizations, and companies. This way, when a user is searching for something, she can have quick access to related information about it. Twine also uses elements of social networking so that a user has access to information collected by others in her network. All this creates a sort of ‘collective intelligence,’ says Nova Spivack, CEO and founder of Radar Networks.”

“Twine is also using extremely advanced machine learning and natural-language processing algorithms that give it capabilities beyond anything that relies on manual tagging. The tool uses a combination of natural-language algorithms to automatically extract key concepts from collections of text, essentially automatically tagging them.”

A recent article in the Economist described the Semantic Web as follows:

“The semantic web is so called because it aspires to make the web readable by machines as well as humans, by adding special tags, technically known as metadata, to its pages. Whereas the web today provides links between documents which humans read and extract meaning from, the semantic web aims to provide computers with the means to extract useful information from data accessible on the internet, be it on web pages, in calendars or inside spreadsheets.”

So whereas a tool like Google sifts through web pages based on search criteria and serves them up for humans to recognize what they are looking for, the Semantic Web actually connects related information and adds metadata that a computer can understand. It’s like relational databases on steroids, with the intelligence built in to make meaning from the related information!

Like a human brain, the Semantic Web connects people, places, and events seamlessly into a unified and actionable ganglion of intelligence.

For User-centric EA, the Semantic Web could be a critical evolution in how enterprise architects analyze architecture information and come up with findings and recommendations for senior management. Using the Semantic Web, business and technology information (such as performance results, business function and activities, information requirements, applications systems, technologies, security, and human capital) would all be related, made machine readable, and automatically provide intelligence to decision-makers in terms of gaps, redundancies, inefficiencies, and opportunities—pinpointed without human intervention. Now that’s business intelligence for the CIO and other leaders, when and where they need it.

Infrequently Noted: Things the W3C Should Stop Doing

infrequently.org

via Infrequently Noted

At this year’s TPAC, it should be agreed that the W3C will divest itself of any and all Semantic Web, RDF, XML, Web Services, and Java related activities. SVG can be saved, but only if it re-charters to drop all XML dependencies in the next version.

How Watson Works - semanticweb.com

semanticweb.com

Ivan Herman recently offered some insight into how Watson actually works. Herman reports, “I was at Chris Welty’s keynote yesterday at the WWW2012 Conference. His talk was on Jeopardy/Watson and, although this is not the first time I heard/saw something on Watson, some things really became clear only at his keynote. Namely: what is really the central paradigm that made the question answering mechanism so successful in the case of Watson? Well… query answering in Watson is not some sort of a deterministic algorithm that turns a natural language question into a query into a huge set of data. This approach does not work.”

He continues, “Instead, a question is analyzed and, based on search in various sets of data, a large set of possible answers is extracted. These ‘candidate’ answers are analyzed separately along a whole series of different dimensions (geographical or temporal dimensions, or, which I found the most interesting, putting back candidate answers into the original question and searching that again against various sources of information to rank them again). The result is a vector of numerical values representing the results of the analysis along those different dimensions. That ‘vector’ is summed up into one final value using a weight value for each dimension. The weights themselves are obtained through a prior training process (in this case using a number of stored Jeopardy question/answers). Finally, the answer with the highest value (I presume over a certain threshold value) is returned.”
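The final scoring step described there can be sketched in a few lines of Python. This is an illustration of the weighted-sum idea only, not Watson's code: the dimension names, weights, candidate scores and the 0.5 threshold are all invented, and in the real system the weights come out of training rather than being typed in.

```python
# Hypothetical dimensions and weights; in the description above the weights
# come from training on stored question/answer pairs.
WEIGHTS = {"geography": 0.2, "time": 0.3, "reinsertion_search": 0.5}
THRESHOLD = 0.5  # assumed cut-off below which no answer is returned

def score(features):
    """Collapse a per-dimension feature vector into one weighted value."""
    return sum(WEIGHTS[dim] * features.get(dim, 0.0) for dim in WEIGHTS)

def best_answer(candidates):
    """Return the top-scoring candidate, or None if it misses the threshold."""
    if not candidates:
        return None
    answer, features = max(candidates.items(), key=lambda kv: score(kv[1]))
    return answer if score(features) >= THRESHOLD else None

# Made-up candidates, each with a score per dimension between 0 and 1.
candidates = {
    "Toronto": {"geography": 0.1, "time": 0.6, "reinsertion_search": 0.3},
    "Chicago": {"geography": 0.9, "time": 0.7, "reinsertion_search": 0.8},
}
print(best_answer(candidates))  # Chicago
```

Here "Toronto" scores 0.35 and "Chicago" 0.79, so "Chicago" is returned. If every candidate fell under the threshold, the function would return nothing rather than guess.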

Read more here.
