Tumblelogs have exploded in popularity over the last year and a half.
Tumblelog services, such as Tumblr, allow users to post media (anything from text and audio to photos and video) to their tumblelog, which is then pushed into their followers’ feeds, where those followers can view, comment on and like posts. Different tumblelogs function differently, but many make it very easy to republish others’ content on your own blog. Tumblr’s ‘reblog’ feature allows users to instantly republish posts to their own site, with the option to modify them and add commentary. A reblogged post will always leave a trail back to the user it was reblogged from, but not always back to the true source of the content. While there is an option to add a source to a post, it can be removed by subsequent reblogs, and isn’t always used to source material accurately.
There are still ways to source and group material by adding extra metadata such as tags, but these require action from the user and are often not applied semantically or accurately. This lack of proper cataloguing has been the subject of criticism from artists and controlling bodies alike, and rightly so. Properly sourced material points back to the original owners and creators of works, who are due acknowledgement for their creative ability, whether it be carefully crafted poetry or “exploitables” such as LOLcats.
Users cannot be trusted to maintain order in a society that has so little concern for semantics. The task of creating a semantic web is therefore left to the creators of the technologies that enable user-powered content creation.
First, in order to accomplish this task, the concept of “content re-publishing” needs to be updated. The term was first used to describe features such as Twitter’s “retweet” function and Tumblr’s “reblog” function. This works, but it limits the definition to individual networks and doesn’t account for the possibility that the work was taken from a third party. Truly original posts that stay inside a single network are simple enough to trace, but what about content that was screencapped from a television show, remixed, posted on Reddit, upvoted to page one, and then reblogged on Tumblr 3,000 times? Tumblr users only witness the last leg of the content’s journey, while Reddit users could be completely unaware of the conversation happening on Tumblr. Finally, the very first origin of the content (the television show) might never have been properly sourced at all, leaving its provenance ambiguous.
The user shouldn’t have to worry about sourcing content properly because, given the option, they usually don’t. And while this laziness is usually looked down upon, it can easily be remedied by technologies available today.
Imagine this: a new social media space that aggregates and maps content as it spreads across the internet. Pages can be created dynamically, and communities can be built up in a way that is both semantic and human. On this new site (which for this paper’s purposes will be called Schema), content will be arranged into pages called schemas, which organise material from all over the web. They will be updated constantly with popular posts from social media sites that have open APIs, and will display information on the subject drawn from open information sources such as Wikipedia, Wikisource and IMDB. A schema page can be anything from a film, to its star actor, to the town it was filmed in. Users will be able to understand and navigate these schemas because they mirror the psychological concept of the same name.
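To make the idea a little more concrete, the sketch below shows one way a schema page might be represented as a data structure. The class and field names are purely hypothetical illustrations for this paper, not part of any existing service.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ContentItem:
    """A single piece of aggregated content (a post, image, clip, etc.)."""
    source_url: str            # where the crawler found it, e.g. a Tumblr post
    origin_url: Optional[str]  # the earliest known source, if one has been mapped
    media_type: str            # "text", "image", "audio" or "video"
    excerpt: str               # short preview shown on the schema page

@dataclass
class SchemaPage:
    """A dynamically generated page gathering everything about one subject."""
    title: str                                                 # a film, an actor, a town...
    summary: str                                               # drawn from Wikipedia or similar
    reference_links: List[str] = field(default_factory=list)   # Wikipedia, Wikisource, IMDB
    items: List[ContentItem] = field(default_factory=list)     # aggregated social posts
    related_titles: List[str] = field(default_factory=list)    # other schemas this one links to

    def add_item(self, item: ContentItem) -> None:
        """Attach a newly crawled piece of content to this schema."""
        self.items.append(item)
```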
Content will be ordered into schemas by a tagging algorithm, which will crawl over data published to open social media networks, looking for shared words and key phrases in order to discover similarities. Once enough uncategorised similarities have been recognised, the algorithm will create a new schema for the information to be tagged under, allowing the site to update itself dynamically.
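A minimal sketch of how such a tagging step might work is given below, assuming crude keyword extraction and Jaccard similarity between posts; the thresholds and function names are illustrative choices, not a prescription for the real algorithm.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "on", "for"}
SIMILARITY_THRESHOLD = 0.3   # how alike two posts must be to be grouped (illustrative)
MIN_CLUSTER_SIZE = 5         # uncategorised posts needed before a new schema is created

def keywords(text):
    """Extract a crude keyword set from a post's text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in STOPWORDS and len(w) > 2}

def similarity(a, b):
    """Jaccard similarity between two keyword sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def propose_schemas(posts):
    """Greedily cluster uncategorised posts; each large-enough cluster becomes a schema."""
    clusters = []
    for post in posts:
        kw = keywords(post)
        for cluster in clusters:
            if similarity(kw, cluster["keywords"]) >= SIMILARITY_THRESHOLD:
                cluster["posts"].append(post)
                cluster["keywords"] |= kw
                break
        else:
            clusters.append({"keywords": set(kw), "posts": [post]})
    # Only clusters with enough members justify a brand-new schema page
    return [c for c in clusters if len(c["posts"]) >= MIN_CLUSTER_SIZE]
```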
The algorithm is not limited to text, however: just as the web can handle a vast array of media, so should this algorithm. This multimedia cataloguing process will have to be done with the help of publicly available APIs. For images, Google’s reverse image search API will be used to recognise and map the spread of images as they are published and re-published over the web. Videos will be a bit trickier, but should be manageable in a similar manner, by breaking clips into frames and searching for similarities. For audio, software similar to Shazam’s popular song-recognition service will need to be developed, but it should also be able to find the original sources of remixes and samples. The most difficult media to map will be those that exist solely in physical editions. While many old books, paintings and pieces of sheet music exist freely in the public domain and can be browsed without restriction on wikis, much physical media remains inaccessible, often for legal reasons. Content that cannot be accessed directly, whether because it exists only physically or because of legal restrictions, can still be mapped with help from so-called “storyverse” mapping services such as Small Demons.
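One plausible building block on the image side is perceptual hashing, which gives near-identical fingerprints for re-posted or lightly modified copies of the same picture. The sketch below uses a simple average hash with the Pillow library; it illustrates the general technique only, and is not how any particular reverse image search service actually works.

```python
from PIL import Image

def average_hash(path, hash_size=8):
    """Compute a simple perceptual hash: shrink, greyscale, then threshold on the mean."""
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = "".join("1" if p > mean else "0" for p in pixels)
    return int(bits, 2)

def hamming_distance(h1, h2):
    """Number of differing bits between two fingerprints."""
    return bin(h1 ^ h2).count("1")

def likely_same_image(path_a, path_b, max_distance=5):
    """Treat two images as copies of each other if their fingerprints nearly match."""
    return hamming_distance(average_hash(path_a), average_hash(path_b)) <= max_distance
```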
This algorithm will also be able to recognise when content has been remixed. Songs that sample others will semantically link to their origin (which will in turn link back to all material that has sampled or remixed the content).
It will be a world in which canonical and non-canonical, original and sample-based, official and fan-made content will all be equal and linked together. For content that legally allows it, schemas will be able to link to each other interactively. For instance, a schema for Girl Talk’s most recent record, All Day, will allow users to play tracks and watch as samples are identified in real time and linked back to their sources. Content aggregated from social media by the crawler can feature a link back to its source. If a Twitter user posts a short quote from one of Arthur Conan Doyle’s Sherlock Holmes stories, a link to the exact origin will be provided, as supplied by open-source projects such as Wikisource.
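A very rough sketch of how a quoted passage might be traced back to its source is shown below: it normalises punctuation and whitespace, then searches a small corpus of public-domain texts for the quote. The corpus and function names are hypothetical, and a real system would need fuzzier matching than an exact substring search.

```python
import re

def normalise(text):
    """Lower-case, strip punctuation and collapse whitespace so small differences don't block a match."""
    text = re.sub(r"[^a-z0-9\s]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def find_source(quote, corpus):
    """Return the title and character offset of the first text containing the quote, if any."""
    needle = normalise(quote)
    for title, body in corpus.items():
        pos = normalise(body).find(needle)
        if pos != -1:
            return title, pos
    return None

# Illustrative usage with a tiny, made-up corpus
corpus = {"A Scandal in Bohemia": "To Sherlock Holmes she is always the woman."}
print(find_source("she is always THE woman", corpus))
```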
The search power that internet services offer can be truly incredible, but it has fallen victim to private interests and been neutered and limited. The goal of Schema is to break through these information walls and finally start accurately mapping a culture that is highly remix-centred.
One of the best features of this site will be the integrated social tumblelog, which will transform it from a mere library of trivial information into a community that is appreciative of content origins. It will have a fairly simple, easy-to-use microblogging interface that allows users to post to their blogs. The system will largely replicate features found in the Tumblr and WordPress content management systems, but will forgo complex user-created metadata. Only internal tagging will be permitted, letting users organise posts within their own blogs. All other metadata will be created by the search algorithm, which will crawl each post looking for the schemas it draws on, remixes, quotes, or is influenced by. In-line links to individual content sources will let users explore influences and origins.
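As an illustration of that crawling step, the sketch below scans a post for the titles of known schemas and wraps each mention in an in-line link. The schema index and URL format are invented purely for the example.

```python
import re

def autolink(post_text, schema_index):
    """Replace mentions of known schema titles with in-line links to their pages.

    schema_index maps a schema title to its page URL; both are hypothetical here.
    """
    # Handle longer titles first so they are not broken up by shorter, overlapping ones
    for title in sorted(schema_index, key=len, reverse=True):
        pattern = re.compile(re.escape(title), re.IGNORECASE)
        post_text = pattern.sub(f'<a href="{schema_index[title]}">{title}</a>', post_text)
    return post_text

# Illustrative usage
index = {"All Day": "/schema/all-day", "Girl Talk": "/schema/girl-talk"}
print(autolink("Listening to all day by Girl Talk again.", index))
```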
To encourage users to post beyond simple self-motivation, a points system will be devised, similar to Reddit’s “karma”, in which individual posts are merited both for their contribution to different schemas and for other users’ appreciation of the work. Creating a brand new schema that goes on to spawn further schemas, and having greater influence over the web, will add points to posts over time, rewarding users for creating content that reaches a wide audience.
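A toy version of such a scoring rule is sketched below; the weights, argument names and the way influence grows over time are arbitrary choices made only for illustration.

```python
def post_score(likes, reblogs, schemas_contributed, schemas_spawned, days_old):
    """Combine other users' appreciation with a post's influence on the schema graph.

    All weights are illustrative; a real system would tune them empirically.
    """
    appreciation = likes + 2 * reblogs                            # other users' appreciation
    influence = 5 * schemas_contributed + 20 * schemas_spawned    # contribution to schemas
    # Influence keeps paying out as the post ages, rewarding content that keeps spreading
    return appreciation + influence * (1 + days_old / 30)

# Illustrative usage: an older post that spawned a new schema keeps gaining value
print(post_score(likes=40, reblogs=12, schemas_contributed=3, schemas_spawned=1, days_old=90))
```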
This project would be a massive undertaking. It would need a large amount of funding and a fantastic team of engineers to code the algorithms and connect to the huge number of APIs the network would rely on. If created, however, it would be the first of its kind, and it could revolutionise the way remix culture works by showing how information and art spread over the web. The ability to link this content has existed for a long time, but the technologies have never worked together well enough to allow such an innovative space to exist. When society stops criminalising remix culture and embraces it as the art form it is, a network like this could theoretically be built, but from the way things look now, it might be quite a long time before the necessary change ever happens.