Open Government Data, and Open Data provided by the corporate sector, stimulate an upcoming market segment: Commercial Open Data Services. The Icelandic startup datamarket.com is one of the emerging companies in this field. Thomas Thurner from Semantic Web Company had the chance to talk to Hjalmar Gislason, founder and CEO of datamarket.com.
Semantic Puzzle: What’s the business idea behind datamarket.com? Whom do you expect to pay for what?
Hjalmar Gislason: From the end-user perspective it’s easiest to describe datamarket.com as a search engine for statistical data, a “Google for statistics” if you will. Any data that is already available open and for free out there will still be open and free on DataMarket, just easier to find, use, compare and download from a single source. While the audience for a search engine for statistical content is obviously way smaller than for text content, a significant part of that audience is business users, looking for data for business reasons. This means that there are more direct and lucrative methods to monetize the usage than simply contextual ads – especially in reselling access to premium data. This is a market that already turns over billions of dollars annually, but is as far from any of the “2.0 world” as one could possibly imagine (think Bloomberg, Reuters, FactSet). We believe there is an opportunity to disrupt a part of their business with a freemium approach, and furthermore open up the data market by reaching a business audience outside the narrowly defined financial user base that these companies cater to. There is data out there – free and premium alike – that can help almost any business make better plans and decisions. Connecting people and businesses to the data that they need will release phenomenal value. Tapping into just a fraction of that will be a hugely successful business for those that get it right.
Semantic Puzzle: Can you tell me a bit about the technological framework behind datamarket.com? How is the content from third parties fed into the system, and which APIs do you use? As you mainly provide XLS and CSV, have you thought about also providing data as XML in the future?
Hjalmar Gislason: The backend system is written in Python. We read data from the sources in various formats, ranging from Excel files and even scraping of web pages to proprietary APIs and Web Services. The data is then stored in a normalized format in a Postgres database that we’re using in a pretty unique way to be able to efficiently store the billions of time series and fact values that the system will eventually hold (currently at around 100 million time series and 600 million fact values). The web site is also written in Python, using the Django framework, but also making use of a lot of JavaScript libraries (and a bunch of our own code) to allow for an exciting user experience. We’re currently using a Flash-based solution called amCharts for the charts, but have already taken some steps to replace that with our own solution that we’ve written on top of the excellent Protovis visualization library. While you are right that the export formats we provide for end users are XLS, CSV and images (for exporting the graphs), our RESTful API actually supports XML and JSON formats as well. So we already provide data as XML.
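Editor's sketch: to illustrate what reading source data and storing it "in a normalized format" could look like, the Python snippet below turns a wide statistical table (one column per year) into long rows of (series, period, value). This is not DataMarket's code or schema. The function name `normalize_wide_csv`, the `source/series` identifier scheme and the sample figures are all made up for illustration.

```python
import csv
import io


def normalize_wide_csv(text, source):
    """Turn a wide table (one column per period) into long fact rows.

    Hypothetical helper: returns a list of (series_id, period, value) tuples.
    """
    reader = csv.DictReader(io.StringIO(text))
    # Assumption: the first column names the series, the rest are periods.
    series_col, *period_cols = reader.fieldnames
    facts = []
    for row in reader:
        series_id = f"{source}/{row[series_col]}"
        for period in period_cols:
            raw = (row[period] or "").strip()
            if raw == "":
                continue  # skip missing observations instead of storing a placeholder
            facts.append((series_id, period, float(raw)))
    return facts


# Made-up sample data; series_b has no value for 2010.
SAMPLE = """indicator,2008,2009,2010
series_a,1.0,2.0,3.0
series_b,10.5,11.5,
"""

facts = normalize_wide_csv(SAMPLE, source="example-stats")
for fact in facts:
    print(fact)
```

One row per observation lets sources with different layouts share a single table, and each value stays identifiable by its series and period.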
Semantic Puzzle: You surely know Tim Berners-Lee’s 5-star scheme for OGD providers. Where do you see your own service in this framework?
Hjalmar Gislason: Any fact value, time series and data set on DataMarket is “addressable” with a direct URL using our API. In that sense, all the data on DataMarket is four-star data according to Berners-Lee’s definition. In many cases we’re integrating data that is only one or two star data, so just by integrating it into our system we’ve moved it a few notches up that ladder. In some cases we’ve even been helping organizations publish data for the first time, taking the data from 0 to 4 stars in one go. We’ve been toying around with several ideas that would take – or enable users to take – the data all the way to 5-star status, but that’s still just on the drawing board.
Semantic Puzzle: You re-use a lot of Open Data coming from the Icelandic government. Is there also a state-owned data portal for Iceland, or is your service a “commercial replacement” for such a public effort?
Hjalmar Gislason: There is no government-operated data portal in Iceland, and to my knowledge there are no plans for implementing one yet. Sadly there are several more pressing issues in terms of eGovernment here that take higher priority. We don’t see our efforts as a replacement for such a portal, but we have managed to fulfill a little part of that role when it comes to statistical data. We’ve also been really vocal about the benefits of open data and among other things been influential in launching an open data wiki – opingogn.net (Icelandic only) – that explains the concepts with examples and use cases and attempts to list as many sources of government data as possible in a directory. There is some movement, but as an open data enthusiast I’d really like to see things happening faster. As a matter of fact I think there are reasons for Iceland to be extra enthusiastic about open data to increase transparency and restore trust after the crash of the banks and the economic system in 2008.
Semantic Puzzle: A lot of commercial Open Data Services (Socrata, Factual, Google …) are evolving at the moment. How do you think this market segment will develop over the next months and years, and what do you see as the crucial factors for such a business?
Hjalmar Gislason: I’ve been writing quite a lot about the developments in this industry on our blog. One of the things I’ve written the most about is what I call the “emerging field of data markets”. I define “data markets” as “Services that make it easy to find data from a range of secondary data sources, then consume or acquire the data in a usable – and often unified – format.” Many of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers. As there are several players in this space already, I believe we’ll see many of them try to differentiate themselves in 2011 by focusing on specific types of data. There are definitely opportunities in building specialized data markets for geospatial data, for statistics and for enormous scientific data sets – to name a few types – and each comes with its own challenges, target audiences and preferred approaches. In the spirit of doing one thing and doing it well, I think most of these projects will want to see success in one such segment of the market before generalizing – or consolidating.
The interviewee: Hjalmar is a successful entrepreneur, founder of three startups in the gaming, mobile and web sectors since 1996. Prior to launching DataMarket, Hjalmar worked on new media and business development for companies in the Skipti Group (owners of Iceland Telecom) after their acquisition of his search startup – Spurl. Hjalmar offers a mix of business, strategy and technical expertise.