The range in quality of web service API's is huge. Some web services have great and well documented API's while other websites and services have either no API or worse have a terrible API. What if we could level the playing field and create a basic way to allow databases to talk to one another to produce better quality more interconnected information. I call the idea IQL or internet query language (pronounced equal.)
I have a contacts database. In this database I have columns such as "FirstName,LastName,City,State,Country." I decide that I want to enumerate countries so I use a country ID and link to another table called "Countries." The table is simple and has things like ID, Country Name, Country Code, and perhaps a few other fields. I release my application and 1 year later that table is out of date because I don't ever maintain the Country table. And this isn't just Countries, it's States, zip codes, geo-political boundaries, the names of cities and towns, etc, etc, etc. My database has become a data prison for stale data.
What if we could give our database some additional smarts and allow it to connect with other databases on the internet. The idea is to create an internet based query language similar in syntax to SQL to allow you to query and join tables across the internet with data that is local in your own database. The best way to explain the idea is to show 5 hypothetical queries that aren't possible today but would be with an Internet Query Language.
- Select ID, Counties from "www.un.org/countries.xml"
I've constructed a simply query that talks to a static files and allows me to use the data as if it's local. The server I'm talking to can be a simple HTTP server or something smarter and more complex. The XML file is a basic example but it could be much smarter...
- Select Candidates.Name,Candidate.Party from "en.wikipedia.org/wiki/2008_Democratic_presidential_candidates/data"
The data you querry can be part of a public system like the Wikipedia allowing people to extract knowledge from the masses.
- Select States.Name, States.Abbreviation from "www.senate.gov" CACHE 300 days
Since talking to external databases can be slow we want to make sure that data is cached when possible. Here I'm suggesting that we retrieve a list of states but that we keep a cache of that list in our own local database for 300 days. We're unlikely to add a state very often but if we do get our 51st state we won't have to update every contact database in the US.
- Select amazon.com.Book.ISBN, amazon.Book.Link, Nytimes.com.Bestsellers.ISBN from "www.amazon.com", "www.NyTimes.com" INNER JOIN ON amazon.com.Book.ISBN = Nytimes.com.Bestsellers.ISBN order by amazon.com.Book.SalesRank
In one line I'm showing a new type of mashup. I've joined the Amazon database of books with the New York Times database of bestsellers.
- Select Videos from "YouTube.com" where Video.rating > 5 and Video.postedDate = '10/20/2007'
Select Photos from "Flickr.com" where Photo.stars > 5 and Photo.Uploaded = '10/20/2007'
This basically is shows that you can have a data API to query any website. Now that everyone is doing API's and web services it becomes clear that some API's can be great and easy to use while others are a total pain. IQL can level the playing field and allows you to query services in a similar way, all you need to know is the schema.
- What happens if the schema changes? In the same way that SQL can have views the recommendation when setting up a data server is to also use data views so that the schema can change but the view can stay the same. So if the schema changes it can impact your application. If you use a Cache you would fall back on the cache data allowing you to update for the schema change.
- What about security? It would be up to the data provider to create a policy for what data to expose publicly and what data to keep private. It makes sense for many databases to stay private but it may make sense to make a lot of databases public. The basic design as currently shown is read only. Updates, deletes and other operations would not be supported initially. Additional levels of security (username/password/API token) could be added later.
Performance issues... It's possible to create queries that would return so much data it could harm a website. For example returning a list of every book on amazon or every item on ebay may not be a good idea. Part of the administration of a IQL server would be to limit queries by time, memory and CPU resources. Some sites could require result limits to prevent performance issues.