Much as I dislike the way SOAP is used these days, I'm not inclined
to gloat, because the deprecation just means more work for me. I used
the SOAP search service as an example in all three of my books. Now
I've got to find another free, public, non-obscure SOAP service to use
as an example (ideas?).
The official Google narrative is that the SOAP-based web service
has been replaced by something called the "Google AJAX Search API". If
you take this narrative at face value it means that Google has taken
down their web service and put up an AJAX library in its place. What's
AJAX? Who knows, it's AJAX. Here's
a typical weblog entry on the topic.
It's probably my recent proximity to Sam that's doing it, but I'm
noticing a tendency in myself to draw fine distinctions. There's a
sense in which this narrative is right and a sense in which it's
not. I'm going to pick apart the narrative and show what exactly is
disturbing about the Google AJAX Search API.
It's not true to say that "Google doesn't have a REST API to
replace it." In fact, Google has two REST APIs, and one of them
predates even the SOAP API. You've probably used this old API: its
primary endpoint is http://www.google.com/.
Yes, the Google website is in fact a very RESTful web service. The
downside of this web service is that it's a little bit difficult to
use automatically, as opposed to through a web browser. It serves data
in a human-oriented format (HTML), and you have to screen-scrape it
into a data structure if you want to do anything with it.
There are libraries for
doing this, but the other problem is Google doesn't want you to do
it. It violates Google's Terms of
Service ("No Automated Querying"). Lots of inconsiderate people
write scripts that hammer Google's REST API day and night. Google
tries to prevent this by sniffing out anything that might not be a web
browser and preventing it from accessing the API. (To see this, set
your browser's User-Agent to "libwww-perl" and try to use Google.)
But it can't be denied that people outside of Google have a
powerful hankering for Google's dataset, so eventually someone (Nelson
Minar, it seems) came up with a second web service API that was
designed just for use by automated clients. The catch was that you had
to sign up for a unique key to use it, and that key would only work
for you 500 (later 1000) times a day.
In point of fact Nelson chose a SOAP/WSDL architecture for this web
service. But there was no need to use any different architecture at
all. Here's a possible different way of implementing the constraints
above:
This technique has a number of subtle benefits which I could bore
you with for quite a while. But its obvious benefit is that it's got
the exact same "API" as the Google website, which everyone knows how
to use.
Anyway, instead of going down a route like this (which would, I
think, have changed the history of web services quite a bit), Google
went down the SOAP/WSDL route. Now they're deprecating the SOAP
service in favor of some mysterious "AJAX API". This brings me to the
second of Google's REST APIs.
There is no magical thing called an "Ajax request". An Ajax client
makes normal HTTP requests, and processes the results automatically,
just like a web service client. An Ajax client is a web service
client.
What HTTP requests is the Google Ajax client making? I poked around a
little bit and it looks like it mainly makes GET requests to URIs that
look something like
http://www.google.com/uds/GwebSearch?callback=GwebSearch.Raw&context=0&lstkp=0&v=1.0&key=xxxxxxxxxx&term=web+services. That's
not exactly http://www.google.com/search?q=web+services,
but it's not too far off either.
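To make the "normal HTTP request" point concrete, here is a minimal sketch that rebuilds the URI quoted above and wraps it in an ordinary GET request object. The parameter names are copied from that URI, and what each one means is a guess on my part. The method name uds_search_uri is made up for illustration, and nothing is actually sent over the network.

```ruby
require 'uri'
require 'net/http'

# Rebuild the kind of URI the Javascript library was observed requesting.
# Parameter names come from the observed URI; their meanings are assumptions.
def uds_search_uri(term, key)
  params = {
    'callback' => 'GwebSearch.Raw',
    'context'  => 0,
    'lstkp'    => 0,
    'v'        => '1.0',
    'key'      => key,
    'term'     => term
  }
  URI::HTTP.build(
    host:  'www.google.com',
    path:  '/uds/GwebSearch',
    query: URI.encode_www_form(params)
  )
end

uri = uds_search_uri('web services', 'xxxxxxxxxx')

# A plain GET request object, the same kind any web service client would build.
request = Net::HTTP::Get.new(uri)
puts "#{request.method} #{request.path}"
```

The only thing that distinguishes this from a request for the search page is the path and the query string.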
The Google AJAX API consists of a browser-side Javascript library
and a server-side web service. The one acts as a client for the
other. From what little I've seen of the web service I'd consider it
quite RESTful. In fact, it's architecturally very similar to Yahoo!'s RESTful search
API. They both use the same (IMO, fairly unsafe) trick to get a
web browser to execute dynamically-generated Javascript code from
another domain.
The main difference is that Yahoo's search API can also be made to
send data (in JSON or ad-hoc XML format) instead of executable
Javascript. That makes it possible for the service to be consumed by
automated clients, not just by web browsers running client-side Ajax
programs.
Let me just see if I can do something similar with the Google web
service. The Javascript it serves is extremely close to also being a
JSON document; I should be able to hack it a little and parse it as
JSON.
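The key-quoting part of that hack can be sketched in isolation. The sample string below is invented for illustration and is not an actual response from the service; the helper name quote_keys is also hypothetical. The approach assumes you know the key names in advance and that none of them appears followed by a colon inside a string value.

```ruby
require 'json'

# Hypothetical Javascript object literal with bare (unquoted) keys.
# This is made-up sample data, not real output from the web service.
SAMPLE = '{results : [{title : "Example", url : "http://www.example.com/"}], adResults : []}'

# Wrap each known bare key in double quotes so the text becomes valid JSON.
def quote_keys(javascript, keys)
  keys.inject(javascript) do |text, key|
    text.gsub(/\b#{Regexp.escape(key)}\s*:/, "\"#{key}\":")
  end
end

parsed = JSON.parse(quote_keys(SAMPLE, %w{results adResults title url}))
puts parsed['results'].first['title']
```

A regular-expression rewrite like this is fragile by nature, which is part of why it counts as screen-scraping rather than parsing.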
Here's some Ruby code that gives you kind of a command-line Google
search like people used to write for the old SOAP API. It requires the
json gem. You can skip the code.
Now, in old episodes of MacGyver, whenever MacGyver built a
bomb out of baking soda and masking tape, the writers would change
some crucial detail (like change the masking tape to Scotch tape) so
that if kids copied MacGyver they wouldn't blow up the house. I've
done something similar here. I've removed a crucial line of code from
that program, so that people don't just go copying it and running it all
over the place.
Why did I do that? Because when it works, that program violates the
Google AJAX Search API Terms of Service. "The API is limited to allowing
You to host and display Google Search Results on your site." I can use
the old SOAP API to write a command-line search tool, but I can't use
the new, RESTful API in that kind of application. My users can only
access the RESTful API through a specific library (Google's Javascript
library), running in a specific way (in their web browsers), for a specific purpose (displaying search results).
Wait a minute... running only in a web browser? Terms of Service?
Bootleg scripts that hack the output into something a parser can
understand? This REST web service is made available on exactly the
same programming-unfriendly terms as the Google website "REST web service"!
Instead of screen-scraping a web page, I'm now screen-scraping a
web service. I'm reverse-engineering undocumented URI formats, just like I do when I screen-scrape. So far, there's nothing on Google's end that sniffs my
user-agent to make sure the web service only runs in a browser, but
you can bet there will be as soon as that becomes a problem for
Google.
The "blow to web services" narrative is incorrect. Google did in
fact deprecate their SOAP API and expose a RESTful API. A win for REST!
Though incorrect, the "blow to web services" narrative is also
correct. Google deprecated their SOAP API, exposed a RESTful API, and
then erected a bunch of technological and legalese barriers around any
attempt to actually use the RESTful API. You're only allowed to use it
through one library in one language in one environment for one purpose. A loss for
everyone!
On the level of technological choices, this move is a big
improvement. They've gone from SOAP, which has a lot of overhead, to
plain old HTTP, which has strictly less. Gone from an RPC style, which
doesn't play well with the web, to a RESTful style, which does. This
makes an enormous amount of technological sense. From its first day on
the web, Google has exposed its dataset through a RESTful interface
that gets orders of magnitude more traffic than any "web service" it
might expose. In a sense, all they're doing now is unifying the
architectures.
When it comes to getting information into the hands of people who
can use it, Google has taken a big step backwards. The SOAP interface
was serious overkill, but what you did with it was your business (though you could
only do it 1000 times a day). The new RESTful interface is a technical
improvement, but it's encumbered with restrictions that make it a
museum piece. Unless you're writing an Ajax application using Google's
library, its true value can only be obtained illicitly. And that's the
other reason why I'm not inclined to gloat.
Wed Dec 20 2006 22:10 Goodbye Google SOAP Search Service:
You may have heard that Google has deprecated
their SOAP-based search service. This comes after Nelson
Minar, who worked on that API back when he was at Google, says
he'd "never choose to use SOAP and WSDL again."
Another victory for REST over WS-*? Nope -- Google doesn't have a REST
API to replace it. Instead, something much more important is
happening, and it could be that REST, WS-*, and the whole of open web
data and mash-ups all end up on the losing side.
When you make an HTTP request to google.com, we try to figure out
whether you're a web browser or an automated client. Ordinarily, if
you're an automated client, we shut you out. But here's the deal. Now
you can sign up for an "automated client key". When you make an HTTP
request to google.com, stick your key into the Authorization
header. Not only will we not shut you out, we'll try to make things
easy for you. Instead of a human-oriented HTML document, we'll
send you the appropriate data in an easy-to-parse XML
format. But, we'll only do this for you 500 (later 1000) times
a day.
Then it's back to shutting you out.
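The decision logic in that proposal is small enough to sketch. This is only an illustration of the idea, not anything Google implemented: the class name, the return values, and the crude browser check are all assumptions, and the daily limit uses the later figure of 1000.

```ruby
# Hypothetical sketch of the "automated client key" scheme described above.
class AutomatedClientPolicy
  DAILY_LIMIT = 1000

  def initialize(valid_keys)
    @valid_keys = valid_keys
    @usage = Hash.new(0) # [key, date] => number of requests served
  end

  # Decide how to answer a request, given its headers (a Hash) and the date.
  # Returns :xml, :html, or :shut_out.
  def decide(headers, today)
    key = headers['Authorization']
    if @valid_keys.include?(key)
      # Keyed clients get easy-to-parse XML until the daily quota runs out.
      return :shut_out if @usage[[key, today]] >= DAILY_LIMIT
      @usage[[key, today]] += 1
      :xml
    elsif browser?(headers['User-Agent'])
      :html
    else
      :shut_out
    end
  end

  private

  # Crude stand-in for whatever browser detection the real site performs.
  def browser?(user_agent)
    !user_agent.nil? && user_agent.start_with?('Mozilla/')
  end
end

policy = AutomatedClientPolicy.new(['my-key'])
puts policy.decide({ 'User-Agent' => 'libwww-perl/5.805' }, '2006-12-20')
puts policy.decide({ 'Authorization' => 'my-key' }, '2006-12-20')
```

The URIs never change in this scheme. The only new thing a client has to learn is one request header.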
#!/usr/bin/ruby
require 'rubygems'
require 'uri'
require 'open-uri'
require 'json'

KEYS = %w{GsearchResultClass unescapedUrl url visibleUrl cacheUrl
          title titleNoFormatting content results adResults
          content1 content2 impressionUrl}

def search(key, term)
  uri = "http://www.google.com/uds/GwebSearch" +
    "?callback=GwebSearch.Raw&context=0&lstkp=0&v=1.0" +
    "&key=#{key}&q=#{URI.escape(term)}"
  javascript = open(uri).read

  # Hack quotes around the hash keys to make the Javascript string
  # into JSON.
  KEYS.each do |key|
    find = Regexp.compile("\s*#{key}\s*:")
    json.gsub!(find, " \"#{key}\" : ")
  end
  parsed = JSON.parse(json)
  return parsed["results"], parsed["adResults"]
end

# Command-line interface begins here
(puts "Usage: #{$0} [API key] [search term]"; exit) unless ARGV.size == 2
key, term = ARGV
results, ads = search(key, term)

puts "#{results.size} results for '#{term}':"
results.each do |result|
  puts result['titleNoFormatting']
  puts " #{result['url']}"
  puts " #{result['content'][0..70]}" unless result['content'].empty?
  puts
end

unless ads.empty?
  puts "Look at some ads while you're at it:"
  puts '------------------------------------'
  ads.each do |ad|
    puts ad['title']
    puts ad['visibleUrl']
    puts " #{ad['content1']}"
    puts " #{ad['content2']}"
    puts
  end
end
