Every year, the University of California at Davis pays the publisher John Wiley about $14,000 for a subscription to the Journal of Comparative Neurology, which publishes breaking research in its field. That may sound like a steep price tag for what is essentially a magazine subscription, but it's a tiny dollop of the $20 million the UC libraries spend every year on scholarly journals.
Scientific, technical and medical publishing is an $11 billion industry. And like the rest of the publishing business, scholarly publishing has undergone massive consolidation in the past two decades. Once the province of small university presses and boutique academic imprints, scholarly journals now emanate from giant publishing conglomerates such as Elsevier, Thomson and Blackwell.
"The well-established subscription model that evolved around print journals is a cash cow," says Peter Lyman, a professor at the UC Berkeley School of Information Management and Systems, "one that the publishers are terrified of damaging accidentally, through online publishing."
But unlike trade-book publishers, who count on Amazon and Barnes & Noble to move physical copies of the latest Harry Potter tome, scholarly publishers increasingly rely on electronic journal subscriptions and paid search services for their revenue. Their customers, mostly academic institutions and research organizations, insist on Web access to journal content. To meet that demand while protecting their valuable data stores, the large publishers have rolled out private, permission-based search gateways to the contents of their journals, usually under highly restrictive license terms and tightly managed IP-address-based access.
But those pricey journal databases now compete with Google, Yahoo and the rest for the attention, and the search queries, of students and faculty. And while the public search engines may not find every article in the journal literature, a growing portion of published research is also finding its way onto the open Web.
For example, when gene researchers identify a new DNA sequence, they usually submit the sequence to the National Institutes of Health's GenBank -- a public deep Web resource -- before submitting it to journals for publication.
Legislation pending in Congress would ensure that all research funded by federal taxpayers be made available free of charge to the public, over the Internet. Meanwhile, new cooperative academic initiatives like the Public Library of Science and the National Science Digital Library are trying to expand access to scholarly research, opening up more indirect competition for the proprietary publishing systems.
And as more scholarship finds its way onto the Web, page-ranking algorithms are also providing an alternative quality rating system to the traditional scholarly peer review that journals have always employed.
While page ranking won't replace the scholarly review process anytime soon, the expansion of public Web search engines will put downward pressure on the premium that publishers can command. "I don't think [page ranking] is more reliable," says Lyman, "but I do think it's perceived as legitimate. The cost of creating formally quality-controlled information may drive people to consider lower-cost alternatives."
Lyman adds, "When the public begins to use and accept non-qualified information -- relying on Google or other things to perform that function, like Technorati -- there are beginning to be quality mechanisms out there that are user-centric or generated by users."
How will scholarly publishers react to the encroaching competition from deep Web search engines? "The publishing industry is not famous for being progressive, forward thinking or fast moving," Bray says. "But if they ignore [deep Web search], they could find themselves in a situation like the record companies, where someone finds a way to subvert them."
- - - - - - - - - - - -
By some estimates, the deep Web contains 500 times more data than the surface Web. But to regard the deep Web as simply a bigger and better version of the current Web is to overlook the essential feature of databases: structure. Most of the deep Web consists of structured or semi-structured data, as opposed to the sea of HTML flotsam that bobs across the surface Web.
"Once you get into the deep Web, all of these data sources often have much more metadata available," says Bray. "This could be a huge opportunity for companies looking at new ways of presenting search results."
Deriving search results from structured data sets will open up new possibilities for search engines. In all likelihood, search engines will gradually abandon the flat, list-style results you see spread across a typical 12-page set of Google results. (And who ever gets to the 12th page, anyway?) Not only could deep Web search engines present more useful, manipulable views of structured data but, given a basic lingua franca of shared structural vocabularies, they could also aggregate those results in endlessly permutable combinations.
"It's ridiculous to think that the one-dimensional result list is going to be the universal paradigm for all imaginable searches forever," Bray says. "If you type 'bicycle' into Google, you get a list of results having to do with bicycles. But that result is, in a very important way, a lie. It ignores the fact that some of these things are about bicycle racing, some are about bicycle manufacturing. It ignores things that Google might not even know about."
As deep Web search engines unearth the structures of large data sets and make those structures visible across organizations, they will create a powerful incentive for organizations to invest in more consistent, predictable structures (a trend already evident in the growth of Web services and in Yahoo's search quality guidelines). In exchange for the benefits of increased exposure, these organizations will cede another measure of autonomy.
While government and academic institutions may generate the greatest volume of deep Web content, corporations undoubtedly generate the most monetary value in Web data: customer databases, product catalogs, technical knowledge bases and myriad other data sources with quantifiable business value.
Over the last decade, companies have invested heavily in Web infrastructure, including countless local search engines. While many companies already outsource their public Web site search functions to companies like Google, many also have developed specialized search engines for their own deep Web data, like technical support databases.
Those investments make plenty of sense when that data won't readily show up in a public Web search. But as deep Web search engines penetrate these gateways, will companies continue to see the value of investing in their own public interfaces?
In the near term, deep Web search engines will likely dampen company expenditures on local search initiatives. But in the longer term, the changes may prove more far reaching. "The quality and ubiquity of Web search engines hides the fact that most organizations have really crappy search mechanisms," Bray says. "I think that's creating a tension within organizations."
As public search engines continue to supplant organizations' own information-retrieval systems -- be they search databases, call centers or sales engineers -- systems that once faced inward will increasingly assume outward-facing roles. "When the ability to develop different messages for different audiences is curtailed by universal availability," says Gartner analyst Whit Andrews, "the nature of the message, its format and associated issues become paramount."
No one expects IT departments to go out of business, but the external pressures of deep Web search will almost certainly force long-term changes in the role, structure and autonomy of local IT organizations as they gradually lose direct control over customer transactions.
- - - - - - - - - - - -
Every search query is a unit of desire. Search companies, like all businesses, exist by transforming desire into hard currency. As deep Web search engines insinuate themselves into deeper and deeper levels of organizations, they will not only offload search traffic but also trigger a series of massive disruptions in the information economy.
If you buy the Cluetrain maxim that "hyperlinks subvert hierarchy," then surely deep Web search engines will amplify that subversion. As search engines extend their reach deeper into and across organizations, the boundaries between those organizations will feel more fluid, both to consumers and to the organizations themselves. The first thing most of us will notice may simply be better search results.
Somewhere inside that complex apparatus of desire and fulfillment, a transformation is taking place, one whose effects we can barely foresee.
Editor's note: This story has been corrected since its original publication.
About the writer
Alex Wright is a writer and user experience architect in San Francisco, Calif.
