Sunday, February 12, 2006

Hopes and Fears for the Semantic Web and the Enterprise

The semantic web, being many things to many people, promises to enable cross-domain knowledge exchange in a way that is compatible with current World Wide Web architecture and assumptions. The enterprise computing environment, defined here as inter-departmental or even inter-organizational information and service integration, can certainly benefit from some of the semantic web’s visions and goals. Most of the work done at the enterprise level, for example, is integration between disparate data sources and services. This integration is performed at two levels. The first is characterized by building data warehouses, writing extract/transform/load (ETL) procedures, and generally combining data repositories to create more complete data sets. The second level is concerned with defining and mapping the semantics of the information sources in order to integrate knowledge across the enterprise more efficiently. It is this second level of integration that semantic web technologies promise to help bring to fruition, because until now this effort has not been formalized or standardized in the enterprise. However, the semantic web has seen very little adoption in the enterprise, which in turn has slowed adoption of the semantic web as a whole. If the enterprise can benefit from implementing semantic web technologies, why is adoption so slow? What has caused the stall, and what can realistically be hoped for in the future?

From the perspective of an enterprise systems developer, there are at least two causes for the general lack of adoption of semantic web technologies. The most important is that there is no good large-scale deductive database capable of integrating with standard RDBMS packages like Oracle and able to reason across millions and millions of triples. The second is one of marketing: the message needs to be one of “show, don’t tell”.

Enterprises have traditionally been built around large relational databases, and there is an enormous investment in tools, training, and technologies that makes replacing the way the data is stored impossible. Therefore, for semantic web technologies to enter the enterprise, they must integrate with these large databases and extend their functionality in ways that are easier and more efficient than traditional means.

For instance, to achieve the enterprise-wide knowledge sharing that I argue the enterprise would find enticing, relational databases need to be able to import ontologies and rules directly, just as easily as they process the standard SQL INSERT, UPDATE, and DELETE statements. Ontologies can act as the glue between multiple information sources, providing a consistent view of all the information, augmenting it and filling in the gaps. Doing this efficiently, however, requires direct integration with the data source, because it is impossible to perform enterprise-wide reasoning in memory alone. Just as a relational database does not load every table into memory before running a query, efficient ontology-driven reasoning requires intelligent integration with the data on disk as well as in memory.
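As a rough illustration of pushing reasoning into the database rather than into memory, here is a minimal Python sketch using SQLite (chosen only because it ships with Python; it is not Oracle or any product discussed here). It stores triples in a single relational table and answers “all instances of a class” with a recursive SQL query that follows `rdfs:subClassOf` links, so a small ontology acts as glue over records that came from different departments. The table layout, the `open_store`, `add_triples` and `instances_of` helpers, and the `ex:`, `sales:` and `purchasing:` names are all hypothetical.

```python
import sqlite3

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def open_store(path=":memory:"):
    """Open a hypothetical triple table in SQLite; pass a file path to keep it on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples (s TEXT, p TEXT, o TEXT, PRIMARY KEY (s, p, o))"
    )
    # Index on (predicate, object) so the recursive join below can use it.
    conn.execute("CREATE INDEX IF NOT EXISTS triples_po ON triples (p, o)")
    return conn


def add_triples(conn, triples):
    """Insert (subject, predicate, object) tuples, silently skipping duplicates."""
    conn.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples)


def instances_of(conn, cls):
    """Return every resource typed as cls or as any (transitive) subclass of cls.

    The recursive CTE computes the subClassOf closure (RDFS rule rdfs11) and the
    final join propagates rdf:type up that closure (rule rdfs9), all inside SQLite.
    """
    rows = conn.execute(
        """
        WITH RECURSIVE subclasses(c) AS (
            SELECT ?
            UNION
            SELECT t.s FROM triples t JOIN subclasses sc ON t.o = sc.c
            WHERE t.p = ?
        )
        SELECT DISTINCT t.s FROM triples t JOIN subclasses sc ON t.o = sc.c
        WHERE t.p = ?
        ORDER BY t.s
        """,
        (cls, SUBCLASS_OF, RDF_TYPE),
    )
    return [row[0] for row in rows]


# Usage: an ontology fragment plus records from two departments.
conn = open_store()
add_triples(conn, [
    ("ex:Customer", SUBCLASS_OF, "ex:Party"),
    ("ex:Supplier", SUBCLASS_OF, "ex:Party"),
    ("ex:KeyAccount", SUBCLASS_OF, "ex:Customer"),
    ("sales:acme", RDF_TYPE, "ex:KeyAccount"),
    ("purchasing:globex", RDF_TYPE, "ex:Supplier"),
])
print(instances_of(conn, "ex:Party"))  # ['purchasing:globex', 'sales:acme']
```

Neither record is explicitly typed as `ex:Party`; the database engine infers it from the ontology at query time. A real deductive store would need far more than this (full RDFS and OWL semantics, rule support, and query planning tuned for millions of triples), but the sketch shows why the reasoning belongs next to the indexed data instead of in application memory.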

Another reason the semantic web hasn’t been accepted in the enterprise yet is marketing. So far, most of the semantic web’s message has been framed as “what might be possible in the future.” What is lacking is a demonstration of what we can do right now with these technologies to help us integrate and assimilate information. There needs to be more emphasis on showing the enterprise what it can do, instead of telling it what it could do.

To be fair, the message of “show, don’t tell” is difficult to deliver today because much of the exploratory work is still under way. The issues of ontology authoring and reasoning across very large databases are still being worked out.

There are some other practical issues keeping the semantic web from being widely adopted. One is the awful RDF/XML syntax, which is cryptic and does not integrate well with existing XML tools. Creating a new XML serialization of an RDF graph, one that can slip seamlessly into existing XML processing pipelines, would do wonders to help RDF integrate with the rest of the document processing systems in use today.
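To make the idea concrete, here is a minimal Python sketch of a flat, one-element-per-triple XML serialization built with the standard `xml.etree.ElementTree` module. The element names (`graph`, `triple`, `subject`, `predicate`, `object`, `literal`), the `Literal` wrapper, and the `example.org` URIs are invented for illustration; this is neither RDF/XML nor the actual vocabulary of any proposed format. The point is that every statement becomes a predictable element that ordinary XML tools can query without an RDF-specific parser.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical marker: the object is a plain literal, not a URI reference."""
    value: str


def triples_to_xml(triples):
    """Serialize (subject, predicate, object) tuples as one <triple> element each.

    Subjects and predicates are always written as URIs; objects are written as
    <literal> when wrapped in Literal and as <object uri="..."/> otherwise.
    """
    graph = ET.Element("graph")
    for s, p, o in triples:
        triple = ET.SubElement(graph, "triple")
        ET.SubElement(triple, "subject", uri=s)
        ET.SubElement(triple, "predicate", uri=p)
        if isinstance(o, Literal):
            ET.SubElement(triple, "literal").text = o.value
        else:
            ET.SubElement(triple, "object", uri=o)
    return ET.tostring(graph, encoding="unicode")


# Usage: serialize two statements, then read them back with plain ElementTree.
xml_text = triples_to_xml([
    ("http://example.org/acme", "http://xmlns.com/foaf/0.1/name", Literal("Acme Corp")),
    ("http://example.org/acme", "http://example.org/supplies", "http://example.org/globex"),
])
root = ET.fromstring(xml_text)
names = [t.findtext("literal") for t in root.iter("triple") if t.find("literal") is not None]
print(names)  # ['Acme Corp']
```

Because the structure never varies, the same document could just as easily be validated against an XML Schema or transformed with XSLT; the trade-off is verbosity compared with RDF/XML’s abbreviated forms.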

Another practical issue is that RDF has no formal or standardized integration point with XHTML. While not directly related to the goal of knowledge integration at the enterprise level, this is certainly a limiting factor for RDF mindshare. It should be very easy to insert an RDF triple into an XHTML document, and until it is, RDF adoption will remain stalled. If more documents on the Web had RDF triples embedded in them, awareness of RDF would increase, helping overall adoption.

In summary, both large-scale issues and practical issues are holding back semantic web technologies from general adoption in the enterprise. The main issue is the lack of efficient integration with existing relational databases. The second is marketing, where the message is muddled with too much speculation and not enough concrete problem solving. Practical issues, such as the antiquated and difficult-to-process RDF/XML syntax and the lack of a standard way to embed RDF in XHTML, continue to plague general RDF adoption.

With those large issues looming overhead, what can those who still have an optimistic view of the semantic web landscape expect? The good news is that enterprise application providers such as Oracle recognize the potential of technologies like RDF and RDF Schema. Oracle 10g includes a native RDF store, built upon their Spatial products. Oracle 10g even supports rules, including built-in RDF Schema rules, although only in a read-only, data-warehouse style of usage. While not sufficient for OLTP applications, Oracle’s RDF support is an important first step toward putting semantic web technologies into the enterprise. We can continue to hope that this support expands to include OWL, and that other database vendors add RDF and rule support to their RDBMSs.

We can also hope that RDF gets a new XML syntax, allowing it to be processed by the enormous number of XML toolkits that have already been accepted by and integrated into the enterprise. Work has been done here as well, most notably Dave Beckett’s RXR format. A new XML serialization needs to be described by XML Schema, be human-readable, and, most importantly, use modern XML techniques to ease integration with other XML formats and tools.

To summarize, semantic web technologies promise easier, more formal knowledge integration to simplify enterprise information integration. Significant barriers exist, although there are glimmers of hope on the horizon. Through the combined efforts of enterprise software vendors delivering real products and new marketing campaigns that stress what is possible now, the semantic web can deliver on some of its promises. One thing is certain: the towel has not been thrown in quite yet.

Disclaimer

I'm probably required to say that the views expressed in this blog are my own, and do not necessarily reflect those of my employer. Also, except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the BSD License.