RDF Queries and Ontologies
Danny nicely puts the problem I'm trying to solve in his post titled SPARQL trick #23. He says:
> Running inference on the store’s data as a whole may be expensive, but without it many results for a given query may be missed.
This is exactly why we are attracted to semantic web technologies. I have a lot of data, but I know there are many more pieces of information in there if I can apply some ontologies and rules. My queries against the system must search both the raw triples I have plus any triples that can be inferred by my ontologies. To me, this is one of the main value adds of the system.
The other main value add of an RDF store vs. a traditional relational store is that it's much easier and cheaper to say arbitrary things. In a relational store, your schema must be defined up front, severely limiting your ability to define data in the future. With RDF, saying anything about anything is cheap.
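To make that concrete, here's a minimal sketch (plain Python, no RDF library) of why "saying anything about anything" is cheap: a triple store is just a set of (subject, predicate, object) statements, so a new predicate never requires a schema migration. All names here are illustrative.

```python
# A triple store reduced to its essence: a set of (s, p, o) statements.
store = set()

def add(s, p, o):
    store.add((s, p, o))

# Say anything about anything, at any time:
add("alice", "knows", "bob")
add("alice", "age", "34")          # a new "column"? Just a new predicate.
add("knows", "type", "Property")   # even statements about predicates are fine.

# Query by pattern; None acts as a wildcard.
def match(s=None, p=None, o=None):
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
```

In a relational database, each of those statements would have needed a column or table defined in advance; here they're all just rows in one set.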
There are some solutions that work well for data sets that are static. You simply write your ontologies and rules, then run your triplestore through the reasoning engine. Bingo, you've got 4-5x more triples. Search away!
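The batch approach amounts to forward chaining to a fixed point: apply every rule to every triple, write the inferred triples back, and repeat until nothing new appears. Here's a toy sketch using two hypothetical RDFS-style rules (subclass transitivity, and type propagation through subclasses); a real reasoner handles far more, but the shape is the same.

```python
# Batch materialization: run simple rules to a fixed point, keeping
# every inferred triple alongside the asserted ones.
def infer_all(triples):
    triples = set(triples)
    while True:
        new = set()
        for s, p, o in triples:
            # Rule 1: subClassOf is transitive.
            if p == "subClassOf":
                for s2, p2, o2 in triples:
                    if p2 == "subClassOf" and s2 == o:
                        new.add((s, "subClassOf", o2))
            # Rule 2: an instance of a subclass is an instance of the superclass.
            if p == "type":
                for s2, p2, o2 in triples:
                    if p2 == "subClassOf" and s2 == o:
                        new.add((s, "type", o2))
        if new <= triples:          # fixed point reached
            return triples
        triples |= new

base = {
    ("fido", "type", "Dog"),
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
}
closed = infer_all(base)
# ("fido", "type", "Animal") is now in `closed`, though it was never asserted.
```

Three asserted triples become six after closure, which is where the multiplier in the triple count comes from.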
The problem is that users expect live, real-time interaction with the system. They want to add a new triple and immediately see any newly inferred triples. A batch run of your rules won't work in this scenario.
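One middle ground worth noting is incremental maintenance: instead of re-closing the whole store on every insert, fire the rules only against the newly arrived triple and chase the consequences. A rough sketch, reusing the same two toy rules as above (illustrative, not any particular engine's behavior):

```python
# Incremental inference: on insert, propagate only what the new triple implies.
def add_and_infer(store, triple):
    pending = [triple]
    while pending:
        t = pending.pop()
        if t in store:
            continue
        store.add(t)
        s, p, o = t
        if p == "type":
            # New instance: propagate up the class hierarchy.
            for s2, p2, o2 in store:
                if p2 == "subClassOf" and s2 == o:
                    pending.append((s, "type", o2))
        if p == "subClassOf":
            # New subclass link: extend transitive chains and re-type instances.
            for s2, p2, o2 in list(store):
                if p2 == "subClassOf" and s2 == o:
                    pending.append((s, "subClassOf", o2))
                if p2 == "subClassOf" and o2 == s:
                    pending.append((s2, "subClassOf", o))
                if p2 == "type" and o2 == s:
                    pending.append((s2, "type", o))

store = {("Dog", "subClassOf", "Mammal")}
add_and_infer(store, ("fido", "type", "Dog"))
# ("fido", "type", "Mammal") is available immediately after the insert.
```

Whether this scales depends on how tangled the rules are; deletions in particular are much harder to maintain incrementally than insertions.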
It might work to apply the rules and ontologies at query time. Treat them like regular business logic that you would write in your object model. This logic is run when requested, not run in batch with results written to the database (in typical web applications, that is). How performant would it be to treat ontologies and rules like business logic, and thus treat the triple store as a traditional relational database (that is, dumb and full of data)?
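The query-time alternative looks like backward chaining: the store holds only raw asserted triples, and the rules run while answering each query. A minimal sketch of one such query, "all instances of a class," with the subclass rule applied on the fly (names are illustrative; assumes an acyclic subclass hierarchy):

```python
# Query-time inference: nothing is materialized; the rule runs per query.
def instances_of(store, cls):
    # Directly asserted instances...
    found = {s for s, p, o in store if p == "type" and o == cls}
    # ...plus instances of any subclass, found recursively.
    for s, p, o in store:
        if p == "subClassOf" and o == cls:
            found |= instances_of(store, s)
    return found

store = {
    ("fido", "type", "Dog"),
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
}
# instances_of(store, "Animal") finds fido without any inferred triples
# ever being written to the store.
```

This is exactly the business-logic treatment: the store stays dumb and full of data, writes stay cheap and instantly visible, and the cost of reasoning is paid on every read instead.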
Looks like I'll give this a shot, and see where it leads us. Much like the ActiveOntology work being done over in Ruby, wiring in Jess or Drools as a base class for a Java object model might make sense here.
Of course, the downside of this is that it will never be as performant as the reasoning engine living inside the database. Time to integrate Jess directly with PostgreSQL?