Response to Why we need explicit temporal labelling
Why we need explicit temporal labelling is an excellent new post on the continuing saga about temporal labeling in RDF. The author provides a great real-world example: a web page whose title changes over time. To reiterate, yesterday this triple was valid:
:page dc:title "I like Cheeses" .
but today it is:
:page dc:title "I like Cheese" .
The author asserts that there are now two triples, which would indicate that the page has two titles.
Going back to my relational database roots, I don't see how there would be two triples (unless you explicitly store both in your local Model). Given just the source RDF document the triple is found in, at any one time there is at most one triple that asserts the page's dc:title. If I'm consuming the RDF document that asserts the triple, I'm in a position to store the URI of that document. When my RDF crawler hits the same RDF page again, it simply updates its local store with the new values: the old triple is deleted and replaced by whatever new triples are asserted.
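A minimal sketch of that crawl-and-replace strategy, assuming triples are plain `(subject, predicate, object)` string tuples and that the local store is grouped by source document URI. The `DocumentStore` class, its methods, and the example document URI are hypothetical, not any particular RDF library's API:

```python
# A triple is a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


class DocumentStore:
    """Hypothetical local store that remembers which document asserted each triple."""

    def __init__(self) -> None:
        # source document URI -> set of triples that document asserted
        self._by_source: dict[str, set[Triple]] = {}

    def load(self, source_uri: str, triples: list[Triple]) -> None:
        # Re-crawling a document replaces everything it asserted before:
        # the old triples are dropped and the new ones take their place.
        self._by_source[source_uri] = set(triples)

    def objects(self, subject: str, predicate: str) -> list[str]:
        # Collect matching objects across all source documents, sorted for stable output.
        return sorted(
            o
            for triples in self._by_source.values()
            for (s, p, o) in triples
            if s == subject and p == predicate
        )


store = DocumentStore()
doc = "http://example.org/page.rdf"  # hypothetical source document URI
store.load(doc, [(":page", "dc:title", "I like Cheeses")])  # yesterday's crawl
store.load(doc, [(":page", "dc:title", "I like Cheese")])   # today's crawl
print(store.objects(":page", "dc:title"))  # ['I like Cheese']
```

Because the second crawl of the same document replaces the first, only one title is ever visible; two titles would appear only if two different documents asserted them.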
Of course, that's only one strategy for crawling and consuming RDF documents. But it does remove the need to attach arbitrary metadata to triples just to record a timestamp. I believe that if we let time into the model, it won't stop there. We already have reification for saying things *about* statements, and reification has a bad rap mainly because of the syntax, not the model.
In any case, the use case of a web page's title changing over time is excellent, but correctly modeling it doesn't require a new addition to the RDF model. You can store the time you received the RDF document that asserted the triple, you can use reification to say what time the statement was asserted, or you can model explicitly that titles have a date at which they were said. Heck, nothing stops you from adding your own reifications to the triples you just downloaded.
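As a rough illustration of the reification option, here is a sketch that expands a triple into the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) and attaches a date to the resulting statement. Using `dc:date` for the timestamp, the `_:stmt1` blank node label, and the example date are all illustrative assumptions; the `reify` function is hypothetical:

```python
from datetime import date

Triple = tuple[str, str, str]


def reify(triple: Triple, statement_id: str, asserted_on: date) -> list[Triple]:
    """Describe `triple` as an rdf:Statement resource and attach a date to it.

    `statement_id` (e.g. a blank node label) and the use of dc:date for the
    timestamp are illustrative choices, not something RDF mandates.
    """
    s, p, o = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
        # Assumption: dc:date records when the statement was asserted.
        (statement_id, "dc:date", asserted_on.isoformat()),
    ]


statement = reify((":page", "dc:title", "I like Cheese"), "_:stmt1", date(2024, 1, 2))
for t in statement:
    print(t)
```

Note that in RDF, the reification triples do not themselves assert the original triple; you would keep `:page dc:title "I like Cheese"` in the store alongside its description.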
I want to talk about one statement from the blog post:
> In the current model, I would end up with two titles for this article. While technically correct, it is intuitively wrong - and that difference is what holds back RDF for most developers. They expect to see a single title with the updated value.
Developers don't always expect to see a single value for the title. What if someone asks, "What was the title of this web page two weeks ago?" In other words, it's all in how you look at the data and what you're trying to see. If all you care about is the *now*, then track where triples came from (the original RDF document) and consistently update them: when you re-crawl a document, delete all the old triples that came from it.
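If instead you do care about the past, one option is to keep every observed value with the date it was seen and answer "as of" queries from that history. A minimal sketch, assuming observations arrive in chronological order; the `TitleHistory` class and the example dates are hypothetical:

```python
from bisect import bisect_right
from datetime import date
from typing import Optional


class TitleHistory:
    """Hypothetical append-only log of (observed_on, title) for one page."""

    def __init__(self) -> None:
        self._dates: list[date] = []
        self._titles: list[str] = []

    def record(self, observed_on: date, title: str) -> None:
        # Keep the log sorted so that bisect can find the right entry.
        if self._dates and observed_on < self._dates[-1]:
            raise ValueError("observations must be recorded in chronological order")
        self._dates.append(observed_on)
        self._titles.append(title)

    def current(self) -> Optional[str]:
        # The "now" view: just the most recent observation.
        return self._titles[-1] if self._titles else None

    def as_of(self, when: date) -> Optional[str]:
        # Find the last observation made on or before `when`.
        i = bisect_right(self._dates, when)
        return self._titles[i - 1] if i > 0 else None


history = TitleHistory()
history.record(date(2024, 1, 1), "I like Cheeses")
history.record(date(2024, 1, 15), "I like Cheese")
print(history.current())                 # I like Cheese
print(history.as_of(date(2024, 1, 10)))  # I like Cheeses
```

The same data serves both questions: `current()` gives developers the single title they expect, while `as_of()` answers the "two weeks ago" question.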
Maybe this points out that an RDF triple is pretty bare on its own, and that tracking its source document is important.
On the semantic web, you can't un-say something, and that's part of this whole problem. If I can't un-say something, how do I say, "This thing I just said? Well, it's no longer true"? Attaching a timestamp doesn't help un-say anything, because the timestamp carries no TTL (time-to-live) semantics. A timestamp of yesterday on a triple doesn't mean that the triple is invalid today.
The bigger question I have is: why don't I ever have this problem of temporal labeling when writing relational database applications? When I need time as explicit data, I put it into the relational model (usually as a created_on, updated_at, or performed_on column). If time isn't important to the data, it's assumed that whatever is in the database is the truth as of now.
The web has a nice way to declare whether resource representations can be cached, and therefore whether you can trust the data inside a representation for longer than the moment you received it. If I receive an RDF document whose HTTP headers say not to cache it, then I had better treat the triples inside the document as true only for *now*. If I later query those triples from a local cache, I need to understand that the values might have since been updated at the source resource. So what is the relationship between a triple, the document it's in, and the HTTP headers sent with the document?
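One possible answer is to let the document's `Cache-Control` header decide how long its cached triples may be trusted. The sketch below is a deliberately simplified reading of that header: it only looks at `no-store`, `no-cache`, and `max-age`, and ignores `Expires`, `s-maxage`, heuristic freshness, and revalidation. Both functions are hypothetical helpers, and treating a missing header as "no reuse window" is an assumption:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def freshness_lifetime(cache_control: Optional[str]) -> timedelta:
    """Simplified reading of a Cache-Control header value (see assumptions above)."""
    if not cache_control:
        return timedelta(0)  # assumption: no header means trust only "now"
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return timedelta(0)  # triples are true only at fetch time
    for d in directives:
        if d.startswith("max-age="):
            try:
                return timedelta(seconds=int(d.split("=", 1)[1]))
            except ValueError:
                return timedelta(0)  # malformed max-age: be conservative
    return timedelta(0)


def cached_triples_trustworthy(
    fetched_at: datetime, cache_control: Optional[str], now: datetime
) -> bool:
    # Cached triples are trusted only while the document is still fresh.
    return now - fetched_at < freshness_lifetime(cache_control)


fetched = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(cached_triples_trustworthy(fetched, "public, max-age=3600", fetched + timedelta(minutes=30)))  # True
print(cached_triples_trustworthy(fetched, "public, max-age=3600", fetched + timedelta(hours=2)))     # False
```

Under this reading, a triple's lifetime comes from the document it arrived in, not from a timestamp attached to the triple itself.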
Wow, got off track there.