GUO Graph Diff
Introduction
GUO Graph Diff is a prototype script for performing "diffs" on RDF graphs. The output of the diff is RDF expressed in GUO, the Graph Update Ontology. The graph diffs produced are intended to be used as PATCHes against RDF graphs.
Demo One - Compare any two RDF graphs
Simply enter the URIs of any two RDF graphs and this script will compute the difference and produce a diff file that patches the old graph into the new one.
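To illustrate the idea of a diff that patches the old graph into the new one, here is a minimal sketch in Python. It is not the script's own code (which is PHP). It assumes each graph is already parsed into (subject, predicate, object) tuples of plain strings and contains no blank nodes; the names diff_graphs and apply_patch are hypothetical.

```python
def diff_graphs(old_triples, new_triples):
    """Return the triples to delete from, and insert into, the old graph."""
    old_set, new_set = set(old_triples), set(new_triples)
    return {
        "delete": old_set - new_set,  # present in old, gone in new
        "insert": new_set - old_set,  # absent in old, present in new
    }


def apply_patch(triples, patch):
    """Apply a patch produced by diff_graphs to a set of triples."""
    return (set(triples) - patch["delete"]) | patch["insert"]


# Illustrative data only; the ex: terms are made up for this example.
old_graph = [
    ("ex:doc", "ex:title", '"Linked Data"'),
    ("ex:doc", "ex:status", '"draft"'),
]
new_graph = [
    ("ex:doc", "ex:title", '"Linked Data"'),
    ("ex:doc", "ex:status", '"published"'),
]

patch = diff_graphs(old_graph, new_graph)
patched = apply_patch(old_graph, patch)
```

Applying the patch to the old graph yields exactly the new graph, which is the property a diff intended for PATCHing needs to have.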
Demo Two - DBpedia + Memento
Currently this demo is limited to performing diffs on DBpedia data. The script automatically generates a Graph Diff between the Memento for the RDF graph and the data currently returned by DBpedia.
Example: http://webr3.org/diff/http://dbpedia.org/data/Linked_Data
The output of the demo is currently in N3/Turtle, and the content type is set to text/plain so that it can be viewed without your browser prompting to download.
If you are reading this, I'm assuming you are familiar with DBpedia. To diff a different graph, simply swap Linked_Data in the example for the identifying part of any DBpedia resource URI, for example:
- http://webr3.org/diff/http://dbpedia.org/data/Tim_Berners-Lee
- http://webr3.org/diff/http://dbpedia.org/data/Resource_Description_Framework
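The URL pattern in the examples above can be sketched as a small helper. The function name diff_url is hypothetical, and the sketch assumes the resource name is already in DBpedia's form (underscores instead of spaces) and needs no further escaping.

```python
DIFF_SERVICE = "http://webr3.org/diff/"    # demo endpoint from the examples
DBPEDIA_DATA = "http://dbpedia.org/data/"  # DBpedia data URI prefix


def diff_url(resource_name):
    """Build the demo URL for a DBpedia resource name such as 'Linked_Data'."""
    return DIFF_SERVICE + DBPEDIA_DATA + resource_name


example = diff_url("Tim_Berners-Lee")
```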
Notes:
This is the first version, written in PHP and using ARC2 as both serializer and parser. All diff functionality is fully custom and should work with trees of both anonymous and owned blank nodes. The script also matches up equivalent blank nodes that are linked to by resources and the like. The whole thing compares the raw "deserialized" triples, walking the graph, rather than just doing a textual diff on the files. There is an interesting DesignIssue which I found halfway through making this, well worth reading if you are interested.
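One possible way to match up equivalent blank nodes is sketched below in Python. It is an assumption for illustration and not the algorithm the script uses: each blank node is replaced by a hash of the predicate/object pairs hanging off it, computed recursively, so two blank node trees with the same content compare equal whatever their labels. The function names are hypothetical, terms are plain strings, and blank nodes are assumed to be labelled with a leading "_:".

```python
import hashlib


def is_bnode(term):
    return term.startswith("_:")


def bnode_signature(node, by_subject, seen=frozenset()):
    """Hash a blank node by the (predicate, object) pairs it is the subject of."""
    if node in seen:  # cycle guard; never triggered for a tree of blank nodes
        return "cycle"
    parts = []
    for predicate, obj in by_subject.get(node, []):
        if is_bnode(obj):
            # Recurse so nested blank nodes contribute content, not labels.
            obj = bnode_signature(obj, by_subject, seen | {node})
        parts.append(predicate + " " + obj)
    text = "\n".join(sorted(parts))  # sorted: triple order must not matter
    return "_:sig" + hashlib.sha1(text.encode("utf-8")).hexdigest()[:16]


def canonicalize(triples):
    """Replace every blank node label with its content-based signature."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))

    def canon(term):
        return bnode_signature(term, by_subject) if is_bnode(term) else term

    return {(canon(s), p, canon(o)) for s, p, o in triples}


# Same content, different blank node labels (made-up ex: terms).
graph_a = [("ex:a", "ex:knows", "_:b1"), ("_:b1", "ex:name", '"Alice"')]
graph_b = [("_:x9", "ex:name", '"Alice"'), ("ex:a", "ex:knows", "_:x9")]
graph_c = [("ex:a", "ex:knows", "_:b1"), ("_:b1", "ex:name", '"Bob"')]

same = canonicalize(graph_a) == canonicalize(graph_b)
changed = canonicalize(graph_a) != canonicalize(graph_c)
```

After canonicalizing, the two graphs can be compared as plain sets of triples. A known limitation of this sketch is that two distinct blank nodes with identical content collapse into a single signature, and only outgoing properties are considered.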
This will possibly be open sourced soon, once I have prototyped the PATCH script. Remember to account for the time it takes to download the two RDF graphs from DBpedia and the Memento TimeGate, as this accounts for the bulk of the script's execution time; the exact graph diff processing time is output in the RDF.
Please do email me your comments and suggestions, or hit me up on Twitter (webr3), or even see my blog.