2009-07-09

Hierarchical history for NDFs

The recording of processing history has been part of the NDF library for many years. When an application uses one or more input NDFs to create an output NDF, the NDF library creates a record of the application and its parameter values, and stores this record in the output NDF. It also copies all the history information from the "primary" (usually the first) input NDF into the output NDF.

Whilst it was recognised at the time that it would be nice to copy history from all input NDFs, the exponential growth of history information this could cause was seen as prohibitive. But 16 years is a long time, and we now typically have far greater computing resources. So we've taken the plunge and changed things so that history from all input NDFs is copied into the output NDF. However, to preserve backward compatibility, the new facilities are provided by the provenance routines in the NDG library - the NDF library itself remains unchanged.

This means that applications such as KAPPA:HISLIST that use the NDF library directly to manipulate history information are unchanged. The extended history information is instead stored in the PROVENANCE extension of each NDF, and can be examined using the KAPPA PROVSHOW command. Since there can be quite a lot of history information, it is not shown by default - set the new HISTORY parameter to "YES" when running PROVSHOW to display it. Needless to say, NDFs created before these changes were made will not contain any extended history.

A common use for this extended history will be finding the value used for a particular parameter when a selected ancestor was created. We're toying with the idea of a GUI that would make this sort of thing easier by allowing an NDF's "family tree" to be navigated and searched, but for the moment the best thing is probably to use grep on the output of PROVSHOW.
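
For anyone who would rather script the search than type grep by hand, here is a minimal sketch in Python of the same idea. It assumes the PROVSHOW output (run with HISTORY set to "YES") has already been captured to a string. The function name grep_history and the sample text are purely hypothetical, and the sample is not the real PROVSHOW layout, just a stand-in to show the filtering.

```python
import re


def grep_history(provshow_text, pattern):
    """Return (line_number, line) pairs for lines matching a regex.

    Behaves roughly like `grep -i -n`: case-insensitive, 1-based line numbers.
    `provshow_text` is assumed to be captured PROVSHOW output as one string.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for number, line in enumerate(provshow_text.splitlines(), start=1):
        if regex.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical stand-in for captured output; the real PROVSHOW layout differs.
SAMPLE = """\
ancestor 1: raw_a
  history: someapp METHOD=median BOX=5
ancestor 2: raw_b
  history: someapp METHOD=mean BOX=3
"""

if __name__ == "__main__":
    # Find every place the (made-up) METHOD parameter was recorded.
    for number, line in grep_history(SAMPLE, r"method="):
        print(f"{number}: {line}")
```

The pattern is an ordinary regular expression, so searching for a parameter name followed by "=" is usually enough to pick out the value used for each ancestor.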

2 comments:

Antonio said...

Of course, this is screaming for some sort of visualization tool. Something akin to a genealogy tree seems most sensible but there may be other alternatives.

Would this be an appropriate project for a co-op student?

Tim J said...

Of course you have missed the scicom discussion over the past week where we have been making many jokes about family tree software and the gender of NDFs. The GEDCOM format is simple enough that it may well be possible to write a prov2gedcom command in perl pretty easily. The other alternative we have been looking into is graphviz.