The W3C Working Group has published its first draft of ‘Data on the Web Best Practices’, setting out no fewer than 27 best practices. It is encouraging to see the group tackling some of the issues that we have also been grappling with and that will form part of our discussion. One of these is Best Practice 21: Provide Data Unavailability Reference. This is the ‘Donald Rumsfeld’ problem of data discovery: the unknown unknowns. Searching for data is very frustrating when there is no way of knowing whether it is there to be found. Best Practice 21 suggests that data publishers should indicate where associated datasets are too personal or sensitive to be published. In reality, until we reach the point where all appropriate data is being published, perhaps data publishers should also list in full what is not currently published.
The question of ‘what is metadata’ has reared its head several times during the course of our research. The W3C has not fully defined the boundaries of metadata, and this is a question we too are wrestling with. Is 57 pages of supporting documentation simply very enthusiastic metadata, or an off-putting administrative burden that is onerous for both humans and machines to read? For now we refer to such material as ‘supporting documentation’, to avoid the assumption that we are talking about simpler, more structured metadata such as author, time and place.