Data is often said to “speak for itself”, and an empirical finding is expected to be “supported by data”. But when the data is digital, does it still possess the objective rigour and uninterpreted directness that these phrases suggest?
Last week I had a chat with Stuart Dunn over at King’s College London, where he is part of the Digital Humanities Centre for e-Research. I contacted him to learn more about the kind of digital tools he and his colleagues are using, particularly those related to time and visualisation.
Stuart’s background is in archaeology, and he entered the digital field through the use of geographic information systems (GIS). He could therefore tell me a lot about the potential of digital mapping tools and their shortcomings, especially when systems intended for the ‘exact sciences’ are used in a context where statements more fine-grained than true or false, as well as the interpretation of the researcher, play a major role in the construction of a narrative. This quickly led us to questions about the role of subjectivity in data, how to encode certainty in data, and what we actually mean by the word ‘data’.
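To make the idea of encoding certainty a little more concrete, here is a minimal sketch. The data model is entirely hypothetical – the class, field names, and values are my own invention, not anything from Stuart’s actual work – but it shows how a record can carry a date range, a confidence estimate, and an interpretive note instead of a bare true/false flag:

```python
from dataclasses import dataclass

@dataclass
class DatedFind:
    """Hypothetical record for an archaeological find whose dating is uncertain."""
    site: str
    earliest_year: int   # start of the plausible date range (negative = BCE)
    latest_year: int     # end of the plausible date range
    confidence: float    # researcher's own estimate, between 0.0 and 1.0
    basis: str           # free-text note on how the dating was arrived at

find = DatedFind(
    site="Hypothetical Site A",
    earliest_year=-250,
    latest_year=-180,
    confidence=0.7,
    basis="stratigraphy; pottery typology disputed",
)

# A system that only accepts a boolean "dated: yes/no" would flatten
# the range, the confidence value and the note into a single bit.
assert find.earliest_year <= find.latest_year
```

The point of the sketch is simply that the researcher’s judgement (the confidence value, the basis note) becomes part of the data itself rather than being discarded at the point of entry.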
This last question should be the first one asked whenever we talk about data, as all too often we end up meaning different things or, even worse, we might not even know what it is we’re talking about. When I talk about data, I usually mean digital data, or more precisely digitally stored data. This necessarily means that the data is already structured in some way to fit the confines of a digital framework.
While this does not yet define data, it distances digital data from data as it is understood, for example, in Ackoff’s DIKW model: data, in its original Latin meaning, as something given, a fact that may be observed but is as yet uninterpreted and unorganised. In that model, information is something derived from data (through the act of interpretation), while digital data can very well be digitally stored information – though here we would need another discussion on what we mean by ‘information’.
The difference between digital data (data as structure) and data as ‘facts’ may not be obvious when the two appear closely correlated, as with recorded readouts from a thermometer. As Stuart points out, in such a case the data structure follows logically from the data inputs, but in the humanities you have to make decisions when creating a database. “Creating a database is an act of interpretation”, he explains.
As someone who sits at the receiving end of such databases, I often encounter additional levels of interpretation, namely when the data does not neatly fit the predetermined structure. In such cases the authors often describe the data (as facts) in a different way so that it can still be recorded in the data (as structure). While this interpretative step remains visible in the original database format, it often fails to be reproduced in a visualisation that expects the data to be structured in a certain way.
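What such a loss can look like in practice is easy to sketch. The records and field names below are invented for illustration: a date known only as “before 1540” has been squeezed into a year column, with the interpretive caveat stored in a notes field that a naive visualisation pipeline never reads.

```python
# Hypothetical records: the 'year' column expects a four-digit year, so the
# author of manuscript B's entry records an upper bound and explains the
# interpretation in a free-text notes field.
rows = [
    {"object": "manuscript A", "year": "1521", "notes": ""},
    {"object": "manuscript B", "year": "1540",
     "notes": "terminus ante quem; actual year unknown"},
]

# A naive visualisation pipeline reads only the structured column...
years = [int(r["year"]) for r in rows]

# ...and the interpretive step recorded in 'notes' silently disappears:
# both manuscripts now look equally precisely dated.
print(years)  # [1521, 1540]
```

Nothing in the output distinguishes the firmly dated record from the interpreted one, which is exactly the failure of communication described above.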
And this is where, for me, the main problem lies: not in the fact that (digital) data is always interpreted, filtered and distorted, but in the failure to communicate this process, or in mistakenly treating digital data as if it were factual data. Digital data is always made to fit a structure that someone, at some point, decided was suitable. Quantitative data visualisations may not be that vulnerable to the consequences, but we certainly need to be aware of them when visualising data for the humanities.
R. Ackoff, ‘From Data to Wisdom’, 1989.