The Tate Collection on GitHub

OpenGLAM alerted me via Twitter that the entire digital collection of the Tate is available on GitHub. I haven’t heard of any other institution that makes its collection available through this platform. It does kind of give a lot away, but then again, that’s the whole point of open data.

Why not Linked Data? asks a fellow Tweeter, and Tate’s web architect Rich Barrett-Small justifies the move to GitHub as the most time- and cost-effective way to get the data out there – for now.

Yes, SPARQL endpoints are the weapon of choice these days, but what’s wrong with using GitHub? It’s an incredibly versatile platform, by no means limited to programmers, but equally useful for thesis writing or democracy.

What’s great about using GitHub, as opposed to APIs, is that it doesn’t just give you access to the data, it gives you the data. Maybe I’m old school, but I do like having real files on my hard drive, as opposed to it all being locked away in a cloud. And it’s still possible to keep the data up to date by syncing it with the original repository.
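Having the collection as plain files means you can explore it with nothing more than the standard library. A minimal sketch, under assumptions: the directory layout and the `title` field below are guesses at how the repository’s JSON records might look, not a description of Tate’s actual schema – adjust to whatever the files really contain.

```python
import json
from pathlib import Path

def collect_titles(root):
    """Walk a local clone of the collection and gather artwork titles.

    Assumes each artwork is a JSON file and that records carry a
    'title' key -- both assumptions about the repository's format.
    """
    titles = []
    for path in Path(root).rglob("*.json"):
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        # Skip records without the assumed field rather than failing.
        if "title" in record:
            titles.append(record["title"])
    return titles
```

And because it is just files on disk, a plain `git pull` in the cloned directory is all it takes to sync with the original repository later.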

But enough about pros and cons of technical details, let’s have a look at what Tate offers.


When data turns digital

Data is often said to “speak for itself” and an empirical finding needs to be “supported by data”. But when the data is digital, does it still possess the objective rigour and the uninterpreted directness that these statements suggest?

Oil Refinery, Houston, TX

From Peder Norrby’s collection of surreal cityscapes resulting from automated 3D digitising in Apple Maps

Last week I had a chat with Stuart Dunn over at King’s College London, where he is part of the Digital Humanities Centre for e-Research. I contacted him to learn more about the kind of digital tools he and his colleagues are using, particularly those related to time and visualisation.

Stuart’s background is in archaeology and he entered the digital field through the use of geographic information systems (GIS). He could therefore tell me a lot about the potential of digital mapping tools and their shortcomings, especially when systems intended for the ‘exact sciences’ are used in a context where statements more fine-grained than true or false, as well as the interpretation of the researcher, play a major role in the construction of a narrative. This quickly led us to questions about the role of subjectivity in data, how to encode certainty in data, and what we actually mean by the word ‘data’.

This last question should be the first one asked whenever we talk about data, as all too often we end up meaning different things or, even worse, might not even know what it is we’re talking about. When I talk about data, I usually mean digital data, or more precisely digitally stored data. This necessarily means that the data is already structured in some way to fit the confines of a digital framework.

While this does not yet define data, it distances digital data from data as it can be understood, for example, in Ackoff’s [1] DIKW model: data, in its original Latin meaning, as something given, a fact that may be observed but is as yet uninterpreted and unorganised. In this model, information is derived from data (through the act of interpretation), while digital data can very well be digitally stored information – though here we would need another discussion on what we mean by ‘information’.

The difference between digital data (data as structure) and data as ‘facts’ may not be obvious when the two appear to be closely correlated, such as recorded readouts from a thermometer. As Stuart points out, in such a case the data structure logically follows from the data inputs, but with the humanities you have to take decisions when creating a database. “Creating a database is an act of interpretation”, he explains.
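To make that act of interpretation concrete, here is a small sketch – the table, field names and the artwork record are entirely hypothetical, not drawn from any real database. A source statement like a dating of ‘c. 1850’ has no slot in a plain integer year column, so the schema designer must decide: flatten it and lose the uncertainty, or model the uncertainty explicitly. Either way, the schema itself encodes an interpretation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema A: a bare integer year. 'c. 1850' must be flattened to 1850
# and the 'circa' is silently lost.
conn.execute("CREATE TABLE artworks_a (title TEXT, year INTEGER)")
conn.execute("INSERT INTO artworks_a VALUES (?, ?)", ("Untitled sketch", 1850))

# Schema B: the designer chooses to record the uncertainty as a flag --
# a different interpretation of the very same source statement.
conn.execute(
    "CREATE TABLE artworks_b (title TEXT, year INTEGER, year_is_circa INTEGER)"
)
conn.execute(
    "INSERT INTO artworks_b VALUES (?, ?, ?)", ("Untitled sketch", 1850, 1)
)

# One fact, two databases, two interpretations.
flattened = conn.execute("SELECT year FROM artworks_a").fetchone()
qualified = conn.execute("SELECT year, year_is_circa FROM artworks_b").fetchone()
```

Anyone querying schema A downstream would treat 1850 as an exact fact; only schema B lets a visualisation even know there was a decision to represent.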

As someone who sits at the receiving end of such databases, I often encounter additional levels of interpretation, namely when the data does not neatly fit the pre-determined structure. In this case the authors often try to describe the data (as facts) in a different way so that it can still be recorded in the data (as structure). While this interpretative step remains visible in the original database format, it often fails to be reproduced in a visualisation that expects the data to be structured in a certain way.

And this is where the main problem lies for me. Not in the fact that (digital) data is always interpreted, filtered and distorted, but in the failure to communicate this process, or in mistakenly treating digital data as if it were factual data. Digital data is always made to fit a structure that someone, at some point, decided was suitable, and while quantitative data visualisations may not be that vulnerable to the consequences, we certainly need to be aware of this when visualising data for the humanities.

[1] R. L. Ackoff, ‘From Data to Wisdom’, Journal of Applied Systems Analysis, 16 (1989), 3–9.