When data turns digital

Data is often said to “speak for itself”, and empirical findings are expected to be “supported by data”. But when the data is digital, does it still possess the objective rigour and the uninterpreted directness that these statements suggest?

Oil Refinery, Houston, TX

From Peder Norrby’s collection of surreal cityscapes resulting from automated 3D digitising in Apple Maps

Last week I had a chat with Stuart Dunn over at King’s College London, where he is part of the Digital Humanities Centre for e-Research. I contacted him to learn more about the kind of digital tools he and his colleagues are using, particularly those related to time and visualisation.

Stuart’s background is in archaeology, and he entered the digital field through the use of geographic information systems (GIS). Hence, he could tell me a lot about the potential of digital mapping tools and their shortcomings, especially when systems intended for the ‘exact sciences’ are used in a context where statements more fine-grained than true or false, as well as the researcher’s interpretation, play a major role in constructing a narrative. This quickly led us on to questions about the role of subjectivity in data, how to encode certainty in data, and what we actually mean by the word ‘data’.

This last question should be the first one to be asked whenever we talk about data, as all too often we end up meaning different things or, even worse, we might not even know what it really is we’re talking about. When I talk about data, I usually imply digital data, or more precisely digitally stored data. This necessarily means that the data is already structured in some way to fit the constraints of a digital framework.

While this does not yet define data, it distances digital data from data as it can be understood, for example, in Ackoff’s [1] DIKW model: data, in its original Latin meaning, as something given, a fact that may be observed but is as yet uninterpreted and unorganised. In this model, information is something that is derived from data (through the act of interpretation), while digital data can very well be digitally stored information; of course, here we would need another discussion on what we mean by ‘information’.

The difference between digital data (data as structure) and data as ‘facts’ may not be obvious when the two appear to be closely correlated, such as recorded readouts from a thermometer. As Stuart points out, in such a case the data structure logically follows from the data inputs, but in the humanities you have to make decisions when creating a database. “Creating a database is an act of interpretation”, he explains.

As someone who sits at the receiving end of such databases, I often encounter additional levels of interpretation, namely when the data does not neatly fit the predetermined structure. In such cases the authors often describe the data (as facts) in a different way so that it can still be recorded in the data (as structure). While this interpretative step remains visible in the original database format, it is often lost in a visualisation that expects the data to be structured in a certain way.

And this is where, for me, the main problem lies. Not in the fact that (digital) data is always interpreted, filtered and distorted, but in the failure to communicate this process, or in mistakenly treating digital data as if it were factual data. Digital data is always made to fit a structure that someone, at some point, decided was suitable, and while quantitative data visualisations may not be as vulnerable to the consequences, we certainly need to be aware of this when visualising data for the humanities.

[1] R Ackoff, From Data to Wisdom, 1989

Uncertain times need uncertain measures

Data visualisations should represent their underlying data as accurately as possible, and timelines are no exception. However, in many cases temporal data is not accurate in the first place, as it cannot easily be measured or counted. In order to represent such uncertain data accurately, we have to allow for ambiguity in its visual representation.

Friedrich Strass’s Strom der Zeiten (1849, left) draws the world’s history as a fluid stream of empires flowing into and out of each other, while Edward Lee’s History’s Largest Empires (2011, right) represents them as solid, discrete entities.
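To let that ambiguity survive into the picture, an uncertain period can be stored with a range for its start and a range for its end instead of two exact years, and each year can be given a degree of membership that a renderer maps to opacity, producing fading edges rather than the hard outlines of solid blocks. The sketch below is a minimal illustration under our own assumptions: the `UncertainSpan` class, its field names and the linear (trapezoidal) ramp are hypothetical choices, not taken from either chart or from any particular timeline tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainSpan:
    """Hypothetical record for a period whose start and end are only known within bounds."""

    label: str
    start_earliest: int  # earliest plausible start year
    start_latest: int    # latest plausible start year
    end_earliest: int    # earliest plausible end year
    end_latest: int      # latest plausible end year

    def membership(self, year: int) -> float:
        """Degree (0 to 1) to which `year` belongs to the span.

        Trapezoidal ramp: 0 outside the outer bounds, 1 between the inner
        bounds, rising or falling linearly across the uncertain edges.
        """
        if year < self.start_earliest or year > self.end_latest:
            return 0.0
        if year < self.start_latest:
            return (year - self.start_earliest) / (self.start_latest - self.start_earliest)
        if year > self.end_earliest:
            return (self.end_latest - year) / (self.end_latest - self.end_earliest)
        return 1.0


# A renderer could use membership as opacity, so the uncertain edges fade out.
reign = UncertainSpan("hypothetical empire", 1200, 1250, 1400, 1500)
opacities = {year: reign.membership(year) for year in (1150, 1225, 1300, 1450)}
```

A linear ramp is only one option; the uncertain edges could equally be drawn as hatching or error bars, as long as the bounds themselves remain in the data.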

A visualisation should make understandable through an image what is difficult to grasp in words, and enable new discoveries through visual analysis. Data visualisation in particular enables abstract numbers to be compared visually, or patterns to emerge from what might otherwise be just a list of measurements. It is always a translation, a representation of information in a graphical format, which by itself contains no new information, in the strict sense¹, beyond what was already contained in the raw data. In the first instance, it makes existing information accessible for visual exploration.

Joseph Priestley stresses this aspect of his work in his Description of a Chart of Biography [1]. The text accompanies what can be considered one of the first graphical timelines (after the pioneering work of Jacques Barbeau-Dubourg): a chart depicting the lives of about two thousand individuals, represented by lines on a linear scale of years ranging from 1200 BC to AD 1800.

It is of course an understatement when Priestley declares himself simply “to be an assistant to the great Historians, Chronologers, and Biographers” (p.4), whose work forms the foundation of his timeline. What he means is that he did not himself produce any new knowledge, but assembled the research of others into one coherent visualisation, a laborious and painstaking task, of course. As a reward for his undertaking, he was now able to see, and to show others, the relations and succession of historic figures, the contemporaries with whom they might have conversed, and the periods when cultural life flourished or, symbolised by emptiness, stalled. Representing chronological information in graphical form opened it up to visual analysis and exploration, which allows new hypotheses to be developed from which, ultimately, new knowledge can be gained.

I mention this historic example of a graphical timeline because it shows an awareness of key requirements a timeline needs to fulfil that are still relevant today. As one of the first to ever create a timeline, Priestley gave careful consideration to all of his design decisions, which he documented in his Description.


[1] J Priestley, A Description of a Chart of Biography, 1764


  1. According to Shannon’s Theory of Information

Manly Images

Exploring Digital Collections

The Manly Local Studies Image Library is a perfect example of how a rich and attractive collection can be stowed away behind an impoverished database query form. It is also a good example of the fact that it does not need to be this way: a (prototypical) interface offers a different view of the collection.

After the Australian Dress Register, Manly Images is already the second collection browser originating from Down Under that I’m presenting, and it won’t be the last. I have no clue where the Australian interest in explorative interfaces comes from, but that is not what this post is about.


When accessing Manly Images, the user is presented with what is in essence a histogram of the collection’s content, although it doesn’t really look like one and doesn’t require any knowledge of statistics to be understood. Images are grouped by title or decade and placed in boxes whose width depends on the number of images they contain (when viewed by title, the boxes also rescale in height). The boxes are arranged like a flow of words: a box drops down to the next line if there is not enough horizontal space left for it. Due to the fixed-width layout of the site, the arrangement is always the same as long as the collection doesn’t change. By hacking away the width constraints, however, it is possible to introduce a certain responsiveness and even arrange all boxes in a single horizontal row.

Removing the width constraints on the Manly Images site allows the layout of the image boxes to adapt to the browser size.
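The wrapping behaviour described above is essentially greedy line breaking, as in a paragraph of text. A minimal sketch, assuming that box width is simply proportional to the number of images; `flow_layout`, the `px_per_image` factor and the minimum width are made-up names and values, not taken from Whitelaw’s code:

```python
def flow_layout(counts, container_width, px_per_image=0.5, min_width=40):
    """Lay out one box per group left to right, wrapping like words in a line of text.

    counts: list of (label, number_of_images) in display order.
    Returns a list of rows; each row is a list of (label, x, width) tuples.
    """
    rows, row, x = [], [], 0.0
    for label, n in counts:
        # Box width grows with the number of images, with a floor so small groups stay clickable.
        width = max(min_width, n * px_per_image)
        # Wrap to a new row if this box would overflow a non-empty row.
        if row and x + width > container_width:
            rows.append(row)
            row, x = [], 0.0
        row.append((label, x, width))
        x += width
    if row:
        rows.append(row)
    return rows


# Invented counts; a wider container gives fewer rows, which is what removing the width constraint achieves.
decades = [("1900s", 400), ("1910s", 900), ("1920s", 1500), ("1930s", 300)]
narrow = flow_layout(decades, container_width=1000)
wide = flow_layout(decades, container_width=2000)
```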

One photograph per box serves as a poster image and is exchanged continuously. Mitchell Whitelaw, the man behind the project, claims that it is actually possible to see the entire collection “without doing a thing”. How long each image is shown seems to vary; a glimpse at the source code reveals that one poster image is changed every three seconds, which means that in decade view it would take around 18 hours to see the entire collection.

Detail View and Timeline

At this pace, most users will prefer to take action by clicking on one of the boxes, which causes another container to slide down below it. Inside, one finds horizontally aligned thumbnails of the images along with their title and year. Two buttons on either edge can be used to scroll through the images. Additionally, a timeline is displayed at the bottom of the container, with blocks representing years. The width of the blocks does not correspond to a duration, but to the number of images taken in that year. Clicking on a year slides the corresponding images into view.

The navigation inside the boxes unfortunately lacks some of the elegance of the main interface. Navigating horizontally by clicking buttons might not suit today’s users, who are used to endless scrolling and swiping. Also, the animated transition of the images when clicking on a year in the timeline does not properly convey a journey through time. Instead of scrolling to the selected time, the currently displayed images fade out and new ones slide in from the left or right (depending on whether the new images are older or newer than the last viewed ones). This choice was probably made to prevent the browser from struggling to move through thousands of images. Yet their presence and motion could be simulated by scrolling through just a few images.
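The simulation suggested above is usually called virtualised (or windowed) scrolling: the strip behaves as if every thumbnail were present, but only those overlapping the viewport at the current scroll offset are rendered. A minimal sketch, assuming thumbnails of equal width; the function names, sizes and frame count are hypothetical:

```python
def visible_range(scroll_x, viewport_width, item_width, item_count, overscan=2):
    """Indices (first, last), inclusive, of the thumbnails that need to exist on screen.

    The strip behaves as if all item_count thumbnails were laid out side by side,
    but only those overlapping [scroll_x, scroll_x + viewport_width), plus
    `overscan` extra items on each side, are actually rendered.
    """
    first = max(0, int(scroll_x // item_width) - overscan)
    last = min(item_count - 1, int((scroll_x + viewport_width - 1) // item_width) + overscan)
    return first, last


def tween(start, end, frames):
    """Evenly spaced scroll offsets for an animated jump from start to end."""
    return [start + (end - start) * i / frames for i in range(frames + 1)]


# Jumping from the first thumbnail to number 3,500 of a hypothetical 7,000:
# each frame renders only about a dozen thumbnails, however far the jump.
frames = [visible_range(x, 800, 100, 7000) for x in tween(0, 350_000, 30)]
```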

It is also not possible to compare timelines across decades or subjects. Partly because the scaling differs in every box: the timeline always occupies the entire horizontal space, and the width of each year depends on the number of images in it. Mainly, however, because it is not possible to have two boxes open at the same time, a flaw which is inherent in the layout.
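The scaling problem is easy to make concrete. When every timeline fills its box, a year’s block width is its share of that box’s images, so the same number of images yields different widths in different boxes. A sketch with invented counts, contrasting this with a shared scale in pixels per image; both functions are hypothetical illustrations rather than the site’s code:

```python
def per_box_widths(counts_by_year, box_width):
    """Block widths when each timeline always fills its box: a year's share of the box's images."""
    total = sum(counts_by_year.values())
    return {year: box_width * n / total for year, n in counts_by_year.items()}


def shared_scale_widths(counts_by_year, px_per_image):
    """Block widths on one scale for all boxes: equal image counts give equal widths."""
    return {year: n * px_per_image for year, n in counts_by_year.items()}


# Invented counts for two boxes, each containing a year with 50 images.
box_a = {1931: 50, 1932: 150}
box_b = {1961: 50, 1962: 350}

a, b = per_box_widths(box_a, 800), per_box_widths(box_b, 800)
# Per-box scale: 50 images are drawn 200 px wide in one box and 100 px in the other.
sa, sb = shared_scale_widths(box_a, 2), shared_scale_widths(box_b, 2)
# Shared scale: 50 images are 100 px wide in both.
```

The price of a shared scale is that a timeline no longer fills its container exactly; in exchange, widths become comparable across boxes, provided two boxes can be seen at once.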

Generous Interface

To be fair, Manly Images is still a prototype and has only just been released to the public as an experiment in what Whitelaw calls “generous interfaces”. The generosity even extends beyond the interface itself, in the sense that Whitelaw has made the entire source code available on GitHub. So instead of bickering, I might as well get busy with it.

Whitelaw’s earlier work includes the commonsExplorer, where his aim was to show everything in a collection. A similar approach was not possible with the Manly collection, as it is much larger and browsers would simply not cope with displaying all the images.

In my opinion, the histogram-style view, which may have emerged as a workaround, works much better at giving both a quantitative and a qualitative view of the image collection than displaying everything in a two-dimensional grid. I think that a certain amount of (computational) curating of a collection supports active exploration and discovery and prevents users from being overwhelmed by the sheer amount of data.

Some background information on Manly Images can be found on Mitchell’s Blog

Manly Images
Created October 2012
Records ~7000
Searching No
Filtering No
Ordering Title or Decade
Technology HTML5

Australian Dress Register

Exploring Digital Collections

The Powerhouse Museum in Sydney is home to a large collection of artefacts spanning the wide field of design and technology. A relatively small part of it, namely the garment collection, is accessible via a dedicated website, the Australian Dress Register.

Australian Dress Register - Timeline view, looking at dresses


The website offers two entry points to the dataset: a browsing interface and a timeline. As timelines are of particular interest for my research, let’s look at this one first. Time is divided into decades and runs from left to right, with the years 1860–1974 in view. The most recent “decade” is actually only four years long, probably because the newest item is from 1974. A notification box encourages the user to “click + drag” the timeline to navigate. Two zoom buttons let the user change the size of the divisions, from one year up to a century.
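Changing the zoom level amounts to re-binning the same item dates into intervals of a different size. The sketch below, with invented dates and a hypothetical `bin_years` helper, assumes bins are aligned to round years and clipped at the newest item, which reproduces a shortened final “decade” like the one described above:

```python
from collections import Counter


def bin_years(years, bin_size):
    """Count items per time bin of `bin_size` years (1 = year, 10 = decade, 100 = century).

    Bins are aligned to multiples of bin_size and the last bin is clipped at the
    newest item, so it can be shorter than the others. Empty bins are kept.
    Returns a list of ((first_year, last_year), count).
    """
    counts = Counter((y // bin_size) * bin_size for y in years)
    newest = max(years)
    first = (min(years) // bin_size) * bin_size
    return [((start, min(start + bin_size - 1, newest)), counts[start])
            for start in range(first, newest + 1, bin_size)]


# Invented item dates.
years = [1862, 1865, 1871, 1899, 1955, 1968, 1972, 1974]
decades = bin_years(years, 10)      # last bin is (1970, 1974)
centuries = bin_years(years, 100)
```

Because each zoom level uses a different set of bin boundaries, items are regrouped abruptly when switching levels; animating between the two binnings is one way to keep that change legible.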

Coloured bands meander through time in a pattern reminiscent of 1980s logo design. The colours represent different garment categories such as suits, jackets, trousers or dresses. They range from orange via red to green, and the visual proximity of the colours makes it very difficult to tell the categories apart. Also, the arbitrary vertical stacking of the categories, which means that a category does not stay at the same height across time, makes it hard to compare the number of items in a particular category over time. The designers actually seem to be aware of these flaws: when hovering over a band, the category name and the number of items per time section are displayed. This is a workaround, and it is a pity that the visual representation itself fails. It might have been a better choice to opt for a tabular view and put each category on its own row.
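The tabular alternative could be as simple as one row per category and one column per time section, so that each category keeps its place and values can be compared along a row. A sketch with a hypothetical `category_table` helper; the category names come from the register, the counts are invented:

```python
def category_table(counts, decades):
    """Render {category: {decade: count}} as a text table with one fixed row per category."""
    label_width = max(len(c) for c in counts) + 2
    lines = [" " * label_width + "".join(f"{d:>7}" for d in decades)]
    for category in sorted(counts):  # stable order: a category never changes its row
        cells = "".join(f"{counts[category].get(d, 0):>7}" for d in decades)
        lines.append(f"{category:<{label_width}}{cells}")
    return "\n".join(lines)


# Categories from the register; the counts are invented.
counts = {
    "dresses": {1880: 12, 1890: 9},
    "suits": {1890: 3, 1900: 4},
    "trousers": {1880: 1},
}
print(category_table(counts, [1880, 1890, 1900]))
```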

Zooming in and out also doesn’t really help to make better sense of the data. There is no smooth transition between zoom steps, and clusters form or disperse seemingly at random. There is also no visual indication of the zoom level, so without relying on the labels one has no feeling for how “close” one is looking at the data.


The browsing interface, on the other hand, provides a rich and engaging way of exploring the collection. On the left-hand side, a panel with filtering options in different categories allows the user to browse the dataset without looking for anything specific. One can choose to view only trousers that are in good condition, or dresses that have been worn by men (the dataset contains two matching items, a boy’s dress and a diver’s suit). Within a category of filtering criteria there are no subcategories, although they might sometimes have been appropriate, such as for “place of origin” or “damage”. On the other hand, the relatively small size of the dataset allows such a flat hierarchy, and for many users this might be easier to handle.
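The panel behaves like faceted filtering. A minimal sketch, assuming that selections in different categories are combined with AND and alternative values within one category with OR; this is a common convention for such panels, not something verified against the register, and the field names and records are invented:

```python
def facet_filter(records, selected):
    """Keep records that match every selected facet.

    selected maps a facet name to the set of accepted values. Categories are
    combined with AND, values within one category with OR (an assumed convention).
    """
    return [r for r in records
            if all(r.get(facet) in values for facet, values in selected.items())]


# Invented records with flat (non-hierarchical) facet values.
items = [
    {"id": 1, "type": "trousers", "condition": "good", "worn_by": "man"},
    {"id": 2, "type": "trousers", "condition": "poor", "worn_by": "man"},
    {"id": 3, "type": "dress", "condition": "good", "worn_by": "woman"},
]
good_trousers = facet_filter(items, {"type": {"trousers"}, "condition": {"good"}})
```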

A bar at the top offers view options (list or thumbnails) as well as a few sorting options (relevance, alphabetical and most recent). It is not evident which date is used for the “most recent” ordering. Possibly it is the date of acquisition, as the garments’ displayed dates are not in chronological order. Sorting by “relevance” does not seem to make any sense when browsing; it can, however, be helpful when searching for a specific item via the search form.
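What “most recent” means depends entirely on which date field is used as the sort key. A sketch with invented records showing that ordering by acquisition date and by the garment’s own date can disagree; the field names `made` and `acquired` are assumptions, not the register’s actual schema:

```python
from datetime import date

# Invented records; `made` is the garment's year, `acquired` the (assumed) acquisition date.
garments = [
    {"title": "Day dress", "made": 1875, "acquired": date(2009, 3, 1)},
    {"title": "Wedding suit", "made": 1932, "acquired": date(1998, 6, 15)},
    {"title": "Bathing costume", "made": 1910, "acquired": date(2011, 1, 20)},
]


def most_recent(records, field):
    """Sort records newest first by the given date field."""
    return sorted(records, key=lambda r: r[field], reverse=True)


by_acquisition = [g["title"] for g in most_recent(garments, "acquired")]
by_garment_date = [g["title"] for g in most_recent(garments, "made")]
```

Naming the option after its field, for example “recently acquired”, would remove the guesswork.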

Item View

Where the Australian Dress Register really stands out is in the display of single items. It should not be too much to ask for database records to be displayed in a thoughtfully formatted way, but this is where most online collections fail. Usually, the full record is simply dumped on the screen, with every database field formatted the same way.

The individual records of the Australian Dress Register are very well formatted, playing with different font sizes depending on the importance of the information, using checkboxes to indicate certain properties and presenting the garment’s measurements in a table, even with the option to convert the metric values to inches. Two thumbs up!
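The conversion itself is simple, since the international inch is defined as exactly 2.54 cm. A sketch applied to a small measurement table; the measurement names and values are invented, and rounding to one decimal place is an arbitrary display choice:

```python
CM_PER_INCH = 2.54  # exact by definition of the international inch


def to_inches(measurements_cm, digits=1):
    """Convert a {measurement: centimetres} table to inches, rounded for display."""
    return {name: round(cm / CM_PER_INCH, digits) for name, cm in measurements_cm.items()}


# Invented measurements of a hypothetical dress, in centimetres.
dress = {"waist": 61.0, "bust": 86.4, "centre back length": 142.2}
dress_in_inches = to_inches(dress)
```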

Australian Dress Register
Created August 2011
Records 117
Searching Yes
Filtering Yes
Ordering Yes
Technology HTML5


ArtScope

Exploring Digital Collections

The San Francisco Museum of Modern Art commissioned Stamen to develop an interface for exploring its collection of artworks. When the project went live at the end of 2007, it comprised 3,500 artworks; it has since grown to 6,419 items.

Stamen took a map-like approach and even built the application on top of their Modest Maps framework. This explains why the interface has coped so well with the near doubling of items since its introduction. The works of art are, however, not arranged according to their geographical origin, but in a two-dimensional grid. At first it is not evident what criterion determines the ordering of the works. On very close inspection, it seems that the date of acquisition is responsible for the composition: not necessarily a very informative feature for users outside of SFMOMA.
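If the arrangement is indeed driven by acquisition date, the layout reduces to sorting by that date and filling a grid row by row. A sketch under that assumption; the near-square grid and `grid_positions` are our own illustration, not Stamen’s implementation:

```python
import math


def grid_positions(items, key):
    """Sort items by `key` and assign (column, row) cells, filling a near-square grid row by row."""
    ordered = sorted(items, key=key)
    columns = math.ceil(math.sqrt(len(ordered)))
    return {item: (i % columns, i // columns) for i, item in enumerate(ordered)}


# Invented (title, acquisition year) pairs.
works = [("A", 2001), ("B", 1995), ("C", 1999), ("D", 2005), ("E", 1990)]
by_acquisition = grid_positions(works, key=lambda w: w[1])
by_title = grid_positions(works, key=lambda w: w[0])
```

Swapping the key, for instance to the artist’s name or the year of creation, would rearrange the whole map in a more meaningful way, which is exactly the kind of sorting option the interface lacks.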

Navigating the collection also follows the map paradigm. The user can pan by dragging the canvas, and zoom by pressing buttons or double-clicking inside or outside the “lens”. The lens serves as a selection tool: it can be dragged, and it reveals more detail about the artwork at its centre. Inside the lens, the collection is shown at the next zoom level, and the size of the lens can be adjusted by dragging its edges.
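Revealing details for the artwork at the lens centre requires mapping that point from screen coordinates back to the grid. A sketch assuming square cells and a view that scales world coordinates by `zoom` after subtracting a pan offset; these parameter names are hypothetical and not part of the Modest Maps API:

```python
def cell_under_point(screen_x, screen_y, pan_x, pan_y, zoom, cell_size):
    """Return the (column, row) of the grid cell under a screen point.

    The view is assumed to map world coordinates w to screen coordinates
    s = (w - pan) * zoom, so the inverse is w = s / zoom + pan.
    """
    world_x = screen_x / zoom + pan_x
    world_y = screen_y / zoom + pan_y
    return int(world_x // cell_size), int(world_y // cell_size)


# Lens centred at (400, 300) on screen, view panned to (1000, 500), zoom 2, 100-unit cells.
centre_cell = cell_under_point(400, 300, pan_x=1000, pan_y=500, zoom=2, cell_size=100)
```

Inside the lens, the same inverse transform applies with a larger `zoom`, which is one way to model the lens showing the next zoom level.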

The interface also allows searching by title, artist, year or keyword. When a search is successful, the first result is selected and a number indicates how many results were found. It would be helpful to also see the results highlighted within the map. By pressing arrows, the user can navigate to the next result. If several results have been found, the user is taken on a journey across the map, as it smoothly pans under the lens to the next work. Unfortunately, the ordering of the works does not make this journey very meaningful.

ArtScope is quite an attractive interface in its simplicity, and gives an impression of the size and content of SFMOMA’s collection. In this sense, it is more a visualisation than a real exploration tool. Better search and filtering tools, as well as sorting options, would greatly improve its usefulness for the general public as well as for curators.

More technical information on ArtScope can be found on Stamen design

ArtScope
Created November 2007
Records 6419
Searching Yes
Filtering No
Ordering No
Technology Flash