MoMA on GitHub

The Museum of Modern Art has followed in the footsteps of Tate and Cooper Hewitt and published their collections data on GitHub.


As I’m currently in the final phase of my PhD, I have to dedicate more time to writing and less to doing. Even so, I can’t let MoMA’s datasets go by unnoticed.

The above screenshot is from a timeline tool I developed for visually analysing large cultural collections. I imported the MoMA dataset and visualised the object records along their production dates. We can see the timeframe the collection spans, with earliest pieces from the late 1700s and – obviously – a focus on twentieth century and contemporary items.


The block shape around 1820 and the rectangular spike at 1900 represent large numbers of items that have the same, or very similar, production dates. Such anomalies can stand for series of items in the collection, they can be traces of curatorial decisions in cataloguing, they could be mistakes in dating, etc.
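Such spikes can also be found without a visualisation, by counting records per production year and flagging years that sit far above the typical count. The following is only a minimal sketch, not the timeline tool described above: the record structure, the `year` and `classification` field names, the sample data and the threshold factor are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def find_spikes(records, factor=5):
    """Return {year: count} for years whose record count exceeds
    `factor` times the median yearly count."""
    counts = Counter(r["year"] for r in records if r.get("year") is not None)
    if not counts:
        return {}
    typical = median(counts.values())
    return {y: n for y, n in sorted(counts.items()) if n > factor * typical}


# Hypothetical sample: two records per year from 1890 to 1909,
# plus a block of 40 items that all share the production date 1900.
sample = [{"year": y, "classification": "Print"}
          for y in range(1890, 1910) for _ in range(2)]
sample += [{"year": 1900, "classification": "Photograph"} for _ in range(40)]

spikes = find_spikes(sample)
print(spikes)  # {1900: 42}

# Count which kinds of records make up the flagged years.
in_spike = Counter(r["classification"] for r in sample if r["year"] in spikes)
print(in_spike.most_common())  # [('Photograph', 40), ('Print', 2)]
```

A median-based threshold is a deliberately crude choice. It says nothing about why a spike exists, only where to start inspecting records.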

I inspected a few records in the 1900 spike and encountered a few photographs, which gave me the idea that the spike could represent a larger series of photographs – this would explain the high production output in a short timeframe. The tool allows me to colour records according to a field value, so I gave it a try and coloured all photographs in green:


Bye Bye Beautiful Data

Last week the Beautiful Data workshop organised by metaLab (at) Harvard University came to an end. Twelve intense days filled with talks, discussions, hands-on workshops and visits to local museums as well as, of course, local bars.


Matthew Battles aggregating the participants’ questions on an interactive blackboard.

The highlight, for me, was the people who gathered in the well air-conditioned, dimly lit premises of Harvard’s project space arts@29garden. For a workshop of this nature you can have the best possible organisation – and boy were the metaLab folks organised – but what makes it a success in the end is finding the right mixture of participants, speakers and staff. The “diverse, elite group of curators, scholars, and technologists”, as the programme described the invited participants, turned out to be a very open, social and intellectually stimulating bunch, mostly from the United States, but also from Mexico and central Europe.

An impressive line-up of guest speakers accompanied us throughout the two weeks. Seb Chan’s talk left a lasting impression on me and, I think, on most participants. He presented his work at the Cooper Hewitt museum on making their online collection accessible and usable on the web. It all seemed so simple when Seb walked us through the features of their new website, but the takeaway point of it all was the need to let go of perfection. Data will never be perfect, collections data certainly not; the point is to get it out there anyway and to do something with it. Museums and institutions need to partly let go of their authority and expose their imperfections, so that the public can understand and, if necessary, help out. What became evident throughout the two weeks is that collections holders themselves don’t necessarily know much about their own data and that by opening it up they could learn a lot about their own history.

David Weinberger similarly pointed out the changing nature of authority. Our knowledge has, for centuries, been shaped by books and papers: truth is what it says in the book. Today, Wikipedia has taken over the position of the Encyclopedia Britannica as the go-to place for finding “truth”. Part of Wikipedia’s success, Weinberger says, is its ability to acknowledge its own fallibility. Institutions that want to remain credible need to begin communicating their imperfection. It can start as simply as changing the wording on a website, like Cooper Hewitt’s collection being “pretty confident” about knowing something, rather than pretending something to be absolutely certain.

Of course, we participants were also invited to share our insights in ‘lightning talks’ that took anywhere between 5 and 15 minutes. Personally, I enjoyed these peer presentations the most. We had Pietro Santachiara talking about the “Tourist in Rome Syndrome”: the belief that everything is important, and the importance of leaving things out. Gudrun Buehl told us about their ‘Aztec-style’ – or forged Aztecan? – birthing statue and how it made its way into Indiana Jones. Rich Barrett-Small from the Tate presented what they got out of making their dataset available on GitHub, and copyright lawyer Katherine DeVos Devine dismantled the photographing policies of the Tate and other participants’ institutions on the spot.

I can’t mention all the highlights. If I could I would write more about the work done by the metaLab folk themselves: Jeffrey Schnapp’s presentation on Curarium, Jeff Steward’s visualisation of movements and interactions within the Harvard Art Museum collection and Yanni Loukissas’ time-wise visualisations of the Harvard Arboretum dataset.

I would also write more about the project work undertaken by the participants in the second week and the amazing outcomes, which I hope will be made accessible in some form very soon. Rich’s Colour Lens, a colour-based cross-collection browser, is already online. Steven Lubar outlines his impressions of the workshop on his own blog, and the project essay by Kristina Van Dyke and Steven Lubar is accessible as well. Cristoforo Magliozzi from metaLab was instrumental in producing videos together with the participants, such as Gudrun and Pietro’s Memorable Encounter and Lanfranco Aceti and Vincent Brown’s Border Cuts.

I’ll update this post with links to further projects once they become available. For now, I end with a big Thank You to everyone at metaLab, all the participants and speakers, and last but not least the Getty Foundation for their generous support.


Beautiful Data at Harvard University

Beautiful Data, a summer institute for telling stories with open art collections, brings together museum professionals, scholars and technologists to work on new ways of making use of the growing amount of digital collections data that is becoming accessible.


Image: The workshop will take place at Arts @ 29 Garden, the creative project space of Harvard University

During the next two weeks, I will take part in this summer institute organised by Harvard’s metaLAB and funded by The Getty Foundation. I have been invited as one of 22 professionals and academics in the field of museums, archives and digital humanities to work on concepts and practical solutions for new ways of art-historical storytelling using open digital collections and to critically discuss the ethical, curatorial and intellectual challenges of digital media in a cultural context.

I’m not sure how much I’m allowed to give away of the programme here, but I’m impressed by the number of high-profile speakers and participants the organisers managed to gather for what can only be an exciting, stimulating and challenging workshop. Watch this space for updates and outcomes.

Challenges for Time as Digital Data

I have recently been invited to present my research at the Herrenhausen Conference on Digital Humanities. The Volkswagen Foundation, who organised the event, offered travel grants for young researchers to present their research topic in a short talk and a poster. Instead of presenting my research as a whole (which we PhD students have to do over and over again), I chose to talk only about an aspect of it: the problem of representing time digitally.


Read on for the paper on which my talk was based. I presented it, along with this poster, at the Herrenhausen Conference “(Digital) Humanities Revisited – Challenges and Opportunities in the Digital Age” at the Herrenhausen Palace, Hanover, Germany, December 5-7, 2013.

In digital humanities, there usually is a gap between digitally stored data and the collected data. The gap is due to the (structural) changes that data needs to undergo in order to be stored within a digital structure. This gap may be small in the case of natively digital data such as a message on Twitter: a tweet can be stored close to its ‘original’ format, but it still loses a lot of its frame of reference (the potential audience at a specific point in time, the actual audience, the potential triggers of the message etc.). In digital humanities this distance may become so large that some researchers argue that the term data should be largely abandoned and replaced with capta. Capta, the taken, in contrast to data, the given, should emphasise the interpretative and observer-dependent nature of data in the humanities [1]. This problem relates to all kinds of data, whether categorical, quantitative, spatial or temporal. I will, however, focus only on the last type. Time and temporal expressions are particularly susceptible to multiple modes of interpretation and (unintended) modifications due to limitations in digital data structures, but also due to the ambiguous and subjective nature of time itself.
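A small sketch may make the point about temporal expressions concrete. It assumes, purely for illustration, a catalogue with free-text date expressions and two possible storage schemas: a single integer year, and an (earliest, latest) interval. The function names, the five-year margin for “c.” dates and the reading of “1900s” as a decade are hypothetical choices, not an established standard.

```python
import re
from typing import Optional, Tuple


def to_single_year(expr: str) -> Optional[int]:
    """Lossy storage: keep only the first four-digit year found."""
    match = re.search(r"\d{4}", expr)
    return int(match.group()) if match else None


def to_interval(expr: str, circa_margin: int = 5) -> Optional[Tuple[int, int]]:
    """Less lossy storage: (earliest, latest) year the expression may refer to."""
    text = expr.strip().lower()
    years = [int(y) for y in re.findall(r"\d{4}", text)]
    if not years:
        return None
    start, end = min(years), max(years)
    if text.startswith(("c.", "ca.", "circa")):
        # Assumption: "circa" widens the range by a fixed margin.
        start, end = start - circa_margin, end + circa_margin
    elif text.endswith("s"):
        # Assumption: "1900s" is read as a decade; it could equally mean a century.
        end = start + 9
    return (start, end)


# Hypothetical date expressions as they might appear in a catalogue.
for expr in ["1900", "c. 1900", "1898-1902", "1900s", "undated"]:
    print(f"{expr!r}: year={to_single_year(expr)}, interval={to_interval(expr)}")
```

With the single-year schema, “1900”, “c. 1900” and “1900s” all collapse to the same value, so the uncertainty expressed by the cataloguer is lost. The interval schema keeps more of it, but only by hard-coding one interpretation among several.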


[1] Johanna Drucker, Humanities approaches to graphical display, 2011