In this extract from a book chapter in the new Routledge Companion to Visual Journalism, I look at how the explosion of data as a source for journalists, and the separation of content from interface in online publishing, have combined to lay the foundations for a range of new storytelling forms, from interactive infographics and timelines to charticles and scrollytelling.
Although the term "data journalism" is a relatively recent one, popularised around 2010, data has been part of journalism throughout its history: from 17th-century newsletters covering stock prices and shipping schedules, to the front-page table of school spending in The Guardian's first edition in 1821, US investigations of politicians' travel expenses in the 1840s and campaigning factchecking of lynching in the 1890s.
The introduction of computers into the newsroom in the 20th century added a new dimension to the practice. After early experiments by CBS News, which applied computer power to data to predict the outcome of the 1952 US presidential election, a major breakthrough came in the 1960s with Philip Meyer's use of databases and social science methods to investigate the causes of riots in Detroit.
Meyer codified his work in the 1973 book Precision Journalism, and this (along with new Freedom of Information laws giving journalists the right to request public data) was to exert a major influence on what became known over the following decades as Computer Assisted Reporting (CAR).
Computer Assisted Reporting techniques extended the stories that journalists could find in data, but journalists were, on the whole, only responsible for the words: charts and visualisation for the stories would be created by the graphics department.
From CAR to data journalism
In its early pieces of data journalism, The Guardian regularly used Google Fusion Tables to put data points on a map
That division of labour would change with the introduction of the World Wide Web and networked news consumption, as would the storytelling options available: the number of online tools for creating charts, maps and infographics from data multiplied, and their cost dropped to zero. IBM's Many Eyes (launched in 2007) and Google's Fusion Tables (2009) allowed anyone, including journalists, to create and publish their own charts and maps, and played an important role in the development of data journalism at The Guardian, among others.
Both tools would later close, however, leaving gaps in the many stories that had used them and highlighting the risks of relying on third-party tools (Broussard 2015; Boss and Broussard 2016).
The most innovative journalists and organisations began exploring interactive formats that invited readers to find their own stories in the data directly, whether through tables or visual devices such as maps or interactive charts.
Some emerging genres became codified in the language and technologies of the industry (Klein-Avraham and Reich 2022): The New York Times's longform feature Snow Fall, for example, helped popularise what came to be known as "scrollytelling" (Dowling and Vogan 2015), while the Financial Times's Story Playbook formalised the rules of the "charticle", and content creation platforms such as Shorthand and Flourish embedded these practices in their tools.
As publishers hired developers or encouraged their editorial staff to learn how to code, the field also saw cultural and organisational change "where the open data philosophy, free software, collaborative and team work" became part of an emerging identity (Sandoval-Martín and La-Rosa 2018, p. 193).
Defining data journalism
Definitions of what constitutes "data journalism" have been somewhat contested, focusing variously on output (i.e. visualisation) (Royal and Blasingame 2015; Veglis and Bratsas 2017; Zamith 2019); culture (Parasie and Dagiral 2013; Karlsen and Stavelin 2014; Lewis and Usher 2014; Boyles and Meyer 2016); practice (De Maeyer et al. 2014; Ojo 2018); infrastructure; and input (Tabary et al. 2015).
Analysis of discourses around data journalism on Twitter also suggests that "the public's understanding of data-driven journalism … is not refined and clear".
Caswell, however, criticises these approaches as being:
"rooted solely in practice, observation, description and categorization … historical perspectives that do not empower journalists or news organizations to participate fully in ongoing technological change".
One recent iteration of that change has been the increasing role of automation in journalism, reflected in the development of "robot journalism" (automated articles, often generated from a template created by data journalists) and "structured journalism".
Chris Anderson classifies this new "computational" journalism as a branch separate from data journalism, within a three-part typology that situates both in relation to event-based reporting:
The journalism of occurrences: the traditional journalism of episodic, individualised, narrative-format reporting, relying on oral evidence for context;
The journalism of social science: the precision journalism and data journalism of factualising and providing context via statistics;
And computational journalism: structured stories that combine the two categories above, representing applications of computational thinking to news reporting (with context provided via computable "events").
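To make the template-driven "robot journalism" described above more concrete, here is a minimal Python sketch. It assumes a hypothetical structured record (UnemploymentRecord, with invented fields) and a single fixed sentence template, with simple rules choosing the wording; it illustrates the general technique only and is not how any particular newsroom's system works.

```python
from dataclasses import dataclass


@dataclass
class UnemploymentRecord:
    """One row of hypothetical structured data; the fields are illustrative only."""
    area: str
    month: str
    rate: float           # percentage for this month
    previous_rate: float  # percentage for the previous month


def write_story(record: UnemploymentRecord) -> str:
    """Fill a fixed sentence template, using simple rules to pick the wording."""
    if record.rate == record.previous_rate:
        return (f"The unemployment rate in {record.area} was unchanged "
                f"at {record.rate:.1f}% in {record.month}.")
    # The direction of change decides which verb the template uses
    verb = "rose" if record.rate > record.previous_rate else "fell"
    return (f"The unemployment rate in {record.area} {verb} to {record.rate:.1f}% "
            f"in {record.month}, from {record.previous_rate:.1f}% the previous month.")


# Hypothetical record: place name and figures invented for illustration
print(write_story(UnemploymentRecord("Exampletown", "March", 4.2, 3.9)))
```

In this sketch the editorial judgement sits in the template and its rules; the same function can then be run over every record in a dataset.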
The rise of data visualisation and interactivity
The Data Journalism Pyramid situates visualisation and interactive features such as utility and personalisation as part of the "communication" stage of a data journalism project
Interactivity is a complex concept that can be understood as the presence or absence of multiple qualities in a piece of work and the way that it is experienced.
Those qualities include the degree to which the user can influence the story itself; how much they can control the time and place where the story is consumed; and the degree to which they can interact with other parties such as other users or journalists (Liu and Shrum 2002; Downes and McMillan 2000).
Interactive news formats have a history that predates the internet age, from "letters to the editor" and magazine quizzes to radio phone-ins, but online publishing has helped establish a number of further interactive formats that have become generic in the industry.
Some of these (online polls and quizzes, interactive maps, charts and timelines) are remediations of offline equivalents.
Others are web-native innovations: the "image map", for example, involves taking a static image and adding "hotspots" which trigger extra information on a particular area of the image when clicked or rolled over. A "slider" allows the user to move a bar in order to swipe between two images (typically a "before" and "after" view of the same scene, such as satellite photos before and after a major disaster). And the "You Draw It" genre of interactive charts pioneered by the New York Times presents the user with a partly drawn chart, inviting them to draw the rest of it to show how much they know (or can guess) about a key statistic.
The New York Times's "You Draw It" charts invite the user to finish drawing the trend line
Other formats take genres from outside journalism, such as the simulator, calculator or game, and apply them within a current affairs context: the Financial Times's Uber Game, for example, uses data-driven storytelling to help readers understand the choices facing gig economy drivers. "Uber was already using the language and technique of games in its driver app," wrote the makers of the game.
"By making a game about it, we would be using the medium as part of the message."
In addition, many examples of data-driven journalism use an element of personalisation that involves changing parts of the story in order to make it relevant to the person consuming it: the New York Times's "How Much Hotter Is Your Hometown Than When You Were Born?", for example, requires the user to enter their hometown and birth year in order to continue reading past the first paragraph; the rest of the article tells a story that uses the reader's life and personal history as a frame for the subject matter.
Four ways that interactivity is incorporated
The nature of interactivity varies between formats. Jens Jensen identifies four ways that such interactivity might be incorporated: transmissional; consultational; conversational; and registrational.
Rollover buttons and map markers are a tool for "transmissional" interactivity
Interactive maps, charts and image maps, for example, employ transmissional interactivity, with the user getting extra information in response to their behaviour (such as moving their mouse over an area of an image).
The "You Draw It" charts, however, employ conversational interactivity: the user provides some input, which is then displayed as part of the story.
And personalised stories which incorporate details about the user's hometown (such as the New York Times's example above) would be classified as registrational interactivity.
This story uses registrational interactivity by asking for some information from the user to shape the story that will be told
Finally, interactive story treatments which provide multiple views of the same information (for example, being able to switch from a bar chart to a map or table showing the same data) would be classed as consultational interactivity.
Like many of journalism's pre-internet forms of interactivity (Schultz 1999), most generic forms of data-driven interactivity are largely "reactive", in the sense that they provide information in reaction to a user's activity, rather than interactive in the sense that users can interact with journalists or other readers.
Research into interactive visualisations in Israel notes that interactivity's
"potential to efficiently and intricately mediate rich journalistic knowledge [is not realised] in practice [where] they usually mediate and oversimplify".
There are exceptions, however, from The Guardian's transfer gossip stories, where users could at one point see how their fellow readers had voted on the credibility of a particular rumour, to the same publication's Flickr group (now dormant) dedicated to readers' visualisations based on data the organisation had published.
The growing number of GitHub repositories maintained by data journalism teams might also be considered part of the wider "meso news-spaces" (Tenenboim and Kligler-Vilenchik 2020), where users can participate in news-related processes, although research suggests participation is low.
Separating the content from the interface
One major change facilitated by the shift to online publishing that had a particular impact on interactivity is highlighted by Lev Manovich:
"Historically, the artist made a unique work within a particular medium. Therefore the interface and the work were the same; in other words, the level of an interface did not exist. With new media, the content of the work and the interface become separate. It is therefore possible to create different interfaces to the same material."
Nowhere is this more apparent than in the practice of data visualisation: once a table of data is imported into a chart creation tool, a journalist can instantly preview different interfaces based on that material, such as a pie chart, a bar chart, a simple interactive table or, if it contains geographical data, a map.
In some cases this choice of interface can be presented directly to the reader: visualisation dashboards, for example, allow readers to apply filters or sorting in order to change the interface to one that is most relevant or meaningful to them.
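The separation Manovich describes can be sketched in a few lines of Python. In this hypothetical example, one dataset (the content) is passed to interchangeable rendering functions (the interfaces), and a reader-controlled sort option stands in for the sorting offered by a dashboard. The function names and sample values are invented for illustration, and plain-text output stands in for a real charting tool.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]  # (label, value)


def as_table(rows: List[Row]) -> str:
    """Render the rows as a tab-separated text table."""
    lines = ["label\tvalue"]
    lines += [f"{label}\t{value}" for label, value in rows]
    return "\n".join(lines)


def as_bar_chart(rows: List[Row], scale: int = 1) -> str:
    """Render the rows as a text bar chart, one '#' per `scale` units."""
    return "\n".join(f"{label} {'#' * (value // scale)}" for label, value in rows)


# The same content can be routed to any of these interfaces
INTERFACES: Dict[str, Callable[[List[Row]], str]] = {
    "table": as_table,
    "bar": as_bar_chart,
}


def render(rows: List[Row], view: str, sort_desc: bool = False) -> str:
    """Apply an optional reader-chosen sort, then render with the chosen interface."""
    if sort_desc:
        rows = sorted(rows, key=lambda row: row[1], reverse=True)
    return INTERFACES[view](rows)


# Hypothetical values, invented purely for illustration
data = [("A", 2), ("B", 5), ("C", 3)]
print(render(data, "table"))
print(render(data, "bar", sort_desc=True))
```

Because the data itself never changes, offering another view (a map, say, for geographical data) would mean adding one more entry to INTERFACES rather than rewriting the content.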
Another form of database-driven interactive storytelling is the "clickable interactive", where a user is invited to navigate through a story's elements by making different navigational choices.
Segel and Heer identify three structures commonly employed in this form, providing differing levels of user control: the "interactive slideshow" format, for example, merely invites the reader to click "next" through each chapter in the story as they would a slideshow, whereas the "martini glass" format allows them to explore the data towards the end of the sequence. In contrast, the "drill-down story" gives the reader complete freedom to choose which elements of the story to explore, and in what order.
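The differing levels of reader control in these three structures can be expressed as navigation rules. The Python sketch below is a deliberate simplification rather than Segel and Heer's own formalism: a story is assumed to be an ordered list of hypothetical section names, and the function returns which sections a reader may open next from their current position.

```python
from typing import List


def allowed_next(structure: str, sections: List[str], current: int) -> List[str]:
    """Return the sections a reader may open next from position `current`.

    A simplified, hypothetical model of reader control in each structure.
    """
    last = len(sections) - 1
    # Every section except the one currently being read
    others = [s for i, s in enumerate(sections) if i != current]
    if structure == "interactive slideshow":
        # Strictly linear: the only option is "next", until the story ends
        return [sections[current + 1]] if current < last else []
    if structure == "martini glass":
        # Linear, author-driven "stem"; free exploration once the end is reached
        return [sections[current + 1]] if current < last else others
    if structure == "drill-down story":
        # The reader chooses any section, in any order
        return others
    raise ValueError(f"Unknown structure: {structure}")


sections = ["intro", "context", "findings", "explore"]
print(allowed_next("martini glass", sections, 1))  # prints ['findings']
print(allowed_next("martini glass", sections, 3))  # prints ['intro', 'context', 'findings']
```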
Some of these might be considered forms of ergodic storytelling, whereby readers are offered different paths through a story (Aarseth 1997). Ergodic storytelling online typically requires the journalist to convert story material into data stored in a database (in the form of pages in a content management system such as WordPress, or updates on a social media platform), which can then be navigated by clicking on hyperlinks that take the reader from one entry to another.
In the case of a chatbot, that database is formally created, and navigation takes place by choosing options presented by the chatbot or by asking questions which are matched to records in the database (more advanced chatbots based on large language models might be able to construct more original answers by drawing on and combining billions of text records).
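The database behind this kind of storytelling can be sketched as a set of records linked to one another. In the hypothetical Python example below, each record stands in for a page with outgoing links, follow mimics a reader clicking a hyperlink, and answer mimics a simple rule-based chatbot by matching the words in a question against keywords attached to each record. The record names, keywords and naive word-overlap matching are all assumptions for illustration; a large language model chatbot works very differently.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical story database: each record is one "page" with placeholder text,
# hyperlinks to other records, and keywords a chatbot can match against
STORY: Dict[str, dict] = {
    "start": {"text": "Overview of the story.", "links": ["causes", "impact"],
              "keywords": {"overview", "summary"}},
    "causes": {"text": "Section on causes.", "links": ["impact", "start"],
               "keywords": {"why", "cause", "causes", "caused"}},
    "impact": {"text": "Section on impact.", "links": ["causes", "start"],
               "keywords": {"impact", "effect", "affected"}},
}


def follow(record_id: str, link_index: int) -> str:
    """Hyperlink navigation: return the id of the record the chosen link points to."""
    return STORY[record_id]["links"][link_index]


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def answer(question: str) -> Optional[str]:
    """Chatbot navigation: return the record whose keywords overlap most with the question."""
    words = tokenize(question)
    best_id, best_score = None, 0
    for record_id, record in STORY.items():
        score = len(words & record["keywords"])
        if score > best_score:  # on a tie, the earlier record is kept
            best_id, best_score = record_id, score
    return best_id
```

For example, follow("start", 0) leads to the "causes" record, while answer("Who was affected?") returns "impact"; a question that matches no keywords returns None, which is where a real chatbot would fall back to offering options.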
Manovich is at pains to point out that simply linking such records together is not enough on its own for a story to qualify as an interactive narrative:
"The author also has to control the semantics of the elements and the logic of their connection so that the resulting object will meet the criteria of narrative."
Those criteria (identified by the narratologist Mieke Bal) include, among other things, that a story should consist of "a series of connected events caused or experienced by actors" (Bal 1985).
Which chart or map the journalist chooses, then, can be seen as an editorial process guided by the same considerations as other aspects of their reporting.
In the second part of this extract, published tomorrow, I explore the choices involved in that editorial process, along with the ethical considerations and challenges encountered along the way.
