Open Data – An Introduction

What is Open Data?

“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.” OpenDefinition.org

The Open Definition sets out in detail the requirements for ‘openness’ in relation to content and data. The key features are:

Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.

Reuse and Redistribution: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets. The data must be machine-readable.

Universal Participation: everyone must be able to use, reuse and redistribute – there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
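As a rough illustration, the three key features above can be written down as a checklist. The sketch below is only an assumption about how one might model them: the class name, field names and condition labels are hypothetical and are not part of the Open Definition itself.

```python
from dataclasses import dataclass, field

# Hypothetical labels: the only conditions the quoted definition tolerates.
ALLOWED_CONDITIONS = {"attribution", "share-alike"}


@dataclass
class DatasetTerms:
    """Illustrative description of how a dataset is published (all fields are assumptions)."""
    available_as_whole: bool
    reasonable_cost: bool
    modifiable_form: bool
    permits_reuse: bool
    permits_redistribution: bool
    machine_readable: bool
    conditions: set = field(default_factory=set)  # e.g. {"attribution"}


def openness_problems(terms):
    """Return the names of the key features that the terms fail to meet."""
    problems = []
    if not (terms.available_as_whole and terms.reasonable_cost and terms.modifiable_form):
        problems.append("availability and access")
    if not (terms.permits_reuse and terms.permits_redistribution and terms.machine_readable):
        problems.append("reuse and redistribution")
    # Any condition beyond attribution or share-alike (for example
    # "non-commercial") discriminates against some users or uses.
    if terms.conditions - ALLOWED_CONDITIONS:
        problems.append("universal participation")
    return problems
```

Under these assumptions, a dataset that only requires attribution passes every check, while the same dataset with a "non-commercial" condition fails on universal participation.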

Read the full Open Definition

Open Data: How We Got Here, and Where We’re Going.
From the LIFT 2012 conference.

What Kinds of Open Data?

Types of data: Geodata, Culture, Science, Financial, Statistics, Climate, Environment, Transport.

Geodata

The data that is used to make maps — from the location of roads and buildings to topography and boundaries.

Cultural

Data about cultural works and artefacts, such as titles and authors, generally collected and held by galleries, libraries, archives and museums.

Science

Data that is produced as part of scientific research from astronomy to zoology.

Finance

Data such as government accounts (expenditure and revenue) and information on financial markets (stocks, shares, bonds, etc.).

Statistics

Data produced by statistical offices such as the census and key socioeconomic indicators.

Weather

The many types of information used to understand and predict the weather and climate.

Environment

Information related to the natural environment, such as the presence and level of pollutants and the quality of rivers and seas.

Transport

Data such as timetables, routes and on-time statistics.

Why Open Data?

Why should data be open? The answer, of course, depends somewhat on the type of data. However, there are common reasons such as:

Transparency. In a well-functioning, democratic society citizens need to know what their government is doing. To do that, they must be able to freely access government data and information and to share that information with other citizens. Transparency isn’t just about access; it is also about sharing and reuse. Often, material must be analyzed and visualized before it can be understood, and this requires that the material be open so that it can be freely used and reused.

Releasing social and commercial value. In a digital age, data is a key resource for social and commercial activities. Everything from finding your local post office to building a search engine requires access to data, much of which is created or held by government. By opening up data, government can help drive the creation of innovative businesses and services that deliver social and commercial value.

Participation and engagement. This covers participatory governance as well as businesses and organizations engaging with their users and audiences. Much of the time citizens are only able to engage with their own governance sporadically, perhaps just at an election every 4 or 5 years. Opening up data enables citizens to be much more directly informed and involved in decision-making. This is more than transparency: it’s about making a full “read/write” society, one that is not just about knowing what is happening in the process of governance but about being able to contribute to it.

How to Open Up Data

If you are looking for more detailed, practical advice on how to open up data, have a look at the Open Data Handbook, which discusses the legal, social and technical aspects of opening up data. Here we provide some short suggestions for initial steps.

3 Key Rules

There are three key rules we recommend following when opening up data:

  • Keep it simple. Start out small, simple and fast. There is no requirement that every dataset must be made open right now. Starting out by opening up just one dataset, or even one part of a large dataset, is fine — of course, the more datasets you can open up the better.

    Remember this is about innovation. Moving as rapidly as possible is good because it means you can build momentum and learn from experience — innovation is as much about failure as success and not every dataset will be useful.

  • Engage early and engage often. Engage with actual and potential users and re-users of the data as early and as often as you can, be they citizens, businesses or developers. This will ensure that the next iteration of your service is as relevant as it can be.

    It is essential to bear in mind that much of the data will not reach ultimate users directly, but rather via ‘infomediaries’. These are the people who take the data and transform or remix it for presentation. For example, most of us don’t want or need a large database of GPS coordinates; we would much prefer a map. Thus, engage with infomediaries first. They will re-use and repurpose the material.

  • Address common fears and misunderstandings. This is especially important if you are working with or within large institutions such as government. When opening up data you will encounter plenty of questions and fears. It is important to (a) identify the most important ones and (b) address them at as early a stage as possible.

The Four Steps

These are in very approximate order – many of the steps can be done simultaneously.

  1. Choose your dataset(s). Choose the dataset(s) you plan to make open. Keep in mind that you can (and may need to) return to this step if you encounter problems at a later stage.
  2. Apply an open license.
    1. Determine what intellectual property rights exist in the data.
    2. Apply a suitable ‘open’ license that licenses all of these rights and supports the definition of openness discussed in the section above on ‘What is Open Data?’.
    3. NB: if you can’t do this, go back to step 1 and try a different dataset.
  3. Make the data available – in bulk and in a useful format. You may also wish to consider alternative ways of making it available such as via an API.
  4. Make it discoverable – post on the web and perhaps organize a central catalog to list your open datasets.
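The steps above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a recommended toolchain: the function name `publish_dataset`, the `catalog.json` file and its fields are hypothetical, CSV is just one possible bulk format, and the license string is whatever open license you selected in step 2.

```python
import csv
import json
from pathlib import Path


def publish_dataset(rows, fieldnames, out_dir, name, license_name):
    """Write a chosen dataset as a bulk CSV file and list it in a simple catalog.

    rows is a list of dicts (the dataset chosen in step 1); license_name is
    the open license applied in step 2. All names here are illustrative.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Step 3: make the whole dataset available in a machine-readable format.
    data_path = out_dir / f"{name}.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Step 4: make it discoverable by recording it in a central catalog file.
    catalog_path = out_dir / "catalog.json"
    if catalog_path.exists():
        catalog = json.loads(catalog_path.read_text(encoding="utf-8"))
    else:
        catalog = []
    # Replace any earlier entry for the same dataset rather than duplicating it.
    catalog = [entry for entry in catalog if entry["name"] != name]
    catalog.append({
        "name": name,
        "file": data_path.name,
        "format": "csv",
        "license": license_name,
        "rows": len(rows),
    })
    catalog_path.write_text(json.dumps(catalog, indent=2), encoding="utf-8")
    return catalog
```

Calling `publish_dataset(rows, ["stop", "departures"], "public", "bus-stops", "ODbL-1.0")` would, under these assumptions, leave a `bus-stops.csv` file and a `catalog.json` file in a `public` directory that can then be posted on the web. An API, if you offer one, would sit alongside the bulk file rather than replace it.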