What you will study
This module will provide you with a broad overview of the concepts, techniques and tools of modern data management and analysis. It will compare traditional relational databases with an alternative model (a NoSQL database), and will help you learn how to choose the most appropriate means of storing and managing data, depending on the size and structure of a particular dataset and its intended use. Starting from the position that data is used to answer a question, you will be introduced to preliminary techniques in data analysis and to a range of data visualisation techniques that will help you understand how to start exploring a new dataset.
To ensure that you are comfortable with handling datasets, you will explore a range of real-world datasets that illustrate the key concepts in the module. Sources such as data.gov.uk, the World Bank, and a range of other national and international agencies may be used to provide appropriate data. You will divide your time approximately equally between issues in data management (technical and socio-legal issues in storing and maintaining datasets) and issues in data analytics (understanding how data can be used to answer questions).
The module is framed around a narrative that looks at how to manage and extract value and insight from a range of increasingly large data collections. At each stage, different ways of representing the data will be compared (for example, using different sorts of charts or geographical mapping techniques), and the limitations of each mechanism will be considered. To enable you to get a feel for the use of data, each stage will also include an overview of some data analysis techniques, including summary reporting and exploratory data visualisation. This module is driven by Richard Hamming’s famous quote: ‘The purpose of computing is insight, not numbers’.
Some of the key ideas are:
Introducing data analysis
Starting with a data file such as a spreadsheet, this unit will provide you with a brief introduction to some basic operations on simple data files. This will give you an opportunity to study an outline of the key ideas in the module and help you become familiar with the module software.
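As an illustration of the kind of basic operations meant here, the following is a minimal sketch that loads a small table, filters it, sorts it and computes a summary value. It assumes Python's standard library only; the module software may well use different tools, and the column names and figures are invented purely for the example.

```python
import csv
import io
from statistics import mean

# Hypothetical data standing in for a spreadsheet exported as CSV.
RAW = """country,year,population_millions
A,2000,10
A,2010,12
B,2000,50
B,2010,55
"""

# Load: read each row as a dictionary keyed by column name.
rows = list(csv.DictReader(io.StringIO(RAW)))

# Clean: CSV values arrive as text, so convert the numeric columns.
for row in rows:
    row["year"] = int(row["year"])
    row["population_millions"] = float(row["population_millions"])

# Filter: keep only the rows for one year.
rows_2010 = [row for row in rows if row["year"] == 2010]

# Sort: largest population first.
ranked = sorted(rows_2010, key=lambda r: r["population_millions"], reverse=True)

# Summarise: a single summary statistic for the filtered rows.
average_2010 = mean(row["population_millions"] for row in rows_2010)

print([row["country"] for row in ranked], average_2010)
```

The same load, clean, filter, sort and summarise steps apply whatever tool is used to carry them out.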
Concepts in data management
You will look at three key areas in data management: data architectures and data access (create, read, update and delete operations, known as CRUD), data integrity, and transaction management (the ACID properties: atomicity, consistency, isolation and durability). Each of these topics will be illustrated using a relational database and one non-relational alternative. The advantages and limitations of each model will be discussed.
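To make CRUD and transactions concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. The choice of SQLite, the account table and the transfer function are assumptions made for illustration, not the database or schema used in the module. The sketch shows the four CRUD operations, an integrity constraint, and a failed transaction being rolled back as a whole.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical account table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " owner TEXT NOT NULL,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity rule
    )
    with conn:  # Create: insert rows and commit
        conn.executemany(
            "INSERT INTO account (owner, balance) VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    return conn

def balances(conn):
    """Read: return {owner: balance} for every account."""
    return dict(conn.execute("SELECT owner, balance FROM account"))

def transfer(conn, src, dst, amount):
    """Update: move money between accounts as a single transaction."""
    with conn:  # commits if both updates succeed, rolls back on any error
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE owner = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE owner = ?",
            (amount, src),
        )

conn = make_db()
transfer(conn, "alice", "bob", 30)

try:
    # Would leave alice with a negative balance, so the CHECK constraint
    # fails and the credit already applied to bob is undone as well.
    transfer(conn, "alice", "bob", 500)
except sqlite3.IntegrityError:
    pass
after_failed_transfer = balances(conn)

with conn:  # Delete
    conn.execute("DELETE FROM account WHERE owner = ?", ("bob",))
after_delete = balances(conn)

print(after_failed_transfer, after_delete)
```

The failed transfer illustrates atomicity and consistency: either both updates take effect or neither does, and the database never holds a state that violates its constraint.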
Legal and ethical issues
Here you will consider the legal and ethical issues involved in managing data collections. You will be required to obtain and read (parts of) the Data Protection Act and the Freedom of Information Act, and demonstrate how these apply to issues in data management. You will also consider privacy, ownership, intellectual property and licensing issues in data collection, management, retrieval and reuse.
Concepts in data analytics
These sections will focus on using data to answer a real question, with an emphasis on exploratory techniques (such as visualisation) and on turning a question into a form that can be answered realistically using the data that is available. Issues in processing large and real-time streamed data collections will also be addressed, along with techniques and technologies (such as MapReduce) for handling them. In this part of the module you will use tools such as the Python scientific libraries and/or ggplot2 to visualise the data and carry out appropriate analyses.
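The MapReduce idea mentioned above can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not a distributed framework; the function names and the sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: (region, value) observations.
records = [("UK", 12), ("FR", 7), ("UK", 5), ("DE", 9), ("FR", 1)]

def map_phase(record):
    """Map: emit a (key, value) pair for each input record."""
    region, value = record
    yield region, value

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def map_reduce(data):
    """Run the three phases in sequence over an iterable of records."""
    pairs = (pair for record in data for pair in map_phase(record))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())

totals = map_reduce(records)
print(totals)
```

Because each map call looks at one record and each reduce call looks at one key, a real framework can spread both phases across many machines, which is what makes the model suitable for large data collections.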
If you are considering progressing to The computing and IT project (TM470), this is one of the OU level 3 modules on which you could base your project topic. Normally, you should have completed one of these OU level 3 modules (or be currently studying one) before registering for the project module.