Systems Infrastructure for Data Science (Winter 2015/16)

Lecturer

Prof. Dr. Peter Fischer

Organization

Lecture:

  • Tuesday, 10:15 - 12, SR 00-010/14, Building 101
  • Friday, 14:15 - 15, SR 00-010/14, Building 101

Exercises: Thursday, 15 - 16, SR 00-010/14, Building 101

Content

Recently, the term "big data" has become an important buzzword: massive amounts of complex data are being produced by businesses, scientific applications, government agencies and social applications. This data can be used to gain new insights for decision support, scientific research, advertising or just entertainment.

In addition to the growing amount of available data, the architectures and methods used to store and analyze this data have changed drastically over the last decade.

The course covers the fundamentals of different data infrastructure systems, including classical databases, main-memory databases, data stream systems and cloud computing frameworks.

The first part of the course (until around the Christmas break) covers the fundamentals of database management systems:

  1. Architecture of classical Database Management Systems
    1. Data Storage: Storage Hierarchies, Storage Management
    2. Indexing: ISAM, B-Trees, Hash-Based Indexing
    3. Spatial Indexes: Quad Trees, k-d Trees, R-Trees
    4. Query Processing: Operators, Execution Model
    5. Query Optimization: Query Translation Stages, Cost Models, Plan Enumeration
    6. Performance Measurement and Tuning
  2. Distributed Databases
    1. General Concepts and Fundamental Architectures
    2. Data Placement and Fragmentation
    3. Distributed Query Processing
  3. Parallel Databases

Building on this foundation, the second part of the course examines how design assumptions change when such systems are used in contexts that require extreme scalability, very short response times or complex analytical operations. Relevant topics include:

  1. Hadoop and the MapReduce Framework
  2. Web-Scale "Databases": Key-Value Stores
  3. Main-Memory Databases
  4. Data Stream Systems
  5. Graph Databases and Graph Computation Systems

Course Materials

Slides, annotated slides and lecture recordings are available via ILIAS.

The exercise sheets and source files for the exercises will be put on this page during the semester.

  • Exercises are not mandatory, but they are highly recommended for fully understanding the lecture and preparing for the exam.
  • Exercises will be handed out one week before the due date.
  • Solutions are also available via ILIAS. Course membership is sufficient; no special password is needed.

Literature


The classical data management areas are covered in the following books:

  • Raghu Ramakrishnan and Johannes Gehrke: Database Management Systems, 3rd edition, 2002
  • Kemper and Eickler: Datenbanksysteme. Eine Einführung. Oldenbourg-Verlag (in German)

The key reference for distributed databases is

  • Özsu and Valduriez: Principles of Distributed Database Systems, 3rd edition

A fairly recent German book also covers much of the course content:

  • Erhard Rahm, Gunter Saake, Kai-Uwe Sattler: Verteiltes und Paralleles Datenmanagement (in German; available for download from within the university network or via VPN)

Modern techniques are mostly described in research papers, which will be provided during the lecture.