IPT

Academic Year: 2016/17

Engenharia Informática - Internet das Coisas (Computer Engineering - Internet of Things)

Analysis and Processing of Big Data

Publication in the Diário da República: Despacho n.º 7043/2016 - 27/05/2016

7.5 ECTS; 1st Year, 1st Semester; 30.0 PL + 30.0 TP + 15.0 OT + 10.0 O; Code 39091.

Lecturer
- Ricardo Nuno Taborda Campos (2)

(1) Course coordinator
(2) Teaching lecturer

Prerequisites
Not applicable

Objectives
1. Become familiar with the 5 V’s of big data;
2. Understand the data privacy risks of using big data;
3. Understand the lifecycle of a big data project and its architecture;
4. Get to know the query, storage and distributed systems behind big data;
5. Know how to extract information.

Program
1. Introduction to Big Data
- What is big data?
- Who is using Big Data?
- Where is this data coming from?
- Why are they collecting this data?
- How does big data differ from traditional databases?
- Different types of data
- The 5 V’s of Big Data: volume, velocity, variety, veracity and value;

2. Ethics and Data Privacy
- How can we avoid big data?
- Identity;
- Privacy;
- Ethics;
- Ownership;
- Reputation;

3. Big Data Lifecycle
- Business Case Evaluation;
- Data Identification;
- Data Acquisition & Filtering;
- Data Extraction;
- Data Validation & Cleansing;
- Data Aggregation & Representation;
- Data Analysis;
- Data Visualization;
- Utilization of Analysis Results

4. Big Data Storage: NoSQL
- Key Value Pairs;
- Column-based;
- Document-based;
- Graph-based;
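
As a rough illustration of how the four storage models above differ, the sketch below expresses one hypothetical customer record in each of them using plain Python structures. The record, its field names and the `neighbours` helper are assumptions made for this example; no real NoSQL product or API is being modelled.

```python
# One hypothetical customer record expressed in each of the four NoSQL models.

# Key-value: an opaque value looked up by a single key.
key_value = {"customer:42": '{"name": "Ana", "city": "Tomar"}'}

# Column-based: a row key maps to column families, each holding named columns.
column_based = {
    "42": {
        "profile": {"name": "Ana", "city": "Tomar"},
        "orders": {"count": 3},
    }
}

# Document-based: a nested, self-describing document that can be queried by field.
document_based = {
    "_id": 42,
    "name": "Ana",
    "address": {"city": "Tomar"},
    "orders": [101, 102, 103],
}

# Graph-based: nodes plus labelled edges between them.
nodes = {42: {"label": "Customer", "name": "Ana"}, 101: {"label": "Order"}}
edges = [(42, "PLACED", 101)]


def neighbours(node_id, relation):
    """Return the nodes reachable from node_id over edges with the given label."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]
```

In this sketch, `neighbours(42, "PLACED")` walks the edge list to find the orders placed by customer 42, the kind of relationship query that graph stores are built around.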

5. Big Data Storage and Processing Framework: Apache Hadoop
- HDFS;
- MapReduce;
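
The MapReduce model listed above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle and reduce phases applied to a word count, not Hadoop's own API; the function names and sample documents are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of text records."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = word_count(["big data is big", "data is everywhere"])
```

On a real cluster the map and reduce calls would run in parallel on different nodes, with the input records read from HDFS; here everything runs in one process so the data flow is easy to follow.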

6. Big Data Analytics
- Slicing and dicing;
- Basic monitoring;
- Anomaly identification;
- Data Mining;
- Text Mining;
- Web Mining;
- Multimedia Mining.

7. Text Mining
- Difference between Text Analytics and Search;
- Extraction Techniques;
- Text Processing Architecture;

8. Implementation of practical solutions for Big Data
- Installation, configuration and use of a Hadoop distribution;


Evaluation Methodology
Completion of 2 projects.
Project I: 60%
Project II: 40%

The project is required to pass the course. Students who do not submit it fail automatically and cannot sit the exam.

Bibliography
- Davis, K. (2012). Ethics of Big Data. (pp. 1-79). USA: O'Reilly
- Erl, T. and Khattak, W. and Buhler, P. (2016). Big Data Fundamentals: Concepts, Drivers & Techniques. (pp. 1-235). USA: Prentice Hall
- Provost, F. and Fawcett, T. (2013). Data Science for Business. (pp. 1-386). USA: O'Reilly
- Witten, I. and Frank, E. and Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques. (pp. 1-629). USA: Elsevier

Teaching Method
Theoretical and practical teaching with audiovisual media, laboratory equipment and practical examples. Assessment: completion and presentation of group projects.

Software used in class
Apache Hadoop
