Publication in the Diário da República: Despacho n.º 7043/2016 - 27/05/2016
7.5 ECTS; 1st Year, 1st Semester, 30.0 PL + 30.0 TP + 15.0 OT + 10.0 O, Code 39091.
Lecturer
(1) Lecturer in charge
(2) Teaching lecturer
Prerequisites
Not applicable
Objectives
1. Become familiar with the 5 V's of big data;
2. Understand the data privacy risks of using big data;
3. Understand the lifecycle of a big data project and its architecture;
4. Get to know the query, storage and distributed systems behind big data;
5. Know how to extract information from data.
Program
1. Introduction to Data Science
- What is Data Science?
- Data Analysis, Data Analytics, Big Data
- Skills to become a Data Scientist
- Data Science Lifecycle
2. Ethics and Data Privacy
- How can we avoid big data?
- Identity;
- Privacy;
- Ethics;
- Ownership;
- Reputation;
3. Introduction to Big Data
- What is big data?
- Who is using Big Data?
- Where is this data coming from?
- Why are they collecting this data?
- How does big data differ from traditional databases?
- Different types of data
- Become familiar with the 5 V's of Big Data: volume, velocity, variety, veracity and value;
4. Big Data Storage and Processing Frameworks: Apache Hadoop and Spark
- HDFS;
- MapReduce;
- RDDs
- DataFrames
- Streaming
5. Text Analytics
- What is Text Analytics?
- Applications;
- Natural Language Processing (NLP) Architecture;
- NLP commercial solutions;
- Text Analytics with Python
Evaluation Methodology
Periodic Assessment: Research Project (RP) (50%) + Hands-on Lab (50%)
Students are excluded from the exam if they score below 4 points in either of the two assessment components or if they do not reach a minimum attendance of 70%.
Final Evaluation: RP (100%)
Bibliography
- Davis, K. (2012). Ethics of Big Data. (pp. 1-79). USA: O'Reilly
- Erl, T., Khattak, W. and Buhler, P. (2016). Big Data Fundamentals: Concepts, Drivers & Techniques. (pp. 1-235). USA: Prentice Hall
- Provost, F. and Fawcett, T. (2013). Data Science for Business. (pp. 1-386). USA: O'Reilly
- Witten, I., Frank, E. and Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques. (pp. 1-629). USA: Elsevier
Teaching Method
Theoretical and practical teaching with audiovisual media, laboratory equipment and practical examples. Assessment: completion and presentation of group projects.
Software used in class
Apache Hadoop; Spark; Python: Anaconda and Jupyter Notebooks