Publication in the Diário da República: Plano 4 - 2010/2011
5 ECTS; 2nd Year, 2nd Semester, 30.0 PL + 30.0 TP, Code 925044.
Lecturer
- Ricardo Nuno Taborda Campos (2)
(1) Responsible lecturer
(2) Teaching lecturer
Prerequisites
C# programming skills
The courses "Programming and Algorithms" and "Programming Languages" (recommended).
Objectives
Students should be able to design the data structures of a search engine, explore crawling tools, understand the different stages of natural language processing, implement an inverted index and retrieval models, and carry out Cranfield-style evaluation.
Program
1. Information Retrieval and Search Engines
1.1. Objectives
1.2. Search Engines
1.3. Applications
1.4. Difficulties and Challenges
1.5. IR architecture
2. Crawling
2.1. Definition
2.2. Performance
2.3. Implementation
3. Text Processing
3.1. Sentence splitting
3.2. Tokenization
3.3. Part-of-speech tagging
3.4. Named entity recognition
3.5. Stopwords
3.6. Stemming
4. Text representation
4.1. Types of evidence
4.2. Bag-of-words
5. Indexing
5.1. Inverted Files
5.2. Posting Lists
6. IR Models
6.1. Boolean
6.2. Vector Space Model
6.3. Other models
7. Evaluation
7.1. Relevance
7.2. Methods (lab, user-centered, online)
7.3. Cranfield
7.4. Metrics
7.5. Tests
Evaluation Methodology
- Midterm assessment: Midterm test (60%) + project I (40%)
- Final assessment (first attempt or resit): 100%
Bibliography
- Croft, B., Metzler, D. and Strohman, T. (n.d.). Search Engines: Information Retrieval in Practice. Accessed 24 November 2015 at http://ciir.cs.umass.edu/irbook/
- Liu, B. (2007). Web Data Mining. Ams: Springer
- Manning, C., Raghavan, P. and Schütze, H. (n.d.). An Introduction to Information Retrieval. Accessed 24 November 2015 at http://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf
- Van Rijsbergen, C. (n.d.). Information Retrieval. Accessed 24 November 2015 at Information Retrieval
Teaching Method
- Theoretical-practical sessions: presentation of the topics under study using expository and demonstrative methods.
- Practical sessions: analysis and resolution of case studies.
Software used in class
Microsoft Visual Studio