Introduction to Information Retrieval
Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.
- Hardback | 506 pages
- 177.8 x 256.54 x 30.48mm | 1,020.58g
- 08 Oct 2008
- CAMBRIDGE UNIVERSITY PRESS
- Cambridge, United Kingdom
- 5 b/w illus. 47 tables 263 exercises
'This is the first book that gives you a complete picture of the complications that arise in building a modern web-scale search engine. You'll learn about ranking SVMs, XML, DNS, and LSI. You'll discover the seedy underworld of spam, cloaking, and doorway pages. You'll see how MapReduce and other approaches to parallelism allow us to go beyond megabytes and to efficiently manage petabytes.' Peter Norvig, Director of Research, Google Inc.
'... this book sets a high standard ...' Natural Language Engineering
'Introduction to Information Retrieval is a comprehensive, authoritative, and well-written overview of the main topics in IR. The book offers a good balance of theory and practice, and is an excellent self-contained introductory text for those new to IR.' Computational Linguistics
'This book provides what Salton and Van Rijsbergen both failed to achieve ... Even more important, unlike some other books in IR, the authors appear to care about making the theory as accessible as possible to the reader, on occasion including short primers to certain topics or choosing to explain difficult concepts using simplified approaches. ... its coverage [is] excellent, the quality of writing high and I was surprised how much I learned from reading it. I think the online resources are impressive.' Natural Language Engineering
About Christopher D. Manning
Christopher Manning is an Associate Professor of Computer Science and Linguistics at Stanford University. His research concentrates on probabilistic models of language and statistical natural language processing, information extraction, text understanding and text mining. Dr Prabhakar Raghavan is Head of Yahoo! Research and a Consulting Professor of Computer Science at Stanford University. Dr Hinrich Schütze is Chair of Theoretical Computational Linguistics at the Institute for Natural Language Processing, University of Stuttgart.
Table of contents
1. Information retrieval using the Boolean model
2. The dictionary and postings lists
3. Tolerant retrieval
4. Index construction
5. Index compression
6. Scoring and term weighting
7. Vector space retrieval
8. Evaluation in information retrieval
9. Relevance feedback and query expansion
10. XML retrieval
11. Probabilistic information retrieval
12. Language models for information retrieval
13. Text classification and Naive Bayes
14. Vector space classification
15. Support vector machines and kernel functions
16. Flat clustering
17. Hierarchical clustering
18. Dimensionality reduction and latent semantic indexing
19. Web search basics
20. Web crawling and indexes
21. Link analysis