Information Retrieval

Author: Stefan Büttcher
Publisher: MIT Press
ISBN: 0262528878
Format: PDF
An introduction to information retrieval, the foundation for modern search engines, that emphasizes implementation and experimentation.

Information Retrieval

Author: Stefan Büttcher
Publisher: MIT Press
ISBN: 0262026511
Format: PDF, Docs
An introduction to information retrieval, the foundation for modern search engines, that emphasizes implementation and experimentation.

Information Retrieval

Author: Stefan Büttcher
Publisher: MIT Press
ISBN: 0262288680
Format: PDF, ePub, Docs
Information retrieval is the foundation for modern search engines. This textbook offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpus -- a multiuser open-source information retrieval system developed by one of the authors and available online -- provides model implementations and a basis for student work. The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. In addition to its classroom use, Information Retrieval will be a valuable reference for professionals in computer science, computer engineering, and software engineering.

Introduction to Information Retrieval

Author: Christopher D. Manning
Publisher: Cambridge University Press
ISBN: 1139472100
Format: PDF, Docs
Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.

Search Engines

Author: Bruce Croft
Publisher: Pearson Higher Ed
ISBN: 0133001598
Format: PDF
This is the eBook of the printed book and may not include any media, website access codes, or print supplements that may come packaged with the bound book. Search Engines: Information Retrieval in Practice is ideal for introductory information retrieval courses at the undergraduate and graduate level in computer science, information science and computer engineering departments. It is also a valuable tool for search engine and information retrieval professionals. Written by a leader in the field of information retrieval, Search Engines: Information Retrieval in Practice is designed to give undergraduate students the understanding and tools they need to evaluate, compare and modify search engines. Coverage of the underlying IR and mathematical models reinforces key concepts. The book’s numerous programming exercises make extensive use of Galago, a Java-based open source search engine.

Scalability Challenges in Web Search Engines

Author: B. Barla Cambazoglu
Publisher: Morgan & Claypool Publishers
ISBN: 1627058133
Format: PDF, ePub, Docs
In this book, we aim to provide a fairly comprehensive overview of the scalability and efficiency challenges in large-scale web search engines. More specifically, we cover the issues involved in the design of three separate systems that are commonly available in every web-scale search engine: web crawling, indexing, and query processing systems. We present the performance challenges encountered in these systems and review a wide range of design alternatives employed as solutions to these challenges, specifically focusing on algorithmic and architectural optimizations. We discuss the available optimizations at different computational granularities, ranging from a single computer node to a collection of data centers. We provide some hints to both the practitioners and theoreticians involved in the field about the way large-scale web search engines operate and the adopted design choices. Moreover, we survey the efficiency literature, providing pointers to a large number of relatively important research papers. Finally, we discuss some open research problems in the context of search engine efficiency.

Modern Information Retrieval

Author: Ricardo Baeza-Yates
Publisher: Addison-Wesley Professional
ISBN: 9780321416919
Format: PDF, ePub, Docs
This is a rigorous and complete textbook for a first course on information retrieval from the computer science perspective. It provides an up-to-date, student-oriented treatment of information retrieval, including extensive coverage of new topics such as web retrieval, web crawling, open source search engines and user interfaces. From parsing to indexing, clustering to classification, retrieval to ranking, and user feedback to retrieval evaluation, all of the most important concepts are carefully introduced and exemplified. The contents and structure of the book have been carefully designed by the two main authors, with individual contributions coming from leading international authorities in the field, including Yoelle Maarek, Senior Director of Yahoo! Research Israel; Dulce Ponceleón, IBM Research; and Malcolm Slaney, Yahoo Research USA. This completely reorganized, revised and enlarged second edition of Modern Information Retrieval contains many new chapters, double the number of pages and bibliographic references of the first edition, and a companion website, www.mir2ed.org, with teaching material. It will prove invaluable to students, professors, researchers, practitioners, and scholars of this fascinating field of information retrieval.

Managing Gigabytes

Author: Ian H. Witten
Publisher: Morgan Kaufmann
ISBN: 9781558605701
Format: PDF, Kindle
In this fully updated second edition of the highly acclaimed Managing Gigabytes, authors Witten, Moffat, and Bell continue to provide unparalleled coverage of state-of-the-art techniques for compressing and indexing data. Whatever your field, if you work with large quantities of information, this book is essential reading--an authoritative theoretical resource and a practical guide to meeting the toughest storage and access challenges. It covers the latest developments in compression and indexing and their application on the Web and in digital libraries. It also details dozens of powerful techniques supported by mg, the authors' own system for compressing, storing, and retrieving text, images, and textual images. mg's source code is freely available on the Web.
* Up-to-date coverage of new text compression algorithms such as block sorting, approximate arithmetic coding, and fat Huffman coding
* New sections on content-based index compression and distributed querying, with 2 new data structures for fast indexing
* New coverage of image coding, including descriptions of de facto standards in use on the Web (GIF and PNG), information on CALIC, the new proposed JPEG Lossless standard, and JBIG2
* New information on the Internet and WWW, digital libraries, web search engines, and agent-based retrieval
* Accompanied by a public domain system called MG which is a fully worked-out operational example of the advanced techniques developed and explained in the book
* New appendix on an existing digital library system that uses the MG software

Google's PageRank and Beyond

Author: Amy N. Langville
Publisher: Princeton University Press
ISBN: 140083032X
Format: PDF, ePub, Mobi
Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of web page rankings, Google's PageRank and Beyond supplies the answers to these and other questions. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research. The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text. Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided.
* Many illustrative examples and entertaining asides
* MATLAB code
* Accessible and informal style
* Complete and self-contained section for mathematics review

Distributed Hash Table

Author: Hao Zhang
Publisher: Springer Science & Business Media
ISBN: 1461490081
Format: PDF, Mobi
This SpringerBrief summarizes the development of the Distributed Hash Table (DHT) in both academic and industrial fields. It covers the main theory, platforms and applications of this key component of distributed systems and applications, especially in large-scale distributed environments. The authors teach the principles of several popular DHT platforms that can solve practical problems such as load balancing, multiple replicas, consistency and latency. They also propose DHT-based applications including multicast, anycast, distributed file systems, search, storage, content delivery networks, file sharing and communication. These platforms and applications are used in both academic and commercial fields, making Distributed Hash Table a valuable resource for researchers and industry professionals.