User Review

This is the only book there is that will actually teach you how to build an information retrieval system (aka search engine). It discusses all the algorithms and tradeoffs, and comes with free downloadable source code to experiment with. Some of the material is standard, but covered in more implementation detail here than anywhere else. Some of the material is novel: you won't find better coverage of compression unless you hand-assemble twenty research papers and reverse-engineer them to figure out how they're implemented. But with "Managing Gigabytes", it's all here.

(Although, after a particularly invigorating discussion of how to string together a bunch of techniques to compress their corpus and save a couple hundred MB, I did a check and found you could buy 512MB of RAM for less than the cost of the book. Knowledge is Power, but sometimes a little cash is more powerful.)

The only negative is that this book is not called "Managing Terabytes", as the first edition promised/threatened it might be. RAM and disk are cheap, but not that cheap, and for now terabytes (and sometimes petabytes) are managed only by NASA, Google, and a few others. I can't wait to see the third edition!
Works citing this book (from Google Scholar):
- The anatomy of a large-scale hypertextual Web search engine. Sergey Brin, Lawrence Page. 1998. Computer Networks and ISDN Systems.
- A Survey of Peer-to-Peer Content Distribution Technologies. Stephanos Androutsellis-Theotokis, Diomidis Spinellis. 2004. ACM Computing Surveys.
- Video Google: A Text Retrieval Approach to Object Matching in Videos. Josef Sivic, Andrew Zisserman.
- Searching the Web. Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, Sriram Raghavan. 2001. ACM Transactions on Internet Technology.