Text Mining Application Programming teaches software developers how to mine the vast amounts of information available on the Web, internal networks, and desktop files and turn it into usable data. The book helps developers understand the problems associated with managing unstructured text and explains how to build their own mining tools using standard statistical methods from information theory, artificial intelligence, and operations research. Each topic is thoroughly explained and then followed by a practical implementation.
The book begins with a brief overview of text data, where it can be found, and the typical search engines and tools used to search and gather this text. It details how to build tools for extracting and using the text, and covers the mathematics behind many of the algorithms used in building these tools. From there you'll learn how to build tokens from text, construct indexes, and detect patterns in text. You'll also find methods to extract the names of people, places, and organizations from an email, a news article, or a Web page. The next portion of the book covers how to find information on the Web, how the Web is structured, and how to build spiders to crawl it. Text categorization is also described in the context of managing email. The final part of the book covers information monitoring, summarization, and a simple Question & Answer (Q&A) system. The code used in the book is written in Perl, but knowledge of Perl is not necessary to run the software. Developers with an intermediate level of Perl experience can customize the software. Although the book is about programming, methods are explained with English-like pseudocode, and the source code is provided on the CD-ROM.
After reading this book, you'll be ready to tap into the bevy of information available online in ways you never thought possible.
Manu Konchady (Oakton, VA) is a consultant working on open source text mining software. Previously, he worked at Mitre Corp., where he designed and developed software to mine the Internet. He received his Ph.D. in Information Technology from George Mason University, and his articles have appeared in Dr. Dobb's Journal and Linux Journal.
Seller: HPB-Red, Dallas, TX, USA
Paperback. Condition: Good. Connecting readers with great books since 1972! Used textbooks may not include companion materials such as access codes, etc. May have some wear or writing/highlighting. We ship orders daily and Customer Service is our top priority! Seller inventory number: S_433254037
Quantity: 1 available
Seller: Better World Books: West, Reno, NV, USA
Condition: Very Good. 1st Edition. Pages intact with possible writing/highlighting. Binding strong with minor wear. Dust jackets/supplements may not be included. Stock photo provided. Product includes identifying sticker. Better World Books: Buy Books. Do Good. Seller inventory number: 8187002-75
Quantity: 1 available
Seller: Better World Books, Mishawaka, IN, USA
Condition: Good. 1st Edition. Pages intact with minimal writing/highlighting. The binding may be loose and creased. Dust jackets/supplements are not included. Stock photo provided. Product includes identifying sticker. Better World Books: Buy Books. Do Good. Seller inventory number: 11689868-20
Quantity: 1 available
Seller: ThriftBooks-Atlanta, AUSTELL, GA, USA
Paperback. Condition: Very Good. No Jacket. May have limited writing in cover pages. Pages are unmarked. ~ ThriftBooks: Read More, Spend Less. Seller inventory number: G1584504609I4N00
Quantity: 1 available
Seller: Better World Books Ltd, Dunfermline, United Kingdom
Condition: Very Good. 1st Edition. Former library copy. Pages intact with possible writing/highlighting. Binding strong with minor wear. Dust jackets/supplements may not be included. Includes library markings. Stock photo provided. Product includes identifying sticker. Better World Books: Buy Books. Do Good. Seller inventory number: GRP94072333
Quantity: 2 available
Seller: GridFreed, San Diego, CA, USA
Paperback. Condition: New. In shrink wrap. Looks like an interesting title! Seller inventory number: 200-08220
Quantity: 1 available
Seller: medimops, Berlin, Germany
Condition: Very Good. Describes a book or dust jacket that shows some signs of wear on the binding, dust jacket, or pages. Seller inventory number: M01584504609-V
Quantity: 1 available
Seller: BennettBooksLtd, Los Angeles, CA, USA
Paperback. Condition: New. In shrink wrap. Looks like an interesting title! Seller inventory number: Q-1584504609
Quantity: 1 available