SlackBuilds Repository

libexttextcat (3.4.4)

Libtextcat is a library with functions that implement the
classification technique described in Cavnar & Trenkle, "N-Gram-Based
Text Categorization". It was primarily developed for language
guessing, a task on which it is known to perform with near-perfect
accuracy.

The central idea of the Cavnar & Trenkle technique is to calculate a
"fingerprint" of a document with an unknown category, and compare this
with the fingerprints of a number of documents of which the categories
are known. The categories of the closest matches are output as the
classification. A fingerprint is a list of the most frequent n-grams
occurring in a document, ordered by frequency. Fingerprints are
compared with a simple out-of-place metric. See the article for more
details.
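
The technique described above can be sketched in a few lines of Python.
This is only an illustration of the general Cavnar & Trenkle idea, not the
library's own code or API: the function names, the n-gram length limit
(max_n), the fingerprint size (top), the word padding character, and the
penalty given to n-grams missing from a category fingerprint are all
assumptions chosen for the example.

```python
from collections import Counter


def fingerprint(text, max_n=5, top=300):
    """Return the most frequent character n-grams, most frequent first.

    max_n and top are illustrative defaults, not the library's settings.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries (assumed convention)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_fp, cat_fp):
    """Sum of rank differences between two fingerprints.

    An n-gram absent from the category fingerprint gets a fixed maximum
    penalty (here, the length of the category fingerprint; an assumption).
    """
    cat_rank = {gram: rank for rank, gram in enumerate(cat_fp)}
    max_penalty = len(cat_fp)
    return sum(
        abs(rank - cat_rank[gram]) if gram in cat_rank else max_penalty
        for rank, gram in enumerate(doc_fp)
    )


def classify(text, category_fps):
    """Return the category whose fingerprint is closest to the text's."""
    doc_fp = fingerprint(text)
    return min(category_fps,
               key=lambda name: out_of_place(doc_fp, category_fps[name]))


# Hypothetical training samples; real use needs far larger texts.
samples = {
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "dutch": "de snelle bruine vos springt over de luie hond en de kat",
}
category_fps = {name: fingerprint(text) for name, text in samples.items()}
guess = classify("the dog and the fox", category_fps)
```

A lower out-of-place score means a closer match, so classify picks the
category with the minimum score. With training texts this small the guess
is unreliable; the sketch only shows how the pieces fit together.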

Considerable effort went into making this implementation fast and
efficient. The language guesser processes over 100 documents/second on
a simple PC, which makes it practical for many uses. It was developed
for use in our webcrawler and search engine software, in which it
handles millions of documents a day.

Maintained by: Hunter Sezen
Keywords: N-Gram-Based,text,categorization,language

Homepage:
https://wiki.freedesktop.org/www/Software/libexttextcat/

Source Downloads:
libexttextcat-3.4.4.tar.xz (bfa7107c27afda3a3afa4b7ab5a3fe17)

Download SlackBuild:
libexttextcat.tar.gz
libexttextcat.tar.gz.asc

(the SlackBuild does not include the source)

Validated for Slackware 14.2

