This script is for Slackware 14.2 only and may be outdated.

SlackBuilds Repository

14.2 > Libraries > libexttextcat (3.4.5)

Libtextcat is a library with functions that implement the
classification technique described in Cavnar & Trenkle, "N-Gram-Based
Text Categorization". It was primarily developed for language
guessing, a task on which it is known to perform with near-perfect
accuracy.

The central idea of the Cavnar & Trenkle technique is to calculate a
"fingerprint" of a document with an unknown category, and compare this
with the fingerprints of a number of documents of which the categories
are known. The categories of the closest matches are output as the
classification. A fingerprint is a list of the most frequent n-grams
occurring in a document, ordered by frequency. Fingerprints are
compared with a simple out-of-place metric. See the article for more
details.
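
The technique described above can be sketched in a few lines of Python. This is only an illustration of the fingerprint and out-of-place ideas, not libexttextcat's actual API or implementation: the function names, the underscore padding, the n-gram lengths (1 to 5), and the fingerprint size (300) are all assumptions chosen for the example.

```python
from collections import Counter


def fingerprint(text, max_n=5, top=300):
    """Return the `top` most frequent character n-grams, most frequent first.

    max_n and top are illustrative defaults, not values taken from the library.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries (an assumption of this sketch)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_fp, cat_fp):
    """Sum of rank differences between two fingerprints.

    An n-gram missing from the category fingerprint gets the maximum penalty.
    """
    rank = {gram: i for i, gram in enumerate(cat_fp)}
    max_penalty = len(cat_fp)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_fp)
    )


def classify(text, category_fps):
    """Return the category whose fingerprint is closest to the text's fingerprint."""
    doc_fp = fingerprint(text)
    return min(category_fps, key=lambda name: out_of_place(doc_fp, category_fps[name]))


if __name__ == "__main__":
    # Tiny hypothetical training samples; real use needs far more text per language.
    samples = {
        "english": "the quick brown fox jumps over the lazy dog and the cat",
        "german": "der schnelle braune fuchs springt über den faulen hund",
    }
    fps = {name: fingerprint(text) for name, text in samples.items()}
    print(classify("the dog and the fox", fps))
```

A lower out-of-place score means a closer match, so an identical fingerprint scores 0.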

Considerable effort went into making this implementation fast and
efficient. The language guesser processes over 100 documents/second on
a simple PC, which makes it practical for many uses. It was developed
for use in our web crawler and search engine software, in which it
handles millions of documents a day.

Maintained by: Hunter Sezen
Keywords: N-Gram-Based,text,categorization,language
ChangeLog: libexttextcat

Homepage:
https://wiki.freedesktop.org/www/Software/libexttextcat/

Source Downloads:
libexttextcat-3.4.5.tar.xz (69c984b1785b56942179eb0ddc9c758f)
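
Before building, the downloaded tarball can be checked against the value in parentheses. The following is a minimal sketch, assuming that value is the file's MD5 sum (as is usual for SlackBuilds.org listings) and that the tarball sits in the current directory; the helper names are hypothetical.

```python
import hashlib

# Checksum copied from the Source Downloads entry (assumed to be MD5).
EXPECTED_MD5 = "69c984b1785b56942179eb0ddc9c758f"


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Path is an assumption: adjust to wherever the source was downloaded.
    print(verify("libexttextcat-3.4.5.tar.xz"))
```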

Download SlackBuild:
libexttextcat.tar.gz
libexttextcat.tar.gz.asc (FAQ)

(the SlackBuild does not include the source)

Individual Files:

• README

• libexttextcat.SlackBuild

• libexttextcat.info

• slack-desc

Validated for Slackware 14.2

See our HOWTO for instructions on how to use the contents of this repository.
