SlackBuilds Repository

python-pdfminer (version 20140328), Slackware 14.2, category: Python

PDFMiner is a tool for extracting information from PDF documents. Unlike
other PDF-related tools, it focuses entirely on obtaining and analyzing
text data. PDFMiner can report the exact location of text on a page, as
well as other information such as fonts and lines. It includes a PDF
converter that can transform PDF files into other text formats (such as
HTML). It also has an extensible PDF parser that can be used for purposes
other than text analysis.

PDFMiner comes with two command-line tools: pdf2txt.py and dumppdf.py.

pdf2txt.py

pdf2txt.py extracts text content from a PDF file, along with text
locations, font names and sizes, and writing direction. It cannot
recognize text drawn as images. Password-protected PDF documents require
the password, and no text can be extracted from a PDF document that does
not grant extraction permission.

dumppdf.py

dumppdf.py dumps the internal contents of a PDF file in a pseudo-XML
format. It is primarily a debugging aid, but it can also be used to
extract some meaningful content (e.g., images).

Maintained by: Brenton Earl
Keywords: pdf,parse,analyze,extract,dump
ChangeLog: python-pdfminer

Homepage:
https://euske.github.io/pdfminer/index.html

Source Downloads:
pdfminer-20140328.tar.gz (MD5: dfe3eb1b7b7017ab514aad6751a7c2ea)
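The hex string listed with the source tarball is its MD5 checksum, so it is worth confirming that the downloaded file matches it before running the SlackBuild. From a shell, `md5sum pdfminer-20140328.tar.gz` prints the value to compare. The sketch below does the same check with only the Python standard library; the function names `md5_of_file` and `source_matches` are hypothetical helpers for illustration, not part of the SlackBuild or of PDFMiner.

```python
import hashlib

# MD5 checksum listed on this page for pdfminer-20140328.tar.gz.
EXPECTED_MD5 = "dfe3eb1b7b7017ab514aad6751a7c2ea"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in fixed-size chunks so large tarballs are not
    loaded into memory all at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def source_matches(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()
```

Usage, assuming the tarball was saved in the current directory: `source_matches("pdfminer-20140328.tar.gz")` should return `True`. A `False` result means the download is corrupt or is not the file the SlackBuild expects. MD5 only guards against accidental corruption here; the GPG signature on the SlackBuild archive is what authenticates the build script itself.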

Download SlackBuild:
python-pdfminer.tar.gz
python-pdfminer.tar.gz.asc (GPG signature; see the FAQ)

(The SlackBuild does not include the source; download it separately.)

Validated for Slackware 14.2

See our HOWTO for instructions on how to use the contents of this repository.

Access to the repository is available via:
ftp, git, cgit, http, rsync

© 2006-2018 SlackBuilds.org Project. All rights reserved.
Slackware® is a registered trademark of Patrick Volkerding
Linux® is a registered trademark of Linus Torvalds