Using xtopdf, a PDF Creation Toolkit
This article by Vasudev Ram, an independent software consultant, shows how to use xtopdf to create Portable Document Format (PDF) output from several different input formats. xtopdf is an open source project, created by me. It is freely available from the project site on SourceForge at sourceforge.net/projects/xtopdf or from my web site at www.dancingbison.com. The purpose of xtopdf is to provide a toolkit for converting other file formats to PDF.

1. Creating PDF Output from Plain Text, DBF, CSV, TDV, and XLS Data

xtopdf can currently convert the input formats described in the next section to PDF.

2. Supported Input Formats

2.1 Plain Text

Nothing to say here. Everyone knows about text, even non-computer users.

2.2 DBF (XBase Data Files)

DBF was a de facto desktop and multi-user LAN database format for several years. DBF was the data file format of dBASE II, III, and IV (and, soon after, of many clones). These were some of the early and wildly successful software products that largely helped create the original mass market for PCs, a huge market: there are now hundreds of millions of PCs in the world. Though more advanced products are available nowadays (desktop databases such as Microsoft Access, and RDBMSs such as SQL Server and IBM DB2 that run on PCs), it's an educated guess that the DBF format is still in widespread use worldwide. Several clones and similar products, such as Clipper, FoxPro, and the Harbour project, attest to this.
2.3 CSV (Comma Separated Values)

CSV is a de facto industry standard for data, one that is supported as an export/import format by most major spreadsheet software products, as well as by some desktop database and other software applications. So, support for this input format means that you can convert data from any software application to PDF, as long as that application can export its data to CSV format.

2.4 TDV (Tab Delimited Values)

TDV (my own acronym; it may not be an industry one) is a de facto industry standard for data on UNIX and Linux systems. Common UNIX/Linux tools such as sed and awk often use this format, both for input and output. Support for this input format means that you can publish the output of such tools as PDF.

2.5 XLS (Microsoft Corp.'s Native Format for Excel Spreadsheets)

Only simple spreadsheets that have plain text content, such as strings, numbers, and dates, are supported. Spreadsheets with formatted cells (bold, italic, right-justified, etc.) or embedded images are not fully supported: the formatting and images may be lost in the PDF output. Support for this input format means that you can publish your spreadsheets as PDF.

As Larry Wall, creator of the Perl language, says in a Perl manual after describing the benefits of Perl: "Ok, enough hype" :-). Let's get down to how to use xtopdf to convert the above-mentioned data formats to PDF.

3. Using xtopdf

3.1 Overview

The xtopdf software contains:
In this article, I focus on the end-user tools that convert the aforementioned input formats to PDF. The following sections describe these ways of using xtopdf.

3.2 Plain Text to PDF

3.2.1 Conversion by a Command-Line Tool

Open a command prompt or window (represented by "$" in the following examples, though you may be on UNIX/Linux or Windows; xtopdf is cross-platform). At the prompt, run the WritePDF.py tool as follows:

$ python WritePDF.py your_text_file.txt

An example:

$ python WritePDF.py your_file.txt

This runs the text-to-PDF conversion tool WritePDF.py to create a PDF file with the same base name as the text file, but with the extension changed from .txt to .pdf, i.e. your_file.pdf.

3.2.2 Conversion by a wxPython GUI Tool

wxPython is a GUI toolkit for the Python language based on wxWindows/wxWidgets, a leading cross-platform C++ GUI toolkit. wxPython is also cross-platform, and a good toolkit; Eric Raymond, an influential open source developer, author, and advocate, strongly recommends it. The xtopdf GUI tools shown in the following screenshots are written using wxPython. You will need to install wxPython on your PC to run those tools. You can get it from its website at www.wxpython.org.

At a command prompt, run the TextToPdfGui.py tool as follows:

$ python TextToPdfGui.py

This GUI tool lets you specify the names of the input text file and the output PDF file via dialogs (invoked by clicking the "Text file" and "PDF file" buttons, in that order); then, when you click the Run button, the text file is converted to PDF, just as in the previous (command-line) example. The following screenshots illustrate the tool in action:

Figure 1: TextToPdfGui.py, initial screen
Figure 2: TextToPdfGui.py, after clicking the "Text file" button
Figure 3: TextToPdfGui.py, after clicking the "PDF file" button
Figure 4: The PDF file generated by TextToPdfGui.py, as seen in Acrobat Reader
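The naming rule described for WritePDF.py (same base name, with .txt replaced by .pdf) takes a single call to os.path.splitext. The paginate() helper is purely hypothetical: the article does not say how WritePDF.py breaks text into pages, and the 60-lines-per-page default is an assumed value, but any text-to-PDF tool needs some step like it.

```python
import os

def pdf_name_for(text_path):
    """Return the output name: same base name as the input, with a .pdf extension."""
    base, _ext = os.path.splitext(text_path)
    return base + ".pdf"

def paginate(lines, lines_per_page=60):
    """Hypothetical helper: group lines into pages of at most lines_per_page lines."""
    return [lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)]

if __name__ == "__main__":
    print(pdf_name_for("your_file.txt"))                     # your_file.pdf
    print([len(page) for page in paginate(["line"] * 130)])  # [60, 60, 10]
```

Because splitext only strips the last extension, a name such as notes.v2.txt becomes notes.v2.pdf, which matches the "same base name" rule.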
This example shows how xtopdf can be used to create simple
PDF e-books from text files, using an application program that builds on xtopdf.
At a command prompt, run the PDFBook.py tool as follows:

$ python PDFBook.py book1.pdf book1.txt

The contents of file book1.txt are:

book1-chapter1.txt:Chapter 1. Preface.
book1-chapter2.txt:Chapter 2. Introduction.
book1-chapter3.txt:Chapter 3. The problem.
book1-chapter4.txt:Chapter 4. The possible solutions.
book1-chapter5.txt:Chapter 5. The decision.

Each line consists of two fields separated by a colon (:): the filename of a chapter (a text file, e.g. book1-chapter1.txt), followed by the title of that chapter. PDFBook.py reads the given chapter text files and chapter titles, and creates a PDF e-book with those chapters and titles. (The chapter files contain dummy data, not a real book.)

Figure 5: The PDF e-book generated by PDFBook.py, as seen in Acrobat Reader (first page)
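The list-file format that PDFBook.py reads (one filename:title pair per line) is easy to parse. This is a sketch, not PDFBook.py's actual code: read_book_list() is a hypothetical name, and ignoring blank lines is an assumption.

```python
def read_book_list(text):
    """Parse 'chapter_file:chapter title' lines into (filename, title) pairs."""
    chapters = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split on the first colon only, so a title may itself contain colons.
        filename, title = line.split(":", 1)
        chapters.append((filename.strip(), title.strip()))
    return chapters

if __name__ == "__main__":
    book = ("book1-chapter1.txt:Chapter 1. Preface.\n"
            "book1-chapter2.txt:Chapter 2. Introduction.\n")
    for filename, title in read_book_list(book):
        print(f"{title} <- {filename}")
```

A line with no colon makes the tuple unpacking raise ValueError, which is arguably the right reaction to a malformed list file. Each filename could then be opened and its text handed to the PDF-writing code along with the title.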
Figure 6: The PDF e-book generated by PDFBook.py, as seen in Acrobat Reader (last page)

DBF files contain both metadata (data about data) and data records. This is the structure of a DBF file; the following three sections occur in sequence:
Field types can be Character (C), Numeric (N), Date (D), Logical (L), or Memo (M). The letters in parentheses are the way the field types are shown in Figure 7 below (and also the way they are stored in the DBF file).
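To make the three-part layout (file header, field headers, data records) concrete, here is a minimal reader for the classic dBASE III layout, written with only the standard library's struct module. This is my sketch, not DBFToPDF.py's code: read_dbf() is a hypothetical name, and it assumes a dBASE III-style file with ASCII data. It returns every value as a stripped string; for Memo (M) fields that string is just the block number pointing into the separate .dbt memo file, and the extended headers of later dBASE variants are not handled.

```python
import struct
from datetime import date

def read_dbf(data):
    """Parse dBASE III-style DBF bytes into (file_header, field_headers, records)."""
    # 1. File header (32 bytes): version byte, last-update date as YY MM DD
    #    (YY counted from 1900), record count, header length, record length.
    version, yy, mm, dd, num_records, header_len, record_len = struct.unpack_from(
        "<BBBBIHH", data, 0)
    file_header = {
        "version": version,
        "last_update": date(1900 + yy, mm, dd),
        "num_records": num_records,
        "header_length": header_len,
        "record_length": record_len,
    }
    # 2. Field headers: 32-byte descriptors starting at offset 32, ended by 0x0D.
    fields = []
    pos = 32
    while data[pos] != 0x0D:
        name = data[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii")
        field_type = chr(data[pos + 11])  # C, N, D, L or M
        length, decimals = data[pos + 16], data[pos + 17]
        fields.append((name, field_type, length, decimals))
        pos += 32
    # 3. Data records: fixed length, each starting with a deletion flag byte.
    records = []
    for i in range(num_records):
        start = header_len + i * record_len
        record = data[start:start + record_len]
        if record[:1] == b"*":  # '*' marks a deleted record; skip it
            continue
        values, offset = [], 1
        for _name, _type, length, _decimals in fields:
            values.append(record[offset:offset + length].decode("ascii").strip())
            offset += length
        records.append(values)
    return file_header, fields, records
```

To try it on a real file such as test4.dbf, read the file in binary mode (with open("test4.dbf", "rb") as f: ...) and pass f.read() to read_dbf(). Note that the record count in the header includes deleted records, which is why the loop uses it but skips records flagged with "*".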
Open a command prompt or window. Run DBFToPDF.py:

$ python DBFToPDF.py your_dbf_file.dbf your_file.pdf

An example:

$ python DBFToPDF.py test4.dbf test4.pdf

This runs the DBF-to-PDF conversion tool DBFToPDF.py to create a PDF file named test4.pdf, having the metadata and data of test4.dbf.
If you don't have any compatible XBase software, you can open test4.dbf in MS Excel: specify the file type as dBase files in the Open dialog.

Figure 7: The content of test4.dbf as seen in Microsoft Excel

Here is the output of DBFToPDF.py as seen in Adobe Reader (multiple screenshots follow):

Figure 8: The DBF metadata (the file header) of test4.dbf (in PDF)
Figure 9: The DBF metadata (the field headers) of test4.dbf (in PDF)

Only the first page of the data is shown; there are 660 records in the DBF file:

Figure 10: The DBF data (the data records) of test4.dbf (in PDF)
Open a command prompt or window and run DbfToPdfGui.py:

Figure 11: DbfToPdfGui.py, initial screen

This program, DbfToPdfGui.py, works in a very similar way to the program TextToPdfGui.py described earlier. The only difference is that in the dialog that pops up to let you open the input file (after you press the "DBF file" button), you select a DBF file instead of a text file. So I won't show the other screens as I did for TextToPdfGui.py; instead, I will just show the output in Adobe Reader. It should look similar to the output of the DBFToPDF.py command-line tool described in Section 3.3.2. The only reason there is some difference in the output (mainly in the headings and the formatting of the content) is that I implemented it slightly differently in this case. Here is the output as seen in Adobe Reader (two screenshots follow):

Figure 12: The DBF metadata (file and field headers) and the first two records of the DBF file (in PDF)
Figure 13: The last few records of the DBF file (in PDF)
Comma Separated Values (CSV) is a simple but useful tabular data format. A working definition of the format follows. I'm not going into full details, since different software products implement this format in different ways, details of which are outside the scope of this article:

CSV consists of text, one record per line. Each line consists of fields, separated by commas (hence the name CSV). The number of fields need not be the same in each record. Each field is usually a number (an integer, or a floating-point number, i.e. with decimals), a string (letters, digits, and punctuation), or a date. There can be other types of values, e.g. Booleans (true or false values). A field may be enclosed in single or double quotes (numbers usually aren't) if it contains embedded commas, spaces, or quotes.

Open a command prompt or window. Run CsvToPdf.py:

$ python CsvToPdf.py your_file.csv your_file.pdf

An example:

$ python CsvToPdf.py file3.csv file3.pdf

This runs the CSV-to-PDF conversion tool CsvToPdf.py to create a PDF file named file3.pdf, containing the data of file3.csv.

Figure 14: Contents of file3.csv (in a text editor)
Figure 15: Contents of file3.csv (in PDF)
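Python's standard csv module already handles the quoting rules described above, including quoted fields with embedded commas. The sketch below is an illustration, not CsvToPdf.py's code: read_csv_rows() and pad_rows() are hypothetical helpers, and padding short rows to a common width is my assumption about how a converter might lay out records with differing numbers of fields as a table. For files that quote fields with single quotes, pass quotechar="'" to csv.reader.

```python
import csv
import io

def read_csv_rows(text):
    """Split CSV text into a list of rows (lists of field strings)."""
    return list(csv.reader(io.StringIO(text)))

def pad_rows(rows, filler=""):
    """Pad shorter rows so that every row has as many fields as the longest one."""
    width = max((len(row) for row in rows), default=0)
    return [row + [filler] * (width - len(row)) for row in rows]

if __name__ == "__main__":
    sample = 'id,name,joined\n1,"Smith, J.",2006-05-17\n2,Jones\n'
    for row in pad_rows(read_csv_rows(sample)):
        print(" | ".join(row))
```

Note that csv.reader returns every field as a string; deciding which fields are numbers or dates is left to the caller.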
Tab Delimited Values (TDV) is also a simple tabular data format, widely used on UNIX/Linux, particularly as the input and output of UNIX/Linux command-line tools such as sed, awk, and many others. A definition of the format follows:

TDV consists of text, one record per line. Each line consists of fields. Each field may consist of any characters other than tabs and newlines. Tabs delimit fields (hence the name TDV).

Open a command prompt or window. Run TdvToPdf.py:

$ python TdvToPdf.py your_file.tdv your_file.pdf

An example:

$ python TdvToPdf.py file2.tdv file2.pdf

This runs the TDV-to-PDF conversion tool TdvToPdf.py to create a PDF file named file2.pdf from the contents of file2.tdv.

The following figure shows file2.tdv in the gvim text editor. The tabs in the file are made visible by using the ":se list" command of gvim, which shows tabs as Ctrl-I characters (^I in the figure). This is because Ctrl-I is one way to represent ASCII code 9, which stands for the tab character: the control characters Ctrl-A through Ctrl-Z have codes 1 through 26, and I is the 9th letter of the alphabet, so Ctrl-I is code 9, the tab.

Figure 16: Contents of file2.tdv (in the gvim text editor)
Figure 17: Contents of file2.tdv (in PDF)
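Because TDV fields cannot contain tabs, splitting each line on the tab character is enough; unlike CSV, no quoting rules are needed. The sketch also checks the Ctrl-I arithmetic from the gvim explanation: Ctrl plus a letter keeps only the low five bits of the letter's ASCII code (for I, 73 & 31 = 9). read_tdv_rows() and ctrl() are hypothetical helper names, not part of xtopdf.

```python
def read_tdv_rows(text):
    """Split tab-delimited text into rows: one record per line, tabs between fields."""
    return [line.split("\t") for line in text.splitlines()]

def ctrl(letter):
    """Return the control character produced by Ctrl+letter (e.g. Ctrl-I gives tab)."""
    return chr(ord(letter.upper()) & 0x1F)

if __name__ == "__main__":
    print(read_tdv_rows("alpha\tbeta gamma\t42\n"))  # [['alpha', 'beta gamma', '42']]
    print(ord(ctrl("I")), ctrl("I") == "\t")          # 9 True
```

The code uses split("\t") rather than split() with no argument on purpose: the latter would also split on spaces and would collapse two adjacent tabs, losing the empty field between them.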
XLS is Microsoft Corp.'s spreadsheet format used in Microsoft Excel. The program XlsToPdf.py converts Excel spreadsheets to PDF. Open a command prompt or window. Run XlsToPdf.py:

$ python XlsToPdf.py your_file.xls your_file.pdf

An example:

$ python XlsToPdf.py file01.xls

The above command runs the XLS-to-PDF conversion tool XlsToPdf.py to create a PDF file named file01.pdf from the contents of file01.xls. Screenshots follow:

Figure 18: Contents of file01.xls
Figure 19: Contents of file01.xls (in PDF)
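Dates are among the plain cell contents that XlsToPdf.py supports. Inside an XLS file, though, a date is stored as a number (a serial day count) combined with a date display format, so a converter has to turn that number back into a date before printing it. The function below is a sketch for Excel's default 1900 date system only, not XlsToPdf.py's code; excel_serial_to_date() is a hypothetical name. It reproduces Excel's long-standing quirk of treating 1900 as a leap year, so serial 60 (the non-existent 29 February 1900) is rejected.

```python
from datetime import date, timedelta

def excel_serial_to_date(serial):
    """Convert an Excel serial (1900 date system) to a datetime.date."""
    serial = int(serial)  # drop any time-of-day fraction
    if serial < 1 or serial == 60:
        # Serial 60 is Excel's fictitious 1900-02-29; below 1 there is no real date.
        raise ValueError(f"no real date for Excel serial {serial}")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    # Excel counts the fictitious 29 Feb 1900, so later serials are one day ahead.
    return date(1899, 12, 30) + timedelta(days=serial)

if __name__ == "__main__":
    print(excel_serial_to_date(36526))  # 2000-01-01
```

Spreadsheets saved with the alternative 1904 date system count days from a different base date and would need a separate conversion.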
In this article, we've seen ways of using some of the tools that are built on top of xtopdf to create PDF output from plain text, DBF, CSV, TDV, and XLS data. The Application Programming Interface (API) of xtopdf accepts plain text as one of its basic inputs, which makes it possible to convert into PDF any text that can be extracted from a data source, whether manually or programmatically. Whether you're a developer or an end user, I hope this article motivates you to take xtopdf (sourceforge.net/projects/xtopdf) for a spin, and I welcome any feedback on it. Please feel free to contact me via my website, www.dancingbison.com.

To try out the software described in the article, download the following two files:

URL: http://www.dancingbison.com/xtopdf-1.0.tar.gz
URL: http://www.dancingbison.com/xtopdf-1.3.zip

Extract the contents of xtopdf-1.0.tar.gz into the directory C:\Python24\Lib\site-packages (assuming that C:\Python24 is the base directory of your Python installation). Change this path to the right directory if yours is different. Then follow the instructions for installing xtopdf's prerequisites (Python and the ReportLab toolkit), as given in the README.txt file in xtopdf-1.0.tar.gz. Make sure that the contents of xtopdf-1.0.tar.gz have been installed properly by trying to run one of the programs for plain text conversion, e.g. WritePDF.py (see the article for the exact command to use). Then extract the contents of xtopdf-1.3.zip into the same directory where you extracted xtopdf-1.0.tar.gz. Now you should be able to run the programs described in the article. For more detailed instructions, please take a look at: http://itext.ugent.be/library/question.php?id=41

Cool Tips: Apart from the well-known benefits of being read-only and of good cross-platform viewing, printing, zooming, etc., Adobe PDF has a couple of other neat features I came across while developing xtopdf and viewing the results in Acrobat Reader:
Vasudev Ram is an independent software consultant, writer and trainer with many years of experience in different software areas. He works via Dancing Bison Enterprises - www.dancingbison.com.