Using xtopdf, a PDF Creation Toolkit

In this article, Vasudev Ram, an independent software consultant and the creator of xtopdf, shows how to use xtopdf to create Portable Document Format (PDF) output from several input formats. xtopdf is an open source project. It is freely available from the project site on SourceForge at sourceforge.net/projects/xtopdf or from the author's web site at www.dancingbison.com. The purpose of xtopdf is to provide a toolkit for converting other file formats to PDF.

1. Creating PDF Output from Plain Text, DBF, CSV, TDV, and XLS Data

xtopdf currently converts the following input formats to PDF:

2. Supported Input Formats

2.1 Plain Text

Nothing to say here. Everyone knows about text—even non-computer users.

2.2 DBF (XBase Data Files)

DBF was a de facto desktop and multi-user LAN database format for several years. It was the data file format of dBASE II, III, and IV (and, soon after, of many clones), which were among the early and wildly successful software products that helped create the original mass market for PCs. That market is huge: there are now hundreds of millions of PCs in the world. Though more advanced products are available nowadays (databases such as Microsoft Access, SQL Server, IBM DB2, and other RDBMSs that run on PCs), it's an educated guess that the DBF format is still in widespread use worldwide. Clones and similar products such as Clipper, FoxPro, and the Harbour project attest to this.

2.3 CSV (Comma Separated Values)

CSV is a de facto industry standard for data, one that is supported as an export/import format by most major spreadsheet software products, as well as by some desktop database and other software applications. So, support for this input format means that you can convert data from any software application to PDF, as long as that product can export its data to CSV format.

2.4 TDV (Tab Delimited Values)

TDV (my own acronym; it may not be an industry one) is a de facto industry standard for data on UNIX and Linux systems. Common UNIX/Linux tools such as sed and awk often use this format, both for input and output. Support for this input format means that you can publish the output of such tools as PDF.

2.5 XLS (Microsoft Corp.'s Native Format for Excel spreadsheets)

Only simple spreadsheets that have plain content, such as strings, numbers, and dates, are supported. Spreadsheets with formatted cells (bold, italic, right-justified, etc.) or embedded images are not fully supported: the formatting and images may be lost in the PDF output. Support for this input format means that you can publish your spreadsheets as PDF.

As Larry Wall, creator of the Perl language, says in a Perl manual after describing the benefits of Perl: "Ok, enough hype" :-). Let's get down to how to use xtopdf to convert the above-mentioned data formats to PDF.

3. Using xtopdf

3.1 Overview

The xtopdf software contains:

  • A library that developers can use in their own applications. The library is available in both procedural and object-oriented versions, with more or less the same functionality, so that developers who are comfortable with either paradigm (procedural or object-oriented programming) can use it.
  • A set of end-user tools that can be used by anyone.

In this article, I focus on the end-user tools that convert the aforementioned input formats to PDF. The following sections describe these ways of using xtopdf.

3.2 Plain Text to PDF

3.2.1 Conversion by a Command-Line Tool

Open a command prompt or window (represented by "$" in the following examples, though you may be on UNIX/Linux or Windows—xtopdf is cross-platform).

At the prompt, run the WritePDF.py tool as follows:

    $ python WritePDF.py your_text_file.txt

An example:

    $ python WritePDF.py your_file.txt

This will run the text-to-PDF conversion tool WritePDF.py to create a PDF file with the same base name as the text file, but with the extension changed from .txt to .pdf, i.e. your_file.pdf.
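The naming rule described above (same base name, extension changed to .pdf) can be sketched in a few lines of Python. This is only an illustration of the rule, not the code inside WritePDF.py; the function name default_pdf_name is hypothetical.

```python
import os.path


def default_pdf_name(text_path):
    """Return text_path with its extension replaced by .pdf.

    Hypothetical helper: it mirrors the naming rule described in the
    article, and is not taken from the xtopdf source.
    """
    root, _ext = os.path.splitext(text_path)
    return root + ".pdf"


print(default_pdf_name("your_file.txt"))  # your_file.pdf
```

Using os.path.splitext rather than a plain string replace means that only the final extension is changed, even if ".txt" also appears elsewhere in the path.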

3.2.2 Conversion by a wxPython GUI Tool

wxPython is a GUI toolkit for the Python language based on wxWindows / wxWidgets, a leading cross-platform C++ GUI toolkit. wxPython is also cross-platform, and a good toolkit; Eric Raymond, an influential open source developer, author, and advocate, strongly recommends it. The xtopdf GUI tools shown in the following screenshots are written using wxPython. You will need to install wxPython on your PC to run those tools. You can get it from its website at www.wxpython.org.

At a command prompt, run the TextToPdfGui.py tool as follows:

    $ python TextToPdfGui.py

This GUI tool allows you to specify the names of the input text file and the output PDF file via dialogs (invoked by clicking the "Text file" and "PDF file" buttons sequentially); next, when you click the Run button, the text file is converted to PDF, just as in the previous (command-line) example.

The following screenshots illustrate the tool in action:

Figure 1: TextToPdfGui.py - initial screen

 

Figure 2: TextToPdfGui.py—after clicking the "Text file" button



Figure 3: TextToPdfGui.py—after clicking the "PDF file" button

Figure 4: The PDF file generated by TextToPdfGui.py, as seen in Acrobat Reader


3.2.3 Conversion by a Command-Line Tool of Text to a PDF e-Book

This example shows how xtopdf can be used to create simple PDF e-books from text files, using an application program that builds on xtopdf.

At a command prompt, run the PDFBook.py tool as follows:

    $ python PDFBook.py book1.pdf book1.txt

The contents of file book1.txt are:

    book1-chapter1.txt:Chapter 1. Preface.
    book1-chapter2.txt:Chapter 2. Introduction.
    book1-chapter3.txt:Chapter 3. The problem.
    book1-chapter4.txt:Chapter 4. The possible solutions.
    book1-chapter5.txt:Chapter 5. The decision.

Each line consists of two fields separated by a colon (:): the filename of a chapter (a text file, e.g. book1-chapter1.txt), followed by the title of that chapter. PDFBook.py reads the chapter text files and chapter titles given, and creates a PDF e-book having those chapters and titles. (The chapter files contain dummy data, not a real book.)
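To make the file format concrete, here is a minimal sketch of how such a colon-separated list could be parsed in Python. It is not the actual PDFBook.py code; the function name parse_book_spec is hypothetical, and I assume that blank lines should be skipped and that only the first colon on a line separates the two fields.

```python
def parse_book_spec(lines):
    """Return a list of (chapter_filename, chapter_title) pairs.

    Hypothetical helper, not part of xtopdf. Assumes one
    "filename:title" entry per line and ignores blank lines.
    """
    chapters = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so a title may contain colons.
        filename, title = line.split(":", 1)
        chapters.append((filename.strip(), title.strip()))
    return chapters


spec = """\
book1-chapter1.txt:Chapter 1. Preface.
book1-chapter2.txt:Chapter 2. Introduction.
"""

for filename, title in parse_book_spec(spec.splitlines()):
    print(filename, "->", title)
```

Splitting with a maximum of one split keeps a title such as "Chapter 3: The problem" intact.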

Figure 5: The PDF e-book generated by PDFBook.py, as seen in Acrobat Reader (first page)



Figure 6: The PDF e-book generated by PDFBook.py, as seen in Acrobat Reader (last page)


3.3 DBF Data to PDF

3.3.1 DBF File Information

DBF files contain both metadata (data about data) and data records.

A DBF file has the following structure; the three sections occur in sequence:

  • The File Header section, which contains:
      • DBF version (signature)
      • Date of last update
      • Number of data records in the file
      • DBF header length in bytes
      • DBF record length in bytes
      • Number of fields in the DBF file

  • The Field Header section, which contains, for each field:
      • Field name
      • Field type
      • Field length
      • Field decimals

  • The Data Records section, which contains:
      • Data records

Field types can be Character (C), Numeric (N), Date (D), Logical (L), or Memo (M). The letters in parentheses are the way the field types are shown in Figure 7 (and also the way they are stored in the DBF file).
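The header layout above can be illustrated with a short Python sketch that unpacks the file header and field headers using the standard struct module. This is not xtopdf's own DBF reader. It assumes the commonly documented dBASE III byte layout (a 32-byte file header, one 32-byte descriptor per field, a one-byte terminator, and the year stored as an offset from 1900); the function names and the sample data are made up for the example.

```python
import struct

# Assumed dBASE III layout (not taken from the xtopdf source):
# file header: version, date (YY MM DD), record count, header length,
# record length, then 20 reserved bytes -> 32 bytes in total.
FILE_HEADER = struct.Struct("<B3BIHH20x")
# field descriptor: 11-byte name, 1-byte type, 4 reserved bytes,
# length, decimals, then 14 reserved bytes -> 32 bytes in total.
FIELD_HEADER = struct.Struct("<11sc4xBB14x")


def parse_dbf_header(data):
    """Return the DBF metadata found at the start of data (bytes)."""
    version, yy, mm, dd, num_records, header_len, record_len = (
        FILE_HEADER.unpack_from(data, 0))
    # Header = file header + N field descriptors + 1 terminator byte.
    num_fields = (header_len - FILE_HEADER.size - 1) // FIELD_HEADER.size
    fields = []
    for i in range(num_fields):
        offset = FILE_HEADER.size + i * FIELD_HEADER.size
        raw_name, ftype, length, decimals = FIELD_HEADER.unpack_from(data, offset)
        name = raw_name.split(b"\x00", 1)[0].decode("ascii")
        fields.append((name, ftype.decode("ascii"), length, decimals))
    return {
        "version": version,
        "last_update": (1900 + yy, mm, dd),
        "num_records": num_records,
        "header_length": header_len,
        "record_length": record_len,
        "num_fields": num_fields,
        "fields": fields,
    }


def build_sample_header():
    """Build a made-up two-field header in memory, for demonstration only."""
    fields = [("NAME", "C", 20, 0), ("PRICE", "N", 8, 2)]
    header_len = FILE_HEADER.size + FIELD_HEADER.size * len(fields) + 1
    record_len = 1 + sum(f[2] for f in fields)  # 1 byte for the deletion flag
    out = FILE_HEADER.pack(0x03, 105, 6, 15, 2, header_len, record_len)
    for name, ftype, length, decimals in fields:
        out += FIELD_HEADER.pack(name.encode("ascii"), ftype.encode("ascii"),
                                 length, decimals)
    return out + b"\x0d"  # header terminator


info = parse_dbf_header(build_sample_header())
for key, value in info.items():
    print(key, "=", value)
```

To try it on a real file, read the beginning of the file in binary mode and pass the bytes to parse_dbf_header. Other DBF variants may differ from the layout assumed here.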

3.3.2 Conversion by a Command-Line Tool

Open a command prompt or window. Run DBFToPDF.py:

    $ python DBFToPDF.py your_dbf_file.dbf your_file.pdf

An example:

    $ python DBFToPDF.py test4.dbf test4.pdf

This runs the DBF-to-PDF conversion tool DBFToPDF.py to create a PDF file named test4.pdf, having the metadata and data of test4.dbf. If you don't have any compatible XBase software, test4.dbf can be opened in MS Excel—specify the file type as Dbase files in the Open dialog:

Figure 7: The content of test4.dbf as seen in Microsoft Excel


 

Here is the output of DBFToPDF.py as seen in Adobe Reader (multiple screenshots follow):


Figure 8: The DBF metadata (the file header) of test4.dbf (in PDF)


Figure 9: The DBF metadata (the field headers) of test4.dbf (in PDF)



 

Only the first page is shown; there are 660 records in the DBF file:


Figure 10: The DBF data (the data records) of test4.dbf (in PDF)


 

3.3.3 Conversion by a wxPython GUI Tool

Open a command prompt or window. Run DbfToPdfGui.py:

    $ python DbfToPdfGui.py

Figure 11: DbfToPdfGui.py – initial screen

 

This program, DbfToPdfGui.py, works in a very similar way to the program TextToPdfGui.py described earlier. The only difference is that in the dialog that pops up to let you open the input file (after you press the "DBF file" button), you select a DBF file instead of a text file. So I won't show the other screens as I did for TextToPdfGui.py; instead, I will just show the output in Adobe Reader. This should look similar to the output of the DBFToPDF.py command-line tool described in Section 3.3.2. The only reason there is some difference in the output (mainly in the headings and the formatting of the content) is that I implemented it slightly differently in this case.

Here is the output as seen in Adobe Reader (two screenshots follow):


Figure 12: The DBF metadata (File and Field headers) and the first two records of the DBF file (in PDF)


Figure 13: The last few records of the DBF file (in PDF)

 

3.4 CSV Data to PDF

3.4.1 CSV File Information

Comma Separated Values (CSV) is a simple but useful tabular data format. A working definition of the format follows. I'm not going into full details since different software products implement this format in different ways, details of which are outside the scope of this article:

CSV consists of text, one record per line. Each line consists of fields separated by commas (hence the name CSV). The number of fields need not be the same in each record. Each field is usually a number (an integer, or a floating-point value, i.e. one with decimals), a string (letters, digits, and punctuation), or a date. There can be other types of values, e.g. Booleans (true or false values). A field may be enclosed in single or double quotes if it contains embedded spaces, commas, or quotes; numbers usually aren't quoted.
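The working definition above can be tried out with Python's standard csv module. This is a minimal sketch with made-up sample data, and it is not the parsing code used by CsvToPdf.py. Note that csv.reader, with its default settings, treats only double quotes as the quote character; a different one can be chosen with the quotechar parameter.

```python
import csv
import io

# Made-up sample data: a quoted field with an embedded comma, a quoted
# field with embedded (doubled) quotes, and a record with fewer fields.
sample = io.StringIO(
    'id,name,price\n'
    '1,"Widget, large",9.50\n'
    '2,"He said ""hi""",3\n'
    '3,short row\n'
)

rows = list(csv.reader(sample))
for row in rows:
    print(len(row), row)
```

Every field comes back as a string; converting a field to a number or a date is left to the caller.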

3.4.2 Conversion by a Command-Line Tool

Open a command prompt or window. Run CsvToPdf.py:

    $ python CsvToPdf.py your_file.csv your_file.pdf

An example:

    $ python CsvToPdf.py file3.csv file3.pdf

This runs the CSV-to-PDF conversion tool CsvToPdf.py to create a PDF file named file3.pdf having the data of file3.csv.


Figure 14: Contents of file3.csv (in a text editor)



Figure 15: Contents of file3.csv (in PDF)


 

3.5 TDV Data to PDF

3.5.1 TDV File Information

Tab Delimited Values (TDV) is also a simple tabular data format, widely used on UNIX/Linux, particularly as both the input and the output of UNIX/Linux command-line tools such as sed, awk, and many others. A definition of the format follows:

TDV consists of text, one record per line. Each line consists of fields. Each field may consist of any characters. Tabs delimit fields (hence the name TDV).

3.5.2 Conversion by a Command-Line Tool

Open a command prompt or window. Run TdvToPdf.py:

    $ python TdvToPdf.py your_file.tdv your_file.pdf

An example:

    $ python TdvToPdf.py file2.tdv file2.pdf

This runs the TDV-to-PDF conversion tool TdvToPdf.py to create a PDF file named file2.pdf from the contents of file2.tdv. The following figure shows file2.tdv in the gvim text editor. The tabs in the file are made visible by using the ":se list" command of gvim, which shows tabs as Ctrl-I characters (^I in the figure). Ctrl-I is one way to represent ASCII code 9, which stands for a tab: the control characters Ctrl-A to Ctrl-Z correspond to ASCII codes 1 to 26, I is the 9th letter of the alphabet, and so Ctrl-I is the tab character.


Figure 16: Contents of file2.tdv (in the gvim text editor)



Figure 17: Contents of file2.tdv (in PDF)
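The TDV definition and the Ctrl-I explanation given earlier in this section can be checked with a few lines of Python. This is a small sketch with a made-up record, not code from TdvToPdf.py; the helper names ctrl_char and show_tabs are hypothetical.

```python
TAB = "\t"


def ctrl_char(letter):
    """Return the control character for Ctrl-<letter> (Ctrl-A is code 1)."""
    return chr(ord(letter.upper()) - ord("A") + 1)


def show_tabs(line):
    """Make tabs visible as ^I, similar to gvim's ":se list" display."""
    return line.replace(TAB, "^I")


print(ord(TAB))               # 9
print(ctrl_char("I") == TAB)  # True

# A made-up TDV record: three fields delimited by tabs.
record = "apple\t3\t0.50"
fields = record.split(TAB)
print(fields)
print(show_tabs(record))      # apple^I3^I0.50
```

Splitting on the tab character alone, rather than on any whitespace, keeps fields that contain spaces intact.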


 

3.6 XLS Data to PDF

3.6.1 XLS File Information

XLS is Microsoft Corp.'s spreadsheet format used in Microsoft Excel.

The program XlsToPdf.py converts Excel spreadsheets to PDF.

3.6.2 Conversion by a Command-Line Tool

Open a command prompt or window. Run XlsToPdf.py:

    $ python XlsToPdf.py your_file.xls your_file.pdf

An example:

    $ python XlsToPdf.py file01.xls file01.pdf

The above command runs the XLS-to-PDF conversion tool XlsToPdf.py to create a PDF file named file01.pdf from the contents of file01.xls. Screenshots follow:



Figure 18: Contents of file01.xls



Figure 19: Contents of file01.xls (in PDF)

 

4. Conclusion

In this article, we've seen ways of using some of the tools that are built on top of xtopdf, to create PDF output from plain text, DBF, CSV, TDV, and XLS data. The Application Programming Interface (API) of xtopdf accepts plain text as one of its basic inputs, thus providing the ability to convert the text that can be extracted from any data source (either manually or programmatically), into PDF.

Whether you're a developer or an end user, I hope this article motivates you to take xtopdf (sourceforge.net/projects/xtopdf) for a spin, and I welcome any feedback on it. Please feel free to contact me via my website, www.dancingbison.com.

To try out the software described in the article, download the following two files:

  • xtopdf v1.0: contains the core xtopdf code, including the code for plain text-to-PDF and DBF-to-PDF conversion. Get it here:

    URL: http://www.dancingbison.com/xtopdf-1.0.tar.gz

  • xtopdf v1.3: contains the new code (alpha) for CSV-, TDV-, and XLS-to-PDF conversions. Get it here:

    URL: http://www.dancingbison.com/xtopdf-1.3.zip

Extract the contents of xtopdf-1.0.tar.gz into the directory C:\Python24\Lib\site-packages (assuming that C:\Python24 is the base directory of your Python installation). Change C:\Python24 to the right directory if yours is different.

Then follow the instructions for installing xtopdf's prerequisites (Python and the ReportLab toolkit), as given in the xtopdf-1.0.tar.gz README.txt file.

Make sure that xtopdf-1.0.tar.gz has been installed properly by trying to run one of the programs for plain text conversion, e.g. WritePDF.py (see Section 3.2.1 for the exact command to use).

Then extract the contents of xtopdf-1.3.zip into the same directory where you extracted xtopdf-1.0.tar.gz.

Now you should be able to run the programs described in the article.
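If you prefer to script the extraction steps above rather than use an archive tool, the following Python sketch shows one way to do it with the standard shutil module. It assumes a Python version that provides shutil.unpack_archive, and the function name install_archives is hypothetical; it is not part of xtopdf.

```python
import shutil


def install_archives(archive_paths, dest_dir):
    """Extract each archive into dest_dir, in the order given.

    shutil.unpack_archive picks the format (.tar.gz, .zip, ...) from
    the file extension. Hypothetical helper, not part of xtopdf.
    """
    for archive in archive_paths:
        shutil.unpack_archive(archive, dest_dir)


# Example call (the paths are the ones used in the article; adjust them
# to where you saved the downloads and to your own Python installation):
#
# install_archives(
#     ["xtopdf-1.0.tar.gz", "xtopdf-1.3.zip"],
#     r"C:\Python24\Lib\site-packages",
# )
```

The order of the list matters here, because the article's steps extract xtopdf-1.0.tar.gz first and xtopdf-1.3.zip afterwards.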

For more detailed instructions, please take a look at: http://itext.ugent.be/library/question.php?id=41

Cool Tips:

Apart from the well-known benefits of read-only, good cross-platform viewing and printing, zooming, etc., Adobe PDF has another couple of neat features I came across while developing xtopdf and viewing the results in Acrobat Reader:

  • You can "drag" a document in Acrobat Reader to the left, right, diagonally, etc., to "pan" it across the screen without using the scrollbars. Just left-click anywhere in the middle of the Reader screen with a PDF open in it, then try moving the mouse with the mouse button still pressed. The document will move according to your mouse movements. This helps read a document without needing to use the scrollbars all the time, when its content is too wide for all of it to fit on the screen. If you have a scroll mouse, it gets better - the scroll wheel can be used to scroll up or down, so you can read the entire document, by dragging and scrolling, just using one hand on the mouse.
  • Zooming in Acrobat Reader makes:
      • The fonts look better sometimes, when enlarged
      • Old, almost unreadable PDF docs (maybe they were scanned in) easier to read

Vasudev Ram is an independent software consultant, writer, and trainer with many years of experience in different software areas. He works via Dancing Bison Enterprises - www.dancingbison.com.

 

 

   

 
