Using Flying Saucer and iText in Java to convert XHTML to PDF

The situation

Where I work, we were generating reports in XHTML for printing, so that the styling is easy to configure. The problem with this approach is the cross-browser look of the report: IE and Chrome print two completely different reports, so to speak.
To remove the differences between browsers and to reuse the existing report generation, I decided to render the print version on the server side. So I had HTML, and I wanted it to look the same on every computer. PDF is a good medium for this purpose, so I needed an HTML-to-PDF library for our Java system. I first tried iText by itself, but it did not apply the CSS. Browsing the web a bit further, I found the combination of Flying Saucer and iText, and this was a winning combination for us.
Flying Saucer is a Java library that renders XHTML/XML + CSS to screen, image, or PDF. For PDF output it depends on iText, a library for creating PDF files.

The code

In this blog post I provide some code snippets, not a full working example.

Maven dependencies

Add this to your pom.xml in the dependencies section:
    <!-- Flying Saucer and iText -->
    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>itextpdf</artifactId>
        <version>5.1.3</version>
    </dependency>
    <dependency>
        <groupId>org.xhtmlrenderer</groupId>
        <artifactId>core-renderer</artifactId>
        <version>R8</version>
    </dependency>

The converter

The converter is pretty straightforward. It reads an XHTML string and writes the PDF to a file through a FileOutputStream. Please read the gotchas below, because there are a few.

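import org.apache.log4j.Logger;
import org.xhtmlrenderer.pdf.ITextRenderer;

import java.io.FileOutputStream;
import java.io.OutputStream;

public class HtmlToPdfConverter {
    private static final Logger LOG = Logger.getLogger(HtmlToPdfConverter.class);

    /**
     * Renders the given XHTML string to a PDF file.
     *
     * @param html     well-formed XHTML document
     * @param tempFile path of the PDF file to write
     */
    public void htmlStringToPdfStream(String html, String tempFile) {
        // try-with-resources closes the stream even when rendering fails
        try (OutputStream pdf = new FileOutputStream(tempFile)) {
            ITextRenderer renderer = new ITextRenderer();
            renderer.setDocumentFromString(html); // parse the XHTML
            renderer.layout();                    // apply the CSS and lay out the pages
            renderer.createPDF(pdf);              // write the PDF to the stream
            LOG.debug("Wrote PDF to " + tempFile);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}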

Debugging/logging

To enable logging output from Flying Saucer, add this file to your system:

$user.home/.flyingsaucer/local.xhtmlrenderer.conf

See the Flying Saucer user guide and the example configuration file for the settings you can put in there.
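
If it is unclear where that file should live on a particular machine, a small sketch like the one below can print the location. It assumes that $user.home stands for the Java system property user.home; the class and method names are made up for this post.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Flying Saucer.
final class FlyingSaucerConfigLocation {

    /** Builds $user.home/.flyingsaucer/local.xhtmlrenderer.conf for the current user. */
    static Path configFile() {
        // Assumption: $user.home refers to the Java system property "user.home".
        String home = System.getProperty("user.home");
        return Paths.get(home, ".flyingsaucer", "local.xhtmlrenderer.conf");
    }
}
```

Printing FlyingSaucerConfigLocation.configFile() from the same JVM user that runs the converter shows where to put the file.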

Gotchas

Valid XHTML

Flying Saucer reports errors and produces no output when the XHTML contains errors. An XHTML document is in fact an XML document, so all XML rules apply. You can check your XHTML with the W3C Validator. Make sure tags are nested correctly, there are no block-level tags inside inline tags, and all special characters are escaped properly. HTML is not the same as XHTML or XML; if you want to parse an HTML document, preprocess it first with JTidy or TagSoup.
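
A quick way to catch the XML part of these problems before handing the string to Flying Saucer is to parse it with the XML parser that ships with the JDK. The sketch below is only that: the class and method names are hypothetical, it checks well-formedness only (not the block-inside-inline rule), and a DOCTYPE that points to an external DTD may make the parser try to fetch that DTD.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical pre-check, not part of Flying Saucer.
final class XhtmlWellFormedCheck {

    /** Returns the first XML parse error, or an empty Optional when the input is well-formed. */
    static Optional<String> firstXmlError(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow parse errors instead of letting the default handler print them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) { /* warnings are ignored */ }
                @Override
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                @Override
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xhtml)));
            return Optional.empty();
        } catch (SAXParseException e) {
            // Line and column make it easier to find the offending tag or character.
            return Optional.of("line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage());
        } catch (Exception e) {
            return Optional.of(e.toString());
        }
    }
}
```

An unclosed tag such as "<p>broken</body>" or an unescaped "&" yields a non-empty result, which can be logged before the conversion is attempted.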

Table layout with divs and CSS

One special case I encountered: I tried to make a table layout with divs that use the CSS properties
display: table/table-header-group/table-row/table-cell
and so on. Flying Saucer's implementation of these properties is very strict, so everything has to be nested correctly: a table (which needs a header/footer group) needs a row group, which needs rows, which need cells.
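
To make the nesting concrete, here is a minimal sketch that builds such a div structure as an XHTML fragment: table, then header group and row group, then rows, then cells. The class and method names are hypothetical and inline styles are used only to keep the example short; the display values themselves are standard CSS.

```java
import java.util.List;

// Hypothetical helper that emits a div-based table with complete nesting.
final class DivTableBuilder {

    /** Escapes the characters that would break XML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** One table-row div that holds one table-cell div per value. */
    static String row(List<String> cells) {
        StringBuilder sb = new StringBuilder("<div style=\"display: table-row\">");
        for (String cell : cells) {
            sb.append("<div style=\"display: table-cell\">").append(escape(cell)).append("</div>");
        }
        return sb.append("</div>").toString();
    }

    /** table > table-header-group > row > cells, followed by table-row-group > rows > cells. */
    static String table(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<div style=\"display: table\">");
        sb.append("<div style=\"display: table-header-group\">").append(row(header)).append("</div>");
        sb.append("<div style=\"display: table-row-group\">");
        for (List<String> r : rows) {
            sb.append(row(r));
        }
        return sb.append("</div></div>").toString();
    }
}
```

The point of the sketch is that no level is skipped: cells never sit directly in the table div, and rows always sit inside a group.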

Xalan 2.7.0 minimum

A gotcha that is described widely on the internet is that Flying Saucer needs Xalan 2.7.0 or newer. If you use Maven, check that no older version ends up on your classpath. I actually used 2.7.1.

Complete URLs in CSS

Flying Saucer is not a web browser. It just reads a standalone document. If that document contains relative URIs, Flying Saucer does not know where to find the resources they point to. Please make sure all your paths are absolute.
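
One way to make paths absolute while generating the XHTML is to resolve every reference against the URL the report would normally be served from. The sketch below uses only java.net.URI from the JDK; the class name, the method name and the example.com URLs are assumptions for illustration.

```java
import java.net.URI;

// Hypothetical helper for rewriting href/src values before conversion.
final class AbsoluteUrls {

    /**
     * Resolves a possibly relative reference against a base URL.
     * References that are already absolute are returned unchanged.
     */
    static String toAbsolute(String baseUrl, String reference) {
        return URI.create(baseUrl).resolve(reference).toString();
    }
}
```

With the base http://example.com/reports/2012/index.html, the reference css/print.css becomes http://example.com/reports/2012/css/print.css, and ../img/logo.png becomes http://example.com/reports/img/logo.png.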
