FastQC HTML Report to PDF (With a Script)
11.1 years ago

Hey there,

Does anybody have a solution for converting FastQC HTML output to PDF? When you have 50 or 500 FASTQs to check and, more importantly, share by email, the HTML output is clunky to deal with.

Pierre Lindenbaum suggested Apache FOP on Twitter. I tried it, but it seems to need an XSLT stylesheet to work.

I also tried wkhtmltopdf on my Linux machine and wkpdf on my MacBook. Both produced blank PDFs.

Opening each HTML report one by one and printing to PDF on my Mac works, but it is a really, really slow option.

The point is that I want to script this, ideally on Linux, and end up with a batch of PDFs.

Below are two links with test data: the first shows what the output looks like in a browser, the second is the full output from FastQC.

Thanks!

fastQC example page

Tags: fastqc, script

Why non-interactive FastQC output comes in HTML in the first place is something I think I'll never understand.

11.1 years ago
Neilfws 49k

Surprised no one has mentioned HTMLDOC. On Ubuntu and similar distributions, simply run:

sudo apt-get install htmldoc


then:

htmldoc --webpage -f output.pdf index.html


Or just run "htmldoc" on its own for the GUI.

Yes, this works nicely too! I played with the options to make things fit better and ran:

htmldoc --webpage --browserwidth 800 --fontsize 7 -f output.pdf fastqc_report.html
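To do this for a whole run instead of one report at a time, here is a minimal batch sketch of the htmldoc approach. It assumes FastQC's default layout (one `<sample>_fastqc/` directory per FASTQ, each holding `fastqc_report.html`) and reuses the flags from the comment above; the function name `fastqc_reports_to_pdf` and the `fastqc_pdfs` default folder are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: batch-convert FastQC reports to PDF with htmldoc.
# Assumes one <sample>_fastqc/ directory per FASTQ, each containing fastqc_report.html.
# Function name and default output folder are hypothetical.

fastqc_reports_to_pdf() {
    local outdir=${1:-fastqc_pdfs}      # hypothetical default output folder
    local report dir n=0
    mkdir -p "$outdir" || return 1
    for report in ./*_fastqc/fastqc_report.html; do
        [[ -f $report ]] || continue    # the glob matched nothing
        dir=$(basename "$(dirname "$report")")   # e.g. 20A.R2.QC.fq_fastqc
        # --browserwidth/--fontsize values are the ones from the comment above
        if htmldoc --webpage --browserwidth 800 --fontsize 7 \
                   -f "$outdir/${dir%_fastqc}.pdf" "$report"; then
            n=$((n + 1))
        else
            echo "htmldoc failed on $report" >&2
        fi
    done
    echo "converted $n report(s) into $outdir/" >&2
}
```

Usage: source the file, `cd` to the folder holding the `*_fastqc` directories and run `fastqc_reports_to_pdf pdfs`; each report becomes `pdfs/<sample>.pdf`.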

I installed this on macOS Mojave (10.14.13) and it works, but the output is all black and white and has no plots :/.

11.1 years ago

I just installed wkhtmltopdf and used it on your html file with this command:

wkhtmltopdf 20A.R2.QC.fq_fastqc/fastqc_report.html test.pdf


And I got back a test.pdf file with the correct contents. Here is the PDF uploaded to Imgur: http://imgur.com/f4fCz

Imgur converted the .pdf to .png, so the quality is not great.

I get a "QPixmap: Cannot create a QPixmap when no GUI is being used" error. It seems to be a bug. I'm running on a 64-bit CentOS machine. What did you run it on?

That's crazy. When I do that, I get a header and nothing else: http://public.tgen.org/jcorneveaux/FASTQC/test.pdf

I ran it on Ubuntu 11.04 64 bit.
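For the QPixmap error reported above on a headless server, a commonly used workaround is to give wkhtmltopdf a virtual X display with `xvfb-run` (from the Xvfb package). This is a hedged sketch, assuming your wkhtmltopdf build is one that needs an X server and that `xvfb-run` is installed; the function name `fastqc_wkhtml` is hypothetical, and it may not explain the blank-PDF case.

```shell
#!/usr/bin/env bash
# Sketch: run wkhtmltopdf on FastQC reports, wrapping it in xvfb-run when no X display is set.
# Assumption: the "QPixmap ... no GUI" error comes from a wkhtmltopdf build that needs an
# X server; xvfb-run supplies a virtual one (-a picks a free server number).
# Function name is hypothetical.

fastqc_wkhtml() {
    local report
    local -a runner=()
    if [[ -z ${DISPLAY:-} ]]; then
        runner=(xvfb-run -a)
    fi
    for report in "$@"; do
        # writes e.g. 20A.R2.QC.fq_fastqc/fastqc_report.pdf next to the HTML
        if ! "${runner[@]}" wkhtmltopdf "$report" "${report%.html}.pdf"; then
            echo "wkhtmltopdf failed on $report" >&2
        fi
    done
}
```

Usage: `fastqc_wkhtml ./*_fastqc/fastqc_report.html`. On a desktop session where `DISPLAY` is set, wkhtmltopdf is called directly.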

11.1 years ago
Rm 8.2k

This is the script (qcimg2pdf.sh) I use as part of my FASTQ workflow. It uses some of the images from FastQC. Run it in the parent directory of the *fastqc directories for your lanes.

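#!/bin/bash
## qcimg2pdf.sh
## Tiles four FastQC plots per *fastqc directory into QC.<dir>.pdf (ImageMagick convert),
## then merges all pages into qc-lanes.<prefix>.pdf (Ghostscript).

echo "Usage: $0 -o output_prefix"

# use ghostscript-9.02   ## site-specific environment loader; leave commented out if gs is already in your PATH

if [[ $# -eq 0 || $# -gt 2 ]]; then
    echo "No/wrong ($#) arguments detected"
    echo "Run it where you have *fastqc directories"
    exit 1 # exit shell script
fi

outprefix=""
while getopts o: option; do
    case $option in
        o) outprefix=$OPTARG ;;
        *) echo "use correct arguments with only -o"; exit 1 ;;
    esac
done

if [[ -z $outprefix ]]; then
    echo "use correct arguments with only -o"
    exit 1 # exit shell script
fi
echo "$outprefix"

for j in *fastqc; do
    [[ -d $j ]] || continue
    echo "$j"
    # top row: per-base quality + per-base GC; bottom row: per-sequence quality + per-sequence GC
    # the first lines of the Basic Statistics table (from "Filename" on) are drawn as a text label
    convert \( "$j/Images/per_base_quality.png" "$j/Images/per_base_gc_content.png" -scale 500x500 +append \) \
            \( "$j/Images/per_sequence_quality.png" "$j/Images/per_sequence_gc_content.png" -scale 500x500 +append \) \
            -append -font Helvetica -pointsize 12 -gravity northeast \
            -draw "translate +5,+5 text 80,80 '$(grep -A5 Filename "$j/fastqc_data.txt")'" \
            "QC.$j.pdf"
done

# merge the per-directory pages into one PDF
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="qc-lanes.$outprefix.pdf" QC.*.pdf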

Brilliant solution, and just the simple kind of approach I needed. Thanks, RM!

I'm not sure I understand how this works. What arguments do you need to give the script?

In my case, I ran it on Linux and it produced a PDF with three plots; it didn't convert the whole report into PDF.
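The script above only tiles four hand-picked plots, so a partial PDF is expected. If you want every plot FastQC wrote, one hedged variant is ImageMagick's `montage`, which lays out all PNGs from each `Images/` folder on one page per sample before the same Ghostscript merge. This is a sketch, not RM's script: it assumes an `Images/` folder of PNGs inside each `*_fastqc` directory, and the function name `qc_montage_pdf` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical variant of qcimg2pdf.sh: tile every PNG in each *_fastqc/Images folder
# (not just four plots) onto one page per sample, then merge the pages with Ghostscript.

qc_montage_pdf() {
    local prefix=$1 dir name
    local -a pages=()
    [[ -n $prefix ]] || { echo "usage: qc_montage_pdf output_prefix" >&2; return 1; }
    for dir in ./*_fastqc; do
        [[ -d $dir/Images ]] || continue
        name=$(basename "$dir")
        # 2 columns, each plot fitted into a 500x500 tile with a 4-pixel gap; sample name as title
        montage "$dir"/Images/*.png -tile 2x -geometry 500x500+4+4 -title "$name" "QC.$name.pdf" || return 1
        pages+=("QC.$name.pdf")
    done
    if (( ${#pages[@]} == 0 )); then
        echo "no *_fastqc/Images folders found here" >&2
        return 1
    fi
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="qc-lanes.$prefix.pdf" "${pages[@]}"
}
```

Usage mirrors the original: from the lane directory, `qc_montage_pdf lane1` writes one `QC.<dir>.pdf` per sample plus `qc-lanes.lane1.pdf`.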

11.1 years ago

I would be interested in extracting the raw FastQC data as generic tables to render plots in R.

If anyone would like to participate, I could create a GitHub repository for this "project".

I think the fastqc_data.txt file has much of it.
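As a starting point for getting those tables into R, here is a minimal sketch that splits fastqc_data.txt into one tab-separated file per module plus a `summary.tsv` of pass/warn/fail calls. It relies on the file's block layout (`>>Module name<TAB>status` ... `>>END_MODULE`, column headers on lines starting with `#`); the function name `split_fastqc_data` and the output file naming are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split FastQC's fastqc_data.txt into one TSV per module, plus summary.tsv
# (module name, status). The last "#" line before the data becomes the header;
# earlier "#" lines (e.g. Total Duplicate Percentage) are kept as comments.
# Function name and output naming are hypothetical.

split_fastqc_data() {
    local data=$1 outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v prefix="$outdir/" '
        BEGIN { FS = "\t"; summary = prefix "summary.tsv" }
        /^>>END_MODULE/ {
            if (pending != "") print substr(pending, 2) > out   # header-only module
            if (out != "") close(out)
            out = ""; pending = ""
            next
        }
        /^>>/ {
            name = substr($1, 3)                 # drop the leading ">>"
            print name "\t" $2 > summary
            gsub(/[^A-Za-z0-9]+/, "_", name)     # spaces etc. become "_" in the file name
            out = prefix name ".tsv"
            next
        }
        out == "" { next }                       # e.g. the "##FastQC <version>" line
        /^#/ {
            if (pending != "") print pending > out   # earlier "#" lines stay as comments
            pending = $0                             # most recent "#" line is the header
            next
        }
        {
            if (pending != "") { print substr(pending, 2) > out; pending = "" }
            print > out
        }
    ' "$data"
}
```

Usage: `split_fastqc_data 20A.R2.QC.fq_fastqc/fastqc_data.txt qc_tables`, then in R something like `read.delim("qc_tables/Per_base_sequence_quality.tsv", comment.char = "#")`.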

I did this a while ago, but never maintained it - you're free to cannibalise as much as you want/need! https://github.com/clark-lab-robot/Repitools-git/blob/master/pkg/Repitools/R/FastQC-class.R

There is also a Bioconductor package called qrqc that will get you some of the FastQC stats as well. The nice thing about it is that, like FastQC, it does all the read processing "online" (you don't have to load the entire file), and it does so in C code.

Yes, I've been testing qrqc too.

Thanks Aaron, looks like some good stuff :)

11.1 years ago

Here is the XSLT stylesheet for FO:

The HTML document is not a valid XML document, so I used xsltproc with its --html option to parse it before passing the result to FOP. Here is the Makefile:

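all: fastqc.pdf

# parse the (non-XML) HTML report with the HTML parser and transform it to XSL-FO
fastqc.fo: fastqc2fo.xsl fastqc_report.html
	xsltproc --html fastqc2fo.xsl fastqc_report.html > $@

# render the XSL-FO to PDF
fastqc.pdf: fastqc.fo
	fop $< $@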


The result was posted on slideshare: http://www.slideshare.net/lindenb/biostar17037

Edit: my output is missing one or two tables but you get the idea.
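To run the same xsltproc + FOP pipeline over many reports, a hedged shell sketch follows. It assumes you have the `fastqc2fo.xsl` stylesheet mentioned above and pass its path as the first argument; the function name `fastqc_fop_all` is hypothetical. Each conversion runs inside its report folder so relative paths such as `Images/` resolve from there, whichever base directory the tools use.

```shell
#!/usr/bin/env bash
# Sketch: apply the xsltproc + FOP steps from the Makefile to every *_fastqc report.
# Assumes the fastqc2fo.xsl stylesheet path is given as $1. Function name is hypothetical.

fastqc_fop_all() {
    local xsl dir
    [[ -f ${1:-} ]] || { echo "usage: fastqc_fop_all path/to/fastqc2fo.xsl" >&2; return 1; }
    xsl="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"   # absolute path, since we cd below
    for dir in ./*_fastqc; do
        [[ -f $dir/fastqc_report.html ]] || continue
        # subshell: cd into the report folder, HTML -> FO -> PDF, stop at the first failure
        ( cd "$dir" &&
          xsltproc --html "$xsl" fastqc_report.html > fastqc.fo &&
          fop fastqc.fo fastqc.pdf ) || echo "conversion failed in $dir" >&2
    done
}
```

Usage: `fastqc_fop_all ~/xsl/fastqc2fo.xsl` from the folder holding the `*_fastqc` directories; each gets its own `fastqc.pdf`, with the same missing-table caveat as above.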

11.1 years ago
Tyler Moore ▴ 20

My company, Expected Behavior, has a service called DocRaptor that converts HTML to PDF or Excel format. Unlike wkhtmltopdf, DocRaptor generates fully functional PDF files, not just a PNG.

http://docraptor.com/

Here is a link to the code examples page. You make an HTTP POST request to DocRaptor's server, and we send your file back to you.

http://docraptor.com/examples

11.1 years ago

I just tried HTML to LaTeX, and after a few minutes I decided that you are right: there must be an easier way.

@DK's approach seems the best.

http://htmltolatex.sourceforge.net/

Yeah... I just wish it actually worked for me.