11 days ago by Danmarks Tekniske Universitet
This is how we got around it at the Max Planck in Leipzig. This was for ancient DNA from Neanderthals/ancient humans, but it should work for plants; DNA is DNA after all :-)
Say we have in.fq1.gz and in.fq2.gz as paired, untrimmed, but demultiplexed FASTQ (non-demultiplexed data can be handled as well but needs an extra command). We use the following:
fastq2bam -o /dev/stdout in.fq1.gz in.fq2.gz | leeHomMulti -u --ancientdna -o /dev/stdout /dev/stdin | network-aware-bwa/bwa bam2bam -n 0.01 -o 2 -l 16500 -g reference.fasta - | samtools sort ...
fastq2bam is from my own BCL2BAM2FASTQ package.
leeHom is a specialized adapter trimmer + merger for ancient DNA and short DNA molecules. It is still, to my knowledge, the most accurate; see the leeHom software repo and our paper in NAR.
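If you want plain trimmed/merged FASTQ instead of the BAM-in/BAM-out route, leeHom can also be run directly on FASTQ. A minimal sketch; the flag names here are recalled from the leeHom README, so verify them against `leeHom --help` for your version:

```shell
# Sketch: trim adapters and merge overlapping read pairs with leeHom,
# FASTQ in, FASTQ out (flags assumed from the leeHom README).
leeHom --ancientdna \
  -fq1 in.fq1.gz \
  -fq2 in.fq2.gz \
  -fqo out_prefix    # writes merged/trimmed FASTQ under this prefix
```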
network-aware-bwa is a nifty fork of bwa aln from a colleague in Leipzig that can eat BAM and spit out BAM. It also considers both the merged and the paired reads in the insert-size computation; see the software here.
and samtools sort is the normal samtools sort; adapt your command line to account for the version of samtools you are running.
I am biased, but this is the cleanest workflow for ancient DNA that I know of. Everything is BAM, and what goes in is what goes out. You can even instruct leeHom to keep the original reads + quality scores with the --keepOrig flag; they will be stored as QC-failed reads, so downstream tools ignore them, but you can still find them in the BAM file.
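QC-failed reads carry the 0x200 (decimal 512) SAM flag, so you can pull the kept originals back out, or explicitly drop them, with standard samtools flag filtering (output file names here are placeholders):

```shell
# View only the QC-failed records (the --keepOrig originals)
samtools view -f 512 out.sorted.bam | head

# Or write a BAM with the QC-failed records excluded
samtools view -F 512 -b out.sorted.bam > filtered.bam
```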
Hope this helps!