Question: How to use trimmed and merged outputs in bwa?
gravatar for haghverdi_ac
11 days ago by
haghverdi_ac0 wrote:

Hi, I just joined this website and I have a question:

I am working on some of the ancient plant samples and the goal is to perform a comparative analysis with modern species. sequencing has been done as paired-end and I have run trimmomatic for trimming and casper for merging. Now the output files are Forward_unpaired.fastq, Reverse_unpaierd.fastq and merged.fastq. so, I intend to use these input files in bwa but as I checked bwa helps, I could find the options for them, can anybody help me here how to use these files in bwa? If I have to run bwa once for both forward and reverse paired ones and once for merged ones, then how can I compile them afterward?


alignment • 77 views
ADD COMMENTlink modified 11 days ago • written 11 days ago by haghverdi_ac0
gravatar for swbarnes2
11 days ago by
United States
swbarnes27.0k wrote:

Why can't you merge the bams after alignment? It's not clear to me that merging the pairs is going to help you much.

ADD COMMENTlink written 11 days ago by swbarnes27.0k
gravatar for Gabriel R.
11 days ago by
Gabriel R.2.6k
Danmarks Tekniske Universitet
Gabriel R.2.6k wrote:

This is how we got around it at the Max Planck in Leipzig. This is was for ancient DNA from Neanderthals/ancient humans but should work for plants, DNA is DNA after all :-)

Say we have in.fq1.gz in.fq2.gz as paired, untrimmed fastq but demultiplexed (not demultiplexed can be handle as well but needs an extra command. We use the following:

 fastq2bam  -o /dev/stdout  in.fq1.gz in.fq2.gz | leeHomMulti -u --ancientdna -o /dev/stdout /dev/stdin  | network-aware-bwa/bwa bam2bam -n 0.01 -o 2 -l 16500  -g reference.fasta  - | samtools sort ...

fastq2bam is from my own BCL2BAM2FASTQ

leeHom is a specialized adapter trimmer+merger for ancient DNA and shorter DNA molecules. It is still to my knowledge the most accurate, see software repo: leeHom and our paper in NAR

network aware bwa is a nifty fork of bwa aln from a colleague in Leipzig which can eat BAM and spit out bam. It also considers in the insert size computation both the merged and paired reads, see software here

and samtools sort is the normal samtools sort, adapt your command line to account for the version of samtools.

I am biased but this is the cleanest workflow for ancient DNA that I know of. Everything is bam and what goes it is what goes out. You can even instruct leeHom to keep the original reads+QC score with the --keepOrig, they will exist as QC failed reads and will be ignored but you can find them in the BAM file.

Hope this helps!

ADD COMMENTlink modified 9 days ago • written 11 days ago by Gabriel R.2.6k

thanks for your response and actually I have to go details of what you sent me because I am not a bioinformatician, so trying to understand all the details.

ADD REPLYlink written 11 days ago by haghverdi_ac0

ok! let me know if you need help. network-aware-bwa can be a pain to install but I managed on a lot of systems.

ADD REPLYlink written 11 days ago by Gabriel R.2.6k
gravatar for haghverdi_ac
11 days ago by
haghverdi_ac0 wrote:

thank you very much for your immediate response, since it is aDNA, we are trying to be as sensitive as possible and contain more info.

ADD COMMENTlink written 11 days ago by haghverdi_ac0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1185 users visited in the last hour