Question

How to align Trimmomatic unpaired reads with BWA?

9.0 years ago

mcff23

Hi everyone!

I have filtered the adapters from my Illumina PE reads with Trimmomatic. This was the output (as I expected):

sample.R1.trimmed.fastq
sample.R2.trimmed.fastq
sample.R1.unpaired.fastq
sample.R2.unpaired.fastq

Then I aligned the trimmed.fastq pair with BWA just fine. But when I tried to align the unpaired reads I got this:

[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (4, 1, 1, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[mem_sam_pe] paired reads have different names: "HWI-1KL178:67:HAE0RADXX:1:1101:2363:2000", "HWI-1KL178:67:HAE0RADXX:1:1101:11567:2000"

This is the command line:

bwa/bin/bwa mem -aM -t 6 ${REF_BWA_INDEX}/genome.fa ${SAMPLE}.R1.unpaired.fastq ${SAMPLE}.R2.unpaired.fastq > ${i}.sam

My goal is to align the trimmed and unpaired files separately, because BWA does not support them together.

Thanks in advance!

Monica

Trimmomatic BWA Unpaired-reads

updated 21 months ago by Ram • written 9.0 years ago by mcff23

5.1 years ago

drake.edwards

If your unpaired reads are produced by Trimmomatic's palindrome mode (i.e., when the forward and reverse reads contain the same sequence after adapter trimming), try setting the keepBothReads parameter of ILLUMINACLIP to true.
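For reference, keepBothReads is the sixth field of ILLUMINACLIP, after the adapter file, seed mismatches, palindrome clip threshold, simple clip threshold and minimum adapter length. Below is a minimal sketch of a PE run with it enabled. The jar name, adapter FASTA, raw input file names and the LEADING/TRAILING/MINLEN steps are assumptions to adapt, and the numeric thresholds are common example values, not tuned settings. Set DRY_RUN=1 to print the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch: Trimmomatic PE with keepBothReads=true, so the reverse read of a
# palindrome-clipped pair is kept and the pair stays in the paired outputs.
# Hypothetical/assumed: jar path, adapter FASTA, raw input names, extra steps.
set -euo pipefail

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then printf '%s\n' "$*"; else "$@"; fi
}

trim_keep_both() {
  local sample=$1
  local jar=${TRIMMOMATIC_JAR:-trimmomatic.jar}   # hypothetical jar location
  local adapters=${ADAPTERS:-TruSeq3-PE.fa}       # hypothetical adapter FASTA
  # ILLUMINACLIP:<adapters>:<seed mismatches>:<palindrome clip>:<simple clip>:<minAdapterLength>:<keepBothReads>
  run java -jar "$jar" PE -threads 6 \
    "$sample.R1.fastq" "$sample.R2.fastq" \
    "$sample.R1.trimmed.fastq" "$sample.R1.unpaired.fastq" \
    "$sample.R2.trimmed.fastq" "$sample.R2.unpaired.fastq" \
    "ILLUMINACLIP:$adapters:2:30:10:2:true" LEADING:3 TRAILING:3 MINLEN:36
}
```

Usage: `DRY_RUN=1 trim_keep_both sample` to preview, then `trim_keep_both sample` to run.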

5.1 years ago by drake.edwards

Ram · Accepted Answer · 2015-04-30

9.0 years ago

Istvan Albert

Run each unpaired file separately.

bwa/bin/bwa mem -aM -t 6 ${REF_BWA_INDEX}/genome.fa ${SAMPLE}.R1.unpaired.fastq >R1.unpaired.sam
..
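Applied to both unpaired files, that advice could look like the sketch below. It reuses the flags from the question's command, assumes `bwa` is on the PATH (the question calls `bwa/bin/bwa`), and takes the reference path and sample prefix as arguments. The DRY_RUN switch is only a convenience for previewing the commands.

```bash
#!/usr/bin/env bash
# Sketch: align each Trimmomatic unpaired file on its own, in single-end mode.
# Flags copied from the question; bwa on PATH is an assumption.
set -euo pipefail

align_unpaired() {
  local ref=$1 sample=$2 mate fq sam
  for mate in R1 R2; do
    fq="$sample.$mate.unpaired.fastq"
    sam="$sample.$mate.unpaired.sam"
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      printf 'bwa mem -aM -t 6 %s %s > %s\n' "$ref" "$fq" "$sam"
    else
      # A single FASTQ argument means single-end mode: no mate pairing is
      # attempted, so the "paired reads have different names" error cannot occur.
      bwa mem -aM -t 6 "$ref" "$fq" > "$sam"
    fi
  done
}
```

Usage: `align_unpaired "${REF_BWA_INDEX}/genome.fa" "${SAMPLE}"`.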

Be careful with combining paired and unpaired data.

Information gleaned from a read pair usually cannot (should not) be combined with that obtained from two unpaired reads. That is because a paired read provides measurements from the same DNA fragment that is measured (sequenced) twice, whereas unpaired reads measure different DNA fragments.

updated 21 months ago by Ram • written 9.0 years ago by Istvan Albert

Just a note that the latest bwa-mem supports this:

(seqtk mergepe sample.R?.trimmed.fastq; cat sample.R?.unpaired.fastq) | bwa mem -p ${REF_BWA_INDEX}/genome.fa -

i.e., you can merge paired and unpaired reads in one stream, as long as paired reads are next to each other.
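The "next to each other" condition is bwa mem's smart pairing (`-p`): two adjacent records with the same read name are treated as mates, and any other record is aligned single-end. The awk sketch below mirrors that rule to count how a stream would be split before you pipe it into bwa. It illustrates the documented behaviour rather than reproducing bwa's own code, and it assumes uncompressed FASTQ with exactly four lines per record.

```bash
#!/usr/bin/env bash
# Sketch of the smart-pairing rule: consecutive records with the same name
# form a pair; every other record counts as single-end.
# Assumes uncompressed FASTQ, 4 lines per record.
set -euo pipefail

# Reads FASTQ on stdin and prints "pairs=N singles=M".
count_smart_pairs() {
  awk 'NR % 4 == 1 {
         name = substr($1, 2)        # header up to first whitespace, minus "@"
         sub(/\/[12]$/, "", name)    # ignore an old-style /1 or /2 mate suffix
         if (pending && name == prev) {
           pairs++; pending = 0      # adjacent mates: count one pair
         } else {
           if (pending) singles++    # previous record had no adjacent mate
           prev = name; pending = 1
         }
       }
       END {
         if (pending) singles++
         printf "pairs=%d singles=%d\n", pairs, singles
       }'
}
```

Usage: `(seqtk mergepe sample.R1.trimmed.fastq sample.R2.trimmed.fastq; cat sample.R1.unpaired.fastq sample.R2.unpaired.fastq) | count_smart_pairs`. If the stream is built correctly, the pairs count should equal the number of reads in either trimmed file.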

updated 21 months ago by Ram • written 9.0 years ago by lh3

Thanks Istvan for your quick response!

I am kind of lost. My main goal here is to call variants. What do you suggest I do with these unpaired files once I have aligned them separately? I was going to merge them with the trimmed ones and then call variants...

Do I have to take them into account, or should I only use the trimmed ones?

Thanks!

Monica

9.0 years ago by mcff23

Check the documentation of your variant caller to see whether it handles mixed paired and unpaired data. We usually discard the unpaired reads to keep things simple; they are typically no more than a few percent of the data and won't actually affect the results.
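To see where a given dataset falls relative to "a few percent", one quick check is to count records in the four Trimmomatic outputs. The sketch below reports unpaired reads as a percentage of all surviving reads; that choice of denominator is an assumption, so the figure may not match other summaries. It also assumes uncompressed FASTQ with four lines per record.

```bash
#!/usr/bin/env bash
# Sketch: share of surviving reads that ended up unpaired after Trimmomatic.
# Denominator (all surviving reads) is an assumed definition.
set -euo pipefail

# Number of FASTQ records in a file (4 lines per record).
count_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# Args: paired R1, paired R2, unpaired R1, unpaired R2. Prints a percentage.
unpaired_percent() {
  local paired unpaired
  paired=$(( $(count_reads "$1") + $(count_reads "$2") ))
  unpaired=$(( $(count_reads "$3") + $(count_reads "$4") ))
  awk -v p="$paired" -v u="$unpaired" \
    'BEGIN { t = p + u; pct = (t > 0) ? 100 * u / t : 0; printf "%.1f\n", pct }'
}
```

Usage: `unpaired_percent sample.R1.trimmed.fastq sample.R2.trimmed.fastq sample.R1.unpaired.fastq sample.R2.unpaired.fastq`.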

updated 21 months ago by Ram • written 9.0 years ago by Istvan Albert

Hi Istvan,

Would you please give a rough number for "a few percent"? I filtered out 8% of my reads as unpaired. Will losing this much data affect the downstream analysis?

Thank you!

updated 21 months ago by Ram • written 8.8 years ago by Emma

8% is not all that much, but it all depends on how much data you have left. The general rule is that it is better to get rid of bad data than to try to salvage it. In my opinion, better data, even if there is less of it, is more desirable than salvaged data.

That is because errors rarely come in isolation. We may think we fixed everything by trimming off the bad bases, but perhaps other factors drove those errors in some regions of the flowcell, and even the data that looks reliable is not.

updated 21 months ago by Ram • written 8.8 years ago by Istvan Albert