Question

Alignment in WES

Asked 3.4 years ago by kanika.151

Hi All,

I have a weird question.

I have been running a WES pipeline that someone else built. They chose Trimmomatic for the adapter-trimming step, which produces both:

*.paired.fastq.gz and *.unpaired.fastq.gz

These files are aligned separately with BWA-MEM, and the resulting BAM files are then merged into one with Picard. Apparently this step was added because it increases the coverage at variant sites as well as the overall sequencing coverage.

My question: in your opinion, is it wise to merge the paired and unpaired BAM files? Could it have an adverse effect on variant quality? Can I skip this step and use only the paired files?

Tags: trimmomatic alignment WES exome sequencing • 1.2k views

Last updated 3.3 years ago by Biostar.
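
The workflow described in the question can be sketched as a sequence of commands. This is a hypothetical sketch, not the pipeline's actual code: the file names, `ref.fa`, `trimmomatic.jar`, `picard.jar`, the `TruSeq3-PE.fa` adapter file and the `ILLUMINACLIP` settings are placeholder assumptions, and the step that converts and sorts each SAM output into a BAM (for example with `samtools sort`) is left out. The function only builds argument lists; it does not run anything.

```python
from typing import List


def build_commands(sample: str, ref: str = "ref.fa") -> List[List[str]]:
    """Return argument lists for a hypothetical version of the described workflow."""
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    p1, u1 = f"{sample}_R1.paired.fastq.gz", f"{sample}_R1.unpaired.fastq.gz"
    p2, u2 = f"{sample}_R2.paired.fastq.gz", f"{sample}_R2.unpaired.fastq.gz"
    return [
        # Trimmomatic PE: two inputs, then paired/unpaired outputs for R1 and for R2.
        ["java", "-jar", "trimmomatic.jar", "PE", r1, r2, p1, u1, p2, u2,
         "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10"],
        # Surviving pairs are aligned together (bwa mem writes SAM to stdout).
        ["bwa", "mem", ref, p1, p2],
        # Each unpaired file is aligned on its own as single-end reads.
        ["bwa", "mem", ref, u1],
        ["bwa", "mem", ref, u2],
        # The three alignments (after conversion to BAM, not shown) are merged.
        ["java", "-jar", "picard.jar", "MergeSamFiles",
         f"I={sample}.paired.bam", f"I={sample}.R1_unpaired.bam",
         f"I={sample}.R2_unpaired.bam", f"O={sample}.merged.bam"],
    ]


if __name__ == "__main__":
    for cmd in build_commands("sample1"):
        print(" ".join(cmd))
```

Dropping the two single-end `bwa mem` calls and the two extra `I=` inputs gives the paired-only variant that the replies below recommend.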

What percentage of the reads are unpaired?

Reply by ATpoint, 3.4 years ago

I do not have an exact figure, but it is no more than 1%.

Reply by kanika.151, 3.4 years ago
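
One way to get the exact figure is to count the records in the Trimmomatic outputs. A minimal sketch, assuming standard 4-line FASTQ records (no wrapped sequence lines) and the `*.paired.fastq.gz` / `*.unpaired.fastq.gz` naming from the question; the fraction is taken over reads that survived trimming, so pairs dropped entirely are not counted. The function names are illustrative, not part of any tool.

```python
import gzip
from typing import Iterable


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count records in a FASTQ stream, assuming exactly 4 lines per record."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4:
        raise ValueError("line count is not a multiple of 4; wrapped or truncated FASTQ?")
    return n_lines // 4


def count_fastq_gz(path: str) -> int:
    """Count records in a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return count_fastq_records(handle)


def unpaired_fraction(n_pairs: int, n_unpaired_r1: int, n_unpaired_r2: int) -> float:
    """Fraction of surviving reads that lost their mate during trimming.

    n_pairs is the record count of one *.paired file (the R1 and R2 paired files match).
    """
    n_unpaired = n_unpaired_r1 + n_unpaired_r2
    total = 2 * n_pairs + n_unpaired
    if total == 0:
        raise ValueError("no reads counted")
    return n_unpaired / total
```

For example, `unpaired_fraction(count_fastq_gz("s_R1.paired.fastq.gz"), count_fastq_gz("s_R1.unpaired.fastq.gz"), count_fastq_gz("s_R2.unpaired.fastq.gz"))` with hypothetical file names. Trimmomatic's own PE summary ("Forward Only Surviving", "Reverse Only Surviving") reports the same counts without re-reading the files.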

Then discard them. Depending on what you do downstream, some tools expect perfectly synchronized BAM files (that is, every read has its mate), so keeping the unpaired reads would require additional processing, and I doubt that 1% of reads really helps you. If including them increases the variant count, I would rather assume that the additional reads are low quality and the extra calls are false positives. Think about it: even with 100M reads, that is only about 1M reads more or less, and I cannot see how that would increase confidence. After all, there was a reason one read of the pair was discarded...

Reply by ATpoint, 3.4 years ago
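
The back-of-the-envelope argument in this reply can be made concrete. A minimal sketch, assuming unpaired reads land on the targets at the same rate as paired reads, so depth scales linearly with read count; the 100M reads and 1% come from the thread, while the 100x mean depth is a hypothetical illustration, not a measurement from this pipeline.

```python
def extra_reads(total_reads: int, unpaired_fraction: float) -> int:
    """Number of reads contributed by the unpaired files."""
    return round(total_reads * unpaired_fraction)


def depth_without_unpaired(mean_depth: float, unpaired_fraction: float) -> float:
    """Expected mean depth after dropping the unpaired reads.

    Assumes unpaired reads are spread over the targets like paired reads,
    so depth scales linearly with the number of reads kept.
    """
    return mean_depth * (1.0 - unpaired_fraction)


print(extra_reads(100_000_000, 0.01))       # 1000000
print(depth_without_unpaired(100.0, 0.01))  # roughly 99x instead of 100x
```

Under these assumptions, losing about one read per position at 100x is unlikely to change most calls, which is the point of the reply.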

It would be interesting to know why the pipeline was designed this way. What is the benefit? It may have been designed to handle a very specific situation. It is certainly not wise to include *.unpaired.fastq.gz in the mapping step for variant discovery, at least in clinically related analyses. I also doubt that including low-quality reads increases your power to detect a variant.

Reply by Hamid Ghaedi, 3.4 years ago

Yes, I was told that good-quality unpaired reads might increase the coverage at a variant and the overall coverage of the sample. That is why, when the unpaired FASTQ files are processed, only the high-quality mapped reads are kept. I just wanted to know whether others think this is a good justification.

Reply by kanika.151, 3.4 years ago
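
The filtering step described here (keeping only high-quality mapped reads from the unpaired alignments) could look like the sketch below. It is an assumption about what the pipeline does, not its actual code: it reads SAM text, drops unmapped records (FLAG bit 0x4) and keeps records whose MAPQ is at least a cutoff; the value 30 is a hypothetical choice. With samtools, a comparable filter would be `samtools view -b -q 30 -F 4 -o filtered.bam unpaired.bam`.

```python
from typing import Iterable, Iterator

MIN_MAPQ = 30  # hypothetical cutoff; the pipeline's real threshold is unknown
FLAG_UNMAPPED = 0x4


def keep_high_quality(sam_lines: Iterable[str], min_mapq: int = MIN_MAPQ) -> Iterator[str]:
    """Yield header lines plus mapped records with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])  # FLAG is column 2, MAPQ column 5
        if flag & FLAG_UNMAPPED:
            continue  # unmapped reads have no usable position
        if mapq >= min_mapq:
            yield line
```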

I agree with @ATpoint: even if you include only high-quality unpaired reads in the BAM file, further processing would be needed before you can use that file in the rest of your workflow. And 1% of reads would not add much sensitivity to your variant discovery.

Reply by Hamid Ghaedi, 3.4 years ago
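
To check whether a merged BAM is still "synchronized" in the sense ATpoint describes, one can count primary records that are not flagged as paired (SAM FLAG bit 0x1). A minimal sketch over SAM text (for example the output of `samtools view merged.bam`); it skips secondary (0x100) and supplementary (0x800) records so each read is counted once. `samtools view -c -F 0x901 merged.bam` should give the same count, and `samtools flagstat` gives a broader summary. A nonzero result means single-end records are present, which tools expecting every read to have a mate may not handle.

```python
from typing import Iterable

FLAG_PAIRED = 0x1
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def count_single_end_records(sam_lines: Iterable[str]) -> int:
    """Count primary alignment records that were not sequenced as part of a pair."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.split("\t", 2)[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # count each read only once
        if not flag & FLAG_PAIRED:
            count += 1
    return count
```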