Question: Alignment in WES
16 days ago by kanika.151 (80), Italy
kanika.151 wrote:

Hi All,

I have a weird question.

I have been running a WES pipeline that was built by someone else. They chose Trimmomatic for the adapter-trimming step, which produces both:

*.paired.fastq.gz and *.unpaired.fastq.gz

These files are aligned with BWA-MEM separately, and the resulting BAM files are then merged into one with Picard merge. Apparently, this step was added because it increases the sequencing coverage of the sample and the coverage at variant sites.

My question: in your opinion, is it wise to merge the paired and unpaired BAM files? Do you think it will adversely affect variant quality? Can I skip this step and use only the paired files?

written 16 days ago by kanika.151 (80)

What percentage of the reads are unpaired?

written 16 days ago by ATpoint (42k)

I do not have an exact figure, but it is no more than 1%.

written 16 days ago by kanika.151 (80)
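
For an exact figure rather than an estimate, the surviving reads can be counted directly from the Trimmomatic output files. The following is a minimal sketch, assuming gzipped FASTQ files with the standard four lines per record; the function names are made up for illustration and are not part of the pipeline discussed here. It reports unpaired reads as a percentage of all reads that survived trimming.

```python
import gzip
from typing import Iterable


def count_fastq_reads(path: str) -> int:
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def unpaired_percent(paired: Iterable[str], unpaired: Iterable[str]) -> float:
    """Unpaired reads as a percentage of all reads that survived trimming."""
    n_paired = sum(count_fastq_reads(p) for p in paired)
    n_unpaired = sum(count_fastq_reads(p) for p in unpaired)
    total = n_paired + n_unpaired
    # Guard against empty inputs so the function never divides by zero.
    return 100.0 * n_unpaired / total if total else 0.0
```

In paired-end mode Trimmomatic writes a paired and an unpaired file for each mate, so both lists would normally hold two paths each, matching the *.paired.fastq.gz and *.unpaired.fastq.gz files from the question.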

Then discard them. Depending on what you do in the future, some tools expect perfectly synchronized BAM files (all reads paired), so keeping the unpaired reads would require additional processing, and I doubt that 1% of reads really helps you. If this increases the variant count, I would rather assume that the additional reads are low quality and the extra calls are false positives. Think about it: even with 100M reads, that would be about 1M extra, and I cannot see how that would increase confidence. I mean, there was a reason one read of the pair was discarded...

modified 15 days ago • written 15 days ago by ATpoint (42k)
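
The back-of-the-envelope argument above can be written out as a rough depth estimate. This is only a sketch: the 100M and 1M read counts come from the reply, while the read length, the target size and the assumption that every read lands on target are illustrative placeholders, not values from this thread.

```python
def mean_depth(n_reads: int, read_length: int, target_bp: int,
               on_target: float = 1.0) -> float:
    """Approximate mean depth: on-target sequenced bases divided by target size."""
    return n_reads * read_length * on_target / target_bp


# Placeholder values (assumptions): 100 bp reads, 50 Mb capture target.
READ_LENGTH = 100
TARGET_BP = 50_000_000

paired_depth = mean_depth(100_000_000, READ_LENGTH, TARGET_BP)
unpaired_depth = mean_depth(1_000_000, READ_LENGTH, TARGET_BP)

# The relative gain equals the unpaired share of reads, whatever the
# read length or target size, because both terms scale identically.
relative_gain = unpaired_depth / paired_depth
```

Whatever placeholders are used, the relative gain stays at the unpaired fraction of the reads, which is the point being made: roughly 1% more depth, and that from reads whose mates were discarded.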

It would be interesting to know why the pipeline was designed this way. What is the benefit of doing this? It may have been designed to deal with a very specific situation. It is certainly not wise to include *.unpaired.fastq.gz in the mapping step for variant discovery, at least in clinical analysis. I also doubt that including low-quality reads increases your power to detect a variant.

modified 16 days ago • written 16 days ago by Hamid Ghaedi (780)

Yes, I was told that good-quality unpaired single reads might increase the coverage at a variant and the overall coverage of the sample. Therefore, when the unpaired FASTQ file is processed, only the high-quality mapped reads from it are kept. I just wanted to know whether others think this is a good explanation.

written 15 days ago by kanika.151 (80)

I agree with @ATpoint. Suppose you did include the high-quality unpaired reads and made a BAM file: further processing would be needed to carry this file through your workflow. And indeed, 1% would not add much sensitivity to your discovery.

written 15 days ago by Hamid Ghaedi (780)