Filtering low quality score bases from BAM file
3.6 years ago

Hello all, is there any way to filter low-quality bases from BAM files?

I am planning to perform variant analysis, and I have sorted BAM files after running the MarkDuplicate and AddReadGroups steps. After performing BQSR, I found that many bases in my data have a quality score below 20. I want to filter out these low-quality bases without having to repeat all the steps from the beginning.

RNA-Seq alignment Quality Score

Just a suggestion: MarkDuplicate and AddReadGroups are not "steps", but functions of a program (I guess it's Picard). Not everyone removes duplicates, and most of the time you don't need read groups unless you're using GATK. This suggests to me that you're using someone else's variant calling pipeline and trying to make sense of it. If you ask a question here, you can't assume that we use the same workflow / pipeline as you, so try to be more specific when you describe your problem ;)

Alright, this is my first time using any of these tools and this platform. I will keep that in mind.

People have been doing this for a while, and the software is not stupid. Variant callers understand that low-quality bases exist, and they take that into account when making their calls. Besides, a base with a quality of 20 has a 99% chance of being correct, so 99% of your Q20 bases will be accurate. That's a lot of sound data to throw away.
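To make the 99% figure concrete, here is a minimal sketch of the standard Phred relationship between a quality score Q and the error probability P, Q = -10 * log10(P). The function names are made up for this illustration and do not come from any particular library.

```python
import math

def phred_to_error_prob(q):
    """Error probability for a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

# Print the expected accuracy for a few common quality thresholds.
for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.4f}, accuracy {100 * (1 - p):.2f}%")
```

At Q20 the error probability is 0.01, which is where the 99% accuracy comes from.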

3.6 years ago

Filtering read-mapping records is usually done with tools like samtools view:

http://www.htslib.org/doc/samtools-view.html

Among the options, there is -q <N>, which discards reads with a mapping quality below N. Note that this is the quality of the alignment, not of individual bases. It isn't easy (and perhaps not possible) to remove single positions from a BAM file, simply because of how these files work (they are read-based, not position-based).

I'd suggest filtering the mapping records thoroughly (for example, removing secondary alignments with -F 0x0100 and low-quality mapping records with -q 20). Then call your variants and filter those by quality afterwards (using the QUAL column or the annotations in the INFO field of the VCF file).
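As a rough illustration of what those two options do, the sketch below applies the same FLAG and MAPQ tests to plain-text SAM lines in Python. It is not how samtools is implemented; the function names and the toy records are invented for this example, and in practice you would run samtools view with -F 0x0100 and -q 20 directly on the BAM file.

```python
SECONDARY = 0x100  # SAM FLAG bit marking a secondary alignment

def keep_record(sam_line, min_mapq=20, exclude_flags=SECONDARY):
    """Return True if one SAM alignment line passes both filters."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: bitwise FLAG
    mapq = int(fields[4])  # column 5: mapping quality (MAPQ)
    if flag & exclude_flags:   # like -F: drop if any excluded bit is set
        return False
    return mapq >= min_mapq    # like -q: drop if MAPQ is below the cutoff

def filter_sam(lines, min_mapq=20, exclude_flags=SECONDARY):
    """Yield header lines unchanged plus the alignment lines that pass."""
    for line in lines:
        if line.startswith("@") or keep_record(line, min_mapq, exclude_flags):
            yield line

# Toy records (hypothetical data): a good primary alignment, a
# low-MAPQ alignment, and a secondary alignment.
records = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t5\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t256\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
kept = [line.split("\t")[0] for line in filter_sam(records)]
print(kept)
```

Both tests look only at per-read fields, which is why this kind of filtering removes whole reads and cannot drop individual low-quality bases.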
