Question: Filtering low quality score bases from BAM file
26 days ago, tayyabaalvi82 wrote:

Hello all, is there any way to filter low-quality bases from BAM files?

I am planning to perform variant analysis, and I have sorted BAM files that have already been through the MarkDuplicates and AddOrReplaceReadGroups steps. After performing BQSR I found that many bases in my data have a quality score below 20. I want to filter out these low-quality bases without having to repeat all the steps from the beginning.

modified 26 days ago by Macspider (3.2k) • written 26 days ago by tayyabaalvi82
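To make "many bases have a quality score less than 20" concrete, here is a minimal Python sketch that decodes the per-base quality string of a single read and reports the fraction of bases below a threshold. It assumes the standard Phred+33 ASCII encoding used in SAM and FASTQ text; the function names and the example quality string are hypothetical and only for illustration.

```python
def phred_scores(qual, offset=33):
    """Decode a SAM/FASTQ quality string (Phred+33 ASCII) into integer scores."""
    return [ord(ch) - offset for ch in qual]


def fraction_below(qual, threshold=20):
    """Return the fraction of bases in one read with quality below `threshold`."""
    scores = phred_scores(qual)
    if not scores:
        return 0.0
    return sum(q < threshold for q in scores) / len(scores)


# Hypothetical quality string: 'I' decodes to Q40, '5' to Q20, '#' to Q2.
example_qual = "IIII55##"
print(phred_scores(example_qual))
print(fraction_below(example_qual))
```

Note that the comparison is strict, so bases at exactly Q20 are not counted as low quality here.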

Just a suggestion: MarkDuplicates and AddOrReplaceReadGroups are not "steps" but tools within a program (I guess it's Picard). Not everyone removes duplicates, and most of the time you don't need read groups unless you're using GATK. This suggests to me that you're using someone else's variant calling pipeline and trying to make sense of it. When you ask a question here, you can't assume that we use the same workflow or pipeline as you, so try to be more specific when you describe your problem ;)

written 26 days ago by Macspider (3.2k)

Alright, this is my first time using any of these tools and this platform. I will keep that in mind.

written 26 days ago by tayyabaalvi82

People have been doing this for a while; the software is not stupid. Variant callers understand that low-quality bases exist, and they take that into account when making their calls. Besides, a base quality of 20 corresponds to a 1% error probability, so about 99% of your Q20 bases will be correct. That's a lot of sound data to throw away.

written 26 days ago by swbarnes2 (8.9k)
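The "99% at quality 20" figure follows from the Phred definition, Q = -10 * log10(p), where p is the probability that the base call is wrong. A small Python sketch of the conversion in both directions (the function names are made up for this example):

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred quality `q` is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality corresponding to an error probability `p`."""
    return -10 * math.log10(p)


# Print the error probability and accuracy for a few common thresholds.
for q in (10, 20, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.3f}, accuracy {1 - p:.1%}")
```

Because the scale is logarithmic, each step of 10 in quality divides the expected error rate by ten.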
26 days ago, Macspider (3.2k, Vienna - BOKU) wrote:

Filtering read-mapping records is usually done with tools like samtools view:

http://www.htslib.org/doc/samtools-view.html

Among the options is -q <N>, which discards reads with mapping quality below N. However, it isn't easy (and perhaps not possible) to remove single positions from a BAM file, simply because of how these files work (they are read-based, not position-based).

I'd suggest filtering the mapping records thoroughly (for example, removing secondary alignments with -F 0x0100 and low-quality mapping records with -q 20). Then call your variants and filter those by quality afterwards (the QUAL column and the INFO annotations of the VCF file).

written 26 days ago by Macspider (3.2k)
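To illustrate what the -q 20 and -F 0x0100 filters in the answer above do to each record, here is a minimal Python sketch that applies the same two tests to SAM text lines. It is not a replacement for samtools: it assumes plain tab-separated alignment lines (no '@' header lines), and the function name and example records are hypothetical.

```python
SECONDARY = 0x100  # SAM FLAG bit marking a secondary alignment


def keep_record(sam_line, min_mapq=20, exclude_flags=SECONDARY):
    """Apply the same tests as `samtools view -q min_mapq -F exclude_flags`
    to one SAM alignment line (header lines starting with '@' are not handled)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: bitwise FLAG
    mapq = int(fields[4])  # column 5: mapping quality (MAPQ)
    if flag & exclude_flags:
        return False  # any excluded flag bit set, e.g. secondary alignment
    return mapq >= min_mapq


# Hypothetical records: a good primary alignment, a secondary alignment,
# and a primary alignment with low mapping quality.
records = [
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t256\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t16\tchr1\t300\t5\t4M\t*\t0\t0\tACGT\tIIII",
]
kept = [r.split("\t")[0] for r in records if keep_record(r)]
print(kept)
```

Both tests operate on whole records, which is why this kind of filter removes entire reads rather than individual low-quality bases. On real data the equivalent would be something like `samtools view -b -q 20 -F 0x100 in.bam -o out.bam`, where the file names are placeholders.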