Filtering A Sam File For Quality Scores
2
5
Entering edit mode
12.0 years ago
Varun Gupta ★ 1.3k

Hi I have a sam file. The 5th column of the sam file tells us the quality score. In my sam alignment file there are reads which have quality score of 2. I would like to remove them from my sam file because i think these reads which have a low quality score are not that important in my analysis. I want to keep the headers of sam file too.

Any code or unix one liner would be appreciated.

Thanks for the help

Regards Varun

sam • 48k views
ADD COMMENT
13
Entering edit mode
12.0 years ago

You can use -q paramter, from the man page and provide a custom threshold to it

-q INT Skip alignments with MAPQ smaller than INT [0]

So, for a bam file the code would be samtools view -bq 2 file.bam > filtered.bam and for a sam, it is

output in bam

samtools view -bSq 2 file.sam > filtered.bam

output in sam

samtools view -Sq 2 file.sam > filtered.sam

Cheers

ADD COMMENT
2
Entering edit mode

Is it not possible get rid of the chaining for the quality filtering?

samtools view -bSq 2 file.sam > filtered.bam
ADD REPLY
0
Entering edit mode

I tried the code in a jiffy, and for sure missed a parameter, so had to put the pipe. Of course, it works, I'll edit again :)

ADD REPLY
0
Entering edit mode

thnks sukhdeep works smoothly cheers

ADD REPLY
0
Entering edit mode

No worries, could you also edit the question to something like Filtering a Sam file for quality scores Cheers

ADD REPLY
3
Entering edit mode
12.0 years ago

I wouldn't worry about removing them. If you are variant calling they will be filtered.

You can also use samtools view -f 3 to only include reads that mapped and have their mate. This will reduce the size of the SAM file if that is a concern.

if you really want to remove them.

awk '$5  > x {print}' your.sam

where x is the minimum value.

ADD COMMENT
4
Entering edit mode

the print is superflous, you can also do

awk '$5 > x || $1 ~ /^@/'

to retain the headers.

ADD REPLY
0
Entering edit mode

Thanks. I didn't know you could skip the print statement.

ADD REPLY
0
Entering edit mode

Hi I tried that, but using that command does not give the headers(does it??). I need the headers(starting with @) in my output as well

Regards

ADD REPLY
3
Entering edit mode

Yes, using awk or grep or something like that will usually strip the headers, though they can be added on again with samtools reheader. It's usually best to use samtools view where possible, and samtools view can filter on mapping quality, as you want it to.

ADD REPLY

Login before adding your answer.

Traffic: 2892 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6