Filtering A Sam File For Quality Scores
2
5
Entering edit mode
10.0 years ago
Varun Gupta ★ 1.2k

Hi I have a sam file. The 5th column of the sam file tells us the quality score. In my sam alignment file there are reads which have quality score of 2. I would like to remove them from my sam file because i think these reads which have a low quality score are not that important in my analysis. I want to keep the headers of sam file too.

Any code or unix one liner would be appreciated.

Thanks for the help

Regards Varun

sam • 39k views
12
Entering edit mode
10.0 years ago

You can use -q paramter, from the man page and provide a custom threshold to it

-q INT Skip alignments with MAPQ smaller than INT [0]

So, for a bam file the code would be samtools view -bq 2 file.bam > filtered.bam and for a sam, it is

output in bam

samtools view -bSq 2 file.sam > filtered.bam

output in sam

samtools view -Sq 2 file.sam > filtered.sam

Cheers

2
Entering edit mode

Is it not possible get rid of the chaining for the quality filtering?

samtools view -bSq 2 file.sam > filtered.bam

0
Entering edit mode

I tried the code in a jiffy, and for sure missed a parameter, so had to put the pipe. Of course, it works, I'll edit again :)

0
Entering edit mode

thnks sukhdeep works smoothly cheers

0
Entering edit mode

No worries, could you also edit the question to something like Filtering a Sam file for quality scores Cheers

3
Entering edit mode
10.0 years ago

I wouldn't worry about removing them. If you are variant calling they will be filtered.

You can also use samtools view -f 3 to only include reads that mapped and have their mate. This will reduce the size of the SAM file if that is a concern.

if you really want to remove them.

awk '$5 > x {print}' your.sam  where x is the minimum value. ADD COMMENT 3 Entering edit mode the print is superflous, you can also do awk '$5 > x || \$1 ~ /^@/'


0
Entering edit mode

Thanks. I didn't know you could skip the print statement.

0
Entering edit mode

Hi I tried that, but using that command does not give the headers(does it??). I need the headers(starting with @) in my output as well

Regards

3
Entering edit mode

Yes, using awk or grep or something like that will usually strip the headers, though they can be added on again with samtools reheader. It's usually best to use samtools view where possible, and samtools view can filter on mapping quality, as you want it to.