Question

Number of unmapped reads after filtering issue

1

Entering edit mode

5.3 years ago

max_19 ▴ 170

Hi all,

I have a SAM file from paired end sequencing data, I wrote a short script (below) to filter some reads (reads that map to bacteria):

awk 'FNR==NR { a[$1]; next } !($3 in a)' contigs_bacteria.txt <(samtools view -h aligned_reads.sam) > aligned_reads_no_bacteria.sam

This works fine, however when I check count of unmapped reads in the original sam file (aligned_reads.sam) vs. the filtered one, the # of unmapped reads are different:

samtools view -c -f 4 aligned_reads.sam 
316971

samtools view -c -f 4 aligned_reads_no_bacteria.sam 
293621

Contigs_bacteria.txt is just a text file that looks like this, and I use these values to filter the sam file, as I know reads with these values come from bacteria.

I guess it is also filtering out some unmapped reads in the process. or maybe this is due to the fact that they are paired and if only one mate meets the criteria, that pair is lost. I do not want to lose the unmapped reads as they can potentially be reassembled. Any ideas/help is appreciated.

sequencing sam samtools • 1.4k views

ADD COMMENT • link updated 5.3 years ago by finswimmer 16k • written 5.3 years ago by max_19 ▴ 170

score 4 · Accepted Answer · 2018-12-29

4

Entering edit mode

5.3 years ago

finswimmer 16k

Hello max_19 ,

if one read of a pair can be mapped and aligned but the other one not, than the one that can not be mapped is assigned to the same chromosome/sequence name like the one that can be mapped, but its flagged as unmapped. So if you exclude chromosomes/sequence names it is normal, that the number of unmapped reads decreases.

fin swimmer

ADD COMMENT • link 5.3 years ago by finswimmer 16k

0

Entering edit mode

Thanks for your response, it helps a lot! Just a small follow-up, what if one read in a pair is mapped to one chromosome/sequence name, and the other read is mapped to a different chromosome for example...Would this ever happen? and in general should two pairs in a read always remain together when we filter these files? Thank you.

ADD REPLY • link 5.3 years ago by max_19 ▴ 170