Question: Number of unmapped reads after filtering issue
1
gravatar for max_19
3 months ago by
max_1950
max_1950 wrote:

Hi all,

I have a SAM file from paired end sequencing data, I wrote a short script (below) to filter some reads (reads that map to bacteria):

awk 'FNR==NR { a[$1]; next } !($3 in a)' contigs_bacteria.txt <(samtools view -h aligned_reads.sam) > aligned_reads_no_bacteria.sam

This works fine, however when I check count of unmapped reads in the original sam file (aligned_reads.sam) vs. the filtered one, the # of unmapped reads are different:

samtools view -c -f 4 aligned_reads.sam 
316971

samtools view -c -f 4 aligned_reads_no_bacteria.sam 
293621

Contigs_bacteria.txt is just a text file that looks like this, and I use these values to filter the sam file, as I know reads with these values come from bacteria.

201
908
925
881
895
921
889
913
875
922
923
915
878
910

I guess it is also filtering out some unmapped reads in the process. or maybe this is due to the fact that they are paired and if only one mate meets the criteria, that pair is lost. I do not want to lose the unmapped reads as they can potentially be reassembled. Any ideas/help is appreciated.

sequencing sam samtools • 291 views
ADD COMMENTlink modified 3 months ago by finswimmer11k • written 3 months ago by max_1950
4
gravatar for finswimmer
3 months ago by
finswimmer11k
Germany
finswimmer11k wrote:

Hello max_19 ,

if one read of a pair can be mapped and aligned but the other one not, than the one that can not be mapped is assigned to the same chromosome/sequence name like the one that can be mapped, but its flagged as unmapped. So if you exclude chromosomes/sequence names it is normal, that the number of unmapped reads decreases.

fin swimmer

ADD COMMENTlink written 3 months ago by finswimmer11k

Thanks for your response, it helps a lot! Just a small follow-up, what if one read in a pair is mapped to one chromosome/sequence name, and the other read is mapped to a different chromosome for example...Would this ever happen? and in general should two pairs in a read always remain together when we filter these files? Thank you.

ADD REPLYlink written 3 months ago by max_1950
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 723 users visited in the last hour