samtools counting unique mapped reads
1
1
Entering edit mode
7.7 years ago
bitpir ▴ 240

Hi, I have a question between the 2 different ways of counting unique mapped reads. I used paired-end sequencing and mapped the reads to hg19. My library size is ~300bp and am using 2*151bp paired end sequencing. - my total reads from the below calculation is 1473870

samtools view -c myfile.sam
  • my total mapped read is 723134. I understand that this includes multi-mapped reads

    samtools view -S -F0x4 -c myfile.sam

  • when I try to calculate the number of unique reads using the command below, I get 338787.

    samtools view -F 0x4 myfile.sam | cut -f 1 | sort | uniq | wc -l

  • I also compared the another way of calculating uniquely mapped reads. Here, I get 392467.

    samtools view -S -q 1 -c myfile.sam

I understand that the flag 1 is for paired end reads but not quite sure which number (338787 vs 392467) is the right uniquely mapped reads, which I define it as the number of reads that are mapped the most accurately and are mapped only once to the reference genome.

Thanks a lot for the help!

samtools read counts • 10k views
ADD COMMENT
0
Entering edit mode

I realize this is an old thread, but in case anyone else lands here like I did:

1) It looks like the -S option is not needed (anymore?), samtools auto-detects the input format.

2) -q 1 is not a flag for paired end reads, that would be -f 1. -q 1 returns any matches with a MAPQ score >= 1. This means different things for different alignment programs, so use with caution...

I get the impression that -c does not take unique sequences into account, from the command line help:

-c    print only the count of matching records

This sounds to me like it returns a count of all records, even if there are duplicates.

ADD REPLY
0
Entering edit mode
7.7 years ago

If you want the number of reads aligned to only one spot then you'll have to find out how (and if) your aligner denotes that and filter accordingly. Since this is rarely a useful thing to know, there's no standard way to encode it.

ADD COMMENT

Login before adding your answer.

Traffic: 1889 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6