Question: Difference between pair alignment rate and properly paired reads
monteiro.s.rita (United Kingdom) wrote 4.1 years ago:

I have some RNA-Seq data (paired-end reads) that I aligned with TopHat. This is what the alignment summary looks like:

Left reads:
          Input     :  25671258
           Mapped   :  22149823 (86.3% of input)
            of these:   2005259 ( 9.1%) have multiple alignments (200624 have >20)
Right reads:
          Input     :  25671258
           Mapped   :  21866868 (85.2% of input)
            of these:   1977056 ( 9.0%) have multiple alignments (199383 have >20)
85.7% overall read mapping rate.

Aligned pairs:  21161013
     of these:   1903746 ( 9.0%) have multiple alignments
                  801300 ( 3.8%) are discordant alignments
79.3% concordant pair alignment rate.

When I run 

samtools flagstat accepted_hits.bam

This is the result:

66612729 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
66612729 + 0 mapped (100.00%:nan%)
66612729 + 0 paired in sequencing
33517595 + 0 read1
33095134 + 0 read2
34458604 + 0 properly paired (51.73%:nan%)
64054464 + 0 with itself and mate mapped
2558265 + 0 singletons (3.84%:nan%)
13192728 + 0 with mate mapped to a different chr
402500 + 0 with mate mapped to a different chr (mapQ>=5)

 

I don't understand why the concordant pair alignment rate reported by TopHat (79.3%) does not match the percentage of properly paired reads reported by samtools (51.73%).

Besides this, I find that the BAM file contains paired reads whose mates are mapped to different chromosomes.

Could you please help me understand this?

 

rna-seq samtools bam paired-end • 2.6k views
Kamil (Boston) wrote 4.1 years ago:

I believe that the two tools are counting different things:

  • TopHat says you have 21,161,013 aligned pairs of reads.
  • Samtools says that your BAM file has 34,458,604 properly paired alignments.

TopHat counts read pairs, whereas samtools counts alignment records. The two numbers disagree because a single pair of reads can have more than one alignment in the BAM file, so a pair that TopHat counts once can be counted several times by samtools. The percentages also use different denominators: TopHat's 79.3% is relative to the 25,671,258 input read pairs, while the 51.73% from flagstat is relative to all 66,612,729 alignment records in the BAM file.
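As a rough check, the two percentages can be recomputed from the numbers posted in the question. The sketch below assumes that TopHat's concordant rate is (aligned pairs minus discordant pairs) divided by input pairs; that formula is an assumption inferred from the summary, not taken from the TopHat source. The variable names are made up for illustration.

```shell
# All counts are copied from the align summary and flagstat output in the question.
export LC_ALL=C                         # force '.' as the decimal separator in awk
input_pairs=25671258
aligned_pairs=21161013
discordant_pairs=801300
bam_records=66612729
proper_records=34458604
mapped_reads=$((22149823 + 21866868))   # left + right mapped reads

# TopHat-style rate: concordant pairs over input pairs (assumed formula).
tophat_rate=$(awk -v a="$aligned_pairs" -v d="$discordant_pairs" -v n="$input_pairs" \
  'BEGIN { printf "%.1f", (a - d) / n * 100 }')

# flagstat-style rate: properly paired records over all records in the BAM.
flagstat_rate=$(awk -v p="$proper_records" -v t="$bam_records" \
  'BEGIN { printf "%.2f", p / t * 100 }')

# Average number of BAM records per mapped read.
records_per_read=$(awk -v t="$bam_records" -v m="$mapped_reads" \
  'BEGIN { printf "%.2f", t / m }')

echo "concordant pairs / input pairs: ${tophat_rate}%"
echo "proper records / BAM records:   ${flagstat_rate}%"
echo "BAM records per mapped read:    ${records_per_read}"
```

This reproduces 79.3% and 51.73%. The last figure, about 1.5 records per mapped read, is consistent with multi-mapped reads being written to the BAM file more than once.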

You might try checking the number of properly paired read pairs in your BAM file by counting unique read identifiers (both mates of a pair share one read name, so each unique name corresponds to one pair):

samtools view -f 2 file.bam | cut -f1 | sort -u | wc -l
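To see why counting unique names gives a different answer from counting records, here is a minimal sketch on a toy table that needs only awk, not samtools. The read names and the two-column layout (QNAME, FLAG) are hypothetical. The awk filter tests FLAG bit 0x2, which is what `samtools view -f 2` selects.

```shell
# Toy SAM-like records (hypothetical read names; only QNAME and FLAG columns).
# pairA: one proper-pair alignment (flags 99/147)
# pairB: proper pair reported at two loci (99/147 primary, 355/403 secondary)
# pairC: both mates mapped but not properly paired (65/129)
toy_sam=$(printf '%s\t%s\n' \
  pairA 99  pairA 147 \
  pairB 99  pairB 147 \
  pairB 355 pairB 403 \
  pairC 65  pairC 129)

# Keep records whose FLAG has bit 0x2 (properly paired) set.
proper=$(printf '%s\n' "$toy_sam" | awk -F'\t' 'int($2 / 2) % 2 == 1')

# Count alignment records (what flagstat reports) ...
records=$(printf '%s\n' "$proper" | wc -l | tr -d ' ')
# ... versus unique read names (one per read pair).
pairs=$(printf '%s\n' "$proper" | cut -f1 | sort -u | wc -l | tr -d ' ')

echo "properly paired records: $records"
echo "properly paired pairs:   $pairs"
```

Here six records carry the properly paired flag, but they belong to only two read pairs, because pairB is listed at two locations.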