Question: Number of reads mapped to genome
2.0 years ago, vimlakany wrote:

Hi, while calculating RPKM, how do I get the number of reads mapped to the genome? The total read count is 11851490.
I have tried samtools flagstat file.bam, and the result is:

12955438 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
12255658 + 0 mapped (94.60%:-nan%)
12955438 + 0 paired in sequencing
6477719 + 0 read1
6477719 + 0 read2
11999942 + 0 properly paired (92.62%:-nan%)
12234952 + 0 with itself and mate mapped
20706 + 0 singletons (0.16%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

From the above result:
1) Which value should I take as the number of reads mapped to the genome when calculating RPKM?
2) The total read count is 11851490, so why does it increase to 12955438 (in total) and 12255658 (mapped)?

2.0 years ago, Devon Ryan (Freiburg, Germany) wrote:

Please run samtools view -c -F 256 -f 66 file.bam and use the number it outputs as the number of fragments (-f 66 keeps only first-in-pair reads from proper pairs, and -F 256 drops secondary alignments). You can then use that for the FPKM (not RPKM, since you have a paired-end dataset) calculation.

There are more entries in the BAM file than original reads because of secondary alignments.
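For anyone unsure what the two flag values in that command select, here is a minimal sketch of the filtering logic in Python. The bit values come from the SAM specification; the function name `counted` and the example flags are made up for illustration, and this only mimics what samtools does with -f and -F rather than reading a real BAM file.

```python
# SAM flag bits relevant to the command above (values from the SAM specification).
PROPER_PAIR = 0x2     # each segment properly aligned according to the aligner
READ1 = 0x40          # first read of the pair
SECONDARY = 0x100     # secondary alignment

def counted(flag: int, require: int = 66, exclude: int = 256) -> bool:
    """Mimic `samtools view -f require -F exclude`: keep a record only if
    all `require` bits are set and none of the `exclude` bits are set."""
    return (flag & require) == require and (flag & exclude) == 0

# 66 and 256 decompose into the bits named above.
assert 66 == PROPER_PAIR | READ1
assert 256 == SECONDARY

# Hypothetical flag values, chosen only to illustrate the filter.
examples = {
    99: "read1, proper pair, mate on reverse strand",
    147: "read2, proper pair, read on reverse strand",
    355: "read1, proper pair, secondary alignment",
    73: "read1, mate unmapped",
}
for flag, description in examples.items():
    print(flag, counted(flag), description)
```

Only the first example passes the filter, so each properly paired fragment is counted once (through its read1 primary record) instead of once per alignment record.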

As an aside, make sure you have a good reason to use RPKMs/FPKMs, since for the most part they should be avoided.


Thank you. Can you please tell me the difference between "12255658 + 0 mapped (94.60%:-nan%)" and "12955438 + 0 paired in sequencing"?

written 2.0 years ago by vimlakany

12955438 is the number of entries in the BAM file, whether aligned or not. 12255658 is the number of those entries that are aligned, which is 94.60% of the total.
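The relationships between the flagstat lines can be checked with a few lines of arithmetic. This is only a sketch using the counts quoted in the question; the variable names are mine, and all counts are alignment records rather than fragments.

```python
# Counts copied from the flagstat output in the question.
total = 12955438            # "in total": every record in the BAM file
mapped = 12255658           # "mapped"
read1 = 6477719
read2 = 6477719
properly_paired = 11999942  # "properly paired"
both_mapped = 12234952      # "with itself and mate mapped"
singletons = 20706          # mapped, but mate unmapped

def pct(part: int, whole: int) -> float:
    """Percentage of `whole`, rounded to two decimals as flagstat prints it."""
    return round(100.0 * part / whole, 2)

unmapped = total - mapped                    # records with no alignment
not_proper = both_mapped - properly_paired   # both mates mapped, pair not flagged proper

# Internal consistency of the reported numbers.
assert read1 + read2 == total
assert both_mapped + singletons == mapped

print("mapped %:", pct(mapped, total))
print("properly paired %:", pct(properly_paired, total))
print("singleton %:", pct(singletons, total))
print("unmapped records:", unmapped)
print("mapped but not properly paired:", not_proper)
```

The percentages reproduce the 94.60%, 92.62% and 0.16% in the flagstat output, which confirms that they are all taken relative to the 12955438 total.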

written 2.0 years ago by Devon Ryan

So while calculating RPKM, would it be correct or meaningful to take 12255658 as the total number of reads mapped to the genome?
Can you please tell me the difference between "11999942 + 0 properly paired (92.62%:-nan%)" and "12234952 + 0 with itself and mate mapped"?
What is the difference between mapped reads and properly paired reads?

modified 2.0 years ago • written 2.0 years ago by vimlakany

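Using 12255658 would yield values that are artificially small, because it counts every mapped alignment record (both mates of each pair, plus any secondary alignments) rather than fragments. The two numbers you referenced are, respectively, the reads in proper pairs and the reads in proper pairs plus those in discordant pairs (e.g., wrong relative orientation).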
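To see why an inflated denominator shrinks the result, here is a small sketch of the usual FPKM formula (fragments in the gene, times 10^9, divided by gene length in bp times total mapped fragments). The gene count, gene length and fragment total below are hypothetical placeholders; the real fragment total would be whatever samtools view -c -F 256 -f 66 reports for the file.

```python
def fpkm(fragments_in_gene: int, gene_length_bp: int, total_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments_in_gene * 1e9 / (gene_length_bp * total_fragments)

gene_fragments = 500          # hypothetical fragment count for one gene
gene_length_bp = 2000         # hypothetical gene length
fragment_total = 6_000_000    # hypothetical stand-in for the samtools view -c result
record_total = 12255658       # "mapped" line from flagstat (records, not fragments)

correct = fpkm(gene_fragments, gene_length_bp, fragment_total)
too_small = fpkm(gene_fragments, gene_length_bp, record_total)

print("FPKM with fragment total:", round(correct, 2))
print("FPKM with record total:  ", round(too_small, 2))
```

Because the record count is larger than the fragment count, every gene's value drops by the same factor when it is used as the denominator.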

written 2.0 years ago by Devon Ryan

Since you asked:

  • Mapped reads are reads that found a match on the reference sequence, given the allowed mismatches / indels and all other constraints that you applied.

  • Proper pairs are pairs in which both reads map, typically in the expected relative orientation and within the expected insert size. The insert size is a property of the sequencing library that you should know, have received with the data, or can infer from the TLEN field of the BAM file resulting from the alignment of a subset of reads.
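Inferring the insert size from TLEN could look roughly like the sketch below. It works on SAM-formatted text lines (for example, the output of samtools view on a subset of reads); the records here are invented for illustration, and the helper names `sam_line` and `proper_pair_tlens` are mine. Strictly speaking TLEN is the observed template length, which is what is commonly reported as the insert size.

```python
import statistics

def sam_line(name: str, flag: int, pos: int, pnext: int, tlen: int) -> str:
    """Build a minimal 11-column SAM record (hypothetical data for illustration)."""
    return "\t".join([name, str(flag), "chr1", str(pos), "60", "50M",
                      "=", str(pnext), str(tlen), "*", "*"])

def proper_pair_tlens(lines):
    """Yield |TLEN| once per proper pair (read1, primary alignments only)."""
    for line in lines:
        if line.startswith("@"):      # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        is_proper_read1 = (flag & 0x2) and (flag & 0x40) and not (flag & 0x100)
        if is_proper_read1 and tlen != 0:
            yield abs(tlen)

records = [
    sam_line("r1", 99, 100, 300, 250),
    sam_line("r1", 147, 300, 100, -250),
    sam_line("r2", 99, 500, 760, 310),
    sam_line("r2", 147, 760, 500, -310),
    sam_line("r3", 73, 900, 900, 0),     # mate unmapped, no template length
    sam_line("r4", 99, 1200, 1430, 280),
]

tlens = list(proper_pair_tlens(records))
median_tlen = statistics.median(tlens)
print("template lengths:", tlens)
print("median:", median_tlen)
```

Taking the median rather than the mean keeps a few very long templates (for example, pairs spanning an intron in RNA-seq data) from dominating the estimate.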

modified 2.0 years ago • written 2.0 years ago by Macspider