Question: Number of reads mapped to genome
vimlakany wrote 21 months ago:

Hi, when calculating RPKM, how do I get the number of reads mapped to the genome? The total read count is 11851490.
I have tried samtools flagstat file.bam, and the result is:

12955438 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
12255658 + 0 mapped (94.60%:-nan%)
12955438 + 0 paired in sequencing
6477719 + 0 read1
6477719 + 0 read2
11999942 + 0 properly paired (92.62%:-nan%)
12234952 + 0 with itself and mate mapped
20706 + 0 singletons (0.16%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

From the above result:
1) Which value should I take as the number of reads mapped to the genome when calculating RPKM?
2) The total read count is 11851490, so how does it increase to 12955438 and 12255658?

rna-seq • 849 views
Devon Ryan (Freiburg, Germany) wrote 21 months ago:

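Please run samtools view -c -F 256 -f 66 file.bam and use the number it outputs as the number of fragments (-f 66 keeps only records that are the first read of a proper pair, and -F 256 drops secondary alignments). You can then use that number for the FPKM calculation (FPKM rather than RPKM, since you have a paired-end dataset).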
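To make the flag arithmetic concrete, here is a minimal Python sketch of the bit test that -f 66 -F 256 applies to each record. The bit values come from the SAM specification; the example flag values and the helper name counts_as_fragment are made up for illustration.

```python
# SAM flag bits (values as defined in the SAM specification)
PROPER_PAIR = 0x2     # 2: read mapped in a proper pair
FIRST_IN_PAIR = 0x40  # 64: first read of the pair
SECONDARY = 0x100     # 256: secondary alignment

REQUIRED = PROPER_PAIR | FIRST_IN_PAIR  # 66, the value passed to -f
EXCLUDED = SECONDARY                    # 256, the value passed to -F


def counts_as_fragment(flag: int) -> bool:
    """Mimic `-f 66 -F 256`: every required bit set, no excluded bit set."""
    return (flag & REQUIRED) == REQUIRED and (flag & EXCLUDED) == 0


# Hypothetical flag values, not taken from the poster's BAM file
example_flags = [99, 147, 355, 73, 83]
kept = [f for f in example_flags if counts_as_fragment(f)]
print(kept)
```

Only one record per properly paired fragment passes the test, which is why the count can stand in for a number of fragments rather than a number of reads.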

There are more entries in the BAM file than original reads because of secondary alignments.

As an aside, make sure you have a good reason to use RPKMs/FPKMs, since for the most part they should be avoided.


Thank you. Can you please tell me the difference between "12255658 + 0 mapped (94.60%:-nan%)" and "12955438 + 0 paired in sequencing"?

written 21 months ago by vimlakany

12955438 is the number of entries, whether aligned or not. The other number, 12255658, is how many of those entries aligned (94.60% of the total).
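The relationship between these flagstat lines can be checked with a little arithmetic. This is only a sketch using the numbers posted in the question; the variable names are chosen here for readability.

```python
# Numbers copied from the flagstat output in the question
total_entries = 12955438      # "in total"
mapped = 12255658             # "mapped"
both_mates_mapped = 12234952  # "with itself and mate mapped"
singletons = 20706            # "singletons"

# The percentage flagstat prints next to "mapped"
percent_mapped = 100 * mapped / total_entries
print(f"{percent_mapped:.2f}%")

# Mapped entries split into those whose mate also mapped and those whose mate did not
print(both_mates_mapped + singletons == mapped)
```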

written 21 months ago by Devon Ryan

So when calculating RPKM, would it be correct or meaningful to take 12255658 as the total number of reads mapped to the genome?
Can you please tell me the difference between "11999942 + 0 properly paired (92.62%:-nan%)" and "12234952 + 0 with itself and mate mapped"?
What is the difference between mapped reads and properly paired reads?

written 21 months ago by vimlakany

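Using 12255658 would yield values that are artificially small, since that number counts individual mapped entries (both mates of each pair) rather than fragments. The two numbers you referenced are, respectively, the reads in proper pairs and the reads in proper pairs plus those in discordant pairs (e.g., wrong relative orientation).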
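A small Python sketch shows how the choice of denominator changes the result. The formula is the usual FPKM definition; the gene's fragment count and length are invented numbers, and halving the properly paired read count is only a rough stand-in for what the samtools view -c command would report.

```python
def fpkm(fragments_in_gene: int, gene_length_bp: int, total_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments_in_gene * 1e9 / (gene_length_bp * total_fragments)


# Hypothetical gene, for illustration only
gene_fragments = 500
gene_length = 2000  # bp

mapped_entries = 12255658         # flagstat "mapped": counts reads, not fragments
approx_fragments = 11999942 // 2  # assumption: two properly paired reads per fragment

fpkm_with_reads = fpkm(gene_fragments, gene_length, mapped_entries)
fpkm_with_fragments = fpkm(gene_fragments, gene_length, approx_fragments)
print(f"{fpkm_with_reads:.2f} vs {fpkm_with_fragments:.2f}")
```

With the read-level count in the denominator, the value comes out at roughly half of what the fragment-level count gives.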

written 21 months ago by Devon Ryan

Since you asked:

  • Mapped reads are reads that found a match on the reference sequence, given the allowed mismatches / indels and all other constraints that you applied.

  • Proper pairs are pairs of reads that both map and whose distance from each other is consistent with the insert size (a property of the sequencing library that you should know, have received with the data, or have inferred from the TLEN field of the BAM file resulting from the alignment of a subset of reads).
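Inferring the insert size from TLEN values could look something like the sketch below. The TLEN values are hypothetical, the helper names are invented here, and the 50% tolerance is an arbitrary choice; aligners apply their own criteria for flagging a pair as proper.

```python
from statistics import median


def estimate_insert_size(tlens):
    """Median of the positive TLEN values (each pair reports +TLEN and -TLEN)."""
    positive = [t for t in tlens if t > 0]
    return median(positive)


def within_expected(tlen, expected, tolerance=0.5):
    """True if |TLEN| lies within `tolerance` (as a fraction) of the expected size."""
    return abs(abs(tlen) - expected) <= tolerance * expected


# Hypothetical TLEN values from a small subset of aligned pairs
tlens = [310, -310, 295, -295, 330, -330, 5200, -5200]
expected = estimate_insert_size(tlens)

print(expected)
print(within_expected(-295, expected))
print(within_expected(5200, expected))
```

The median is used instead of the mean so that a few very distant pairs do not distort the estimate.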

written 21 months ago by Macspider