Question: Number of reads mapped to genome
vimlakany wrote:

Hi, when calculating RPKM, how do I get the number of reads mapped to the genome? The total read count is 11851490.
I have tried samtools flagstat file.bam, and the result is:

12955438 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
12255658 + 0 mapped (94.60%:-nan%)
12955438 + 0 paired in sequencing
6477719 + 0 read1
6477719 + 0 read2
11999942 + 0 properly paired (92.62%:-nan%)
12234952 + 0 with itself and mate mapped
20706 + 0 singletons (0.16%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

From the above result:
1) Which value should I take as the number of reads mapped to the genome when calculating RPKM?
2) The total read count is 11851490, so why does flagstat report larger numbers (12955438 in total, 12255658 mapped)?

rna-seq • written 12 months ago by vimlakany
Devon Ryan wrote:

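Please run samtools view -c -F 256 -f 66 file.bam and use the number it outputs as the number of fragments. Here -f 66 keeps only records that are both properly paired (2) and first in pair (64), so each fragment is counted once, and -F 256 drops secondary alignments. You can then use that number for the FPKM (not RPKM, since you have a paired-end dataset) calculation.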
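
To make the flag arithmetic concrete, here is a minimal Python sketch of the test that -f 66 -F 256 applies to each record's FLAG field. The helper name passes_filter is made up for illustration; only the bit values come from the SAM specification.

```python
# SAM FLAG bits relevant to the filter (values from the SAM specification).
PROPER_PAIR = 0x2      # 2: read mapped in a proper pair
FIRST_IN_PAIR = 0x40   # 64: this record is read 1 of the pair
SECONDARY = 0x100      # 256: secondary alignment


def passes_filter(flag, require=PROPER_PAIR | FIRST_IN_PAIR, exclude=SECONDARY):
    """Hypothetical helper mimicking `samtools view -f require -F exclude`."""
    # -f keeps a record only if ALL required bits are set;
    # -F drops a record if ANY excluded bit is set.
    return (flag & require) == require and (flag & exclude) == 0


# 99  = paired + proper pair + mate reverse + first in pair
# 147 = paired + proper pair + read reverse + second in pair
# 355 = 99 with the secondary bit added
for flag in (99, 147, 355):
    print(flag, passes_filter(flag))
```

Of the three example flags, only 99 is counted: 147 is read 2 of the pair, and 355 is a secondary alignment.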

There are more entries in the BAM file than original reads because of secondary alignments.

As an aside, make sure you have a good reason to use RPKMs/FPKMs, since for the most part they should be avoided.
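
If an FPKM is needed anyway, the usual definition is fragments per kilobase of feature per million mapped fragments. The following is a small sketch of that arithmetic, not a replacement for a proper quantification tool; the function name and the example numbers are invented for illustration.

```python
def fpkm(fragments_in_gene, gene_length_bp, total_fragments):
    """Fragments per kilobase of feature per million mapped fragments.

    Equivalent to (fragments_in_gene / (gene_length_bp / 1e3))
    divided by (total_fragments / 1e6).
    """
    if gene_length_bp <= 0 or total_fragments <= 0:
        raise ValueError("gene length and total fragments must be positive")
    return fragments_in_gene * 1e9 / (gene_length_bp * total_fragments)


# Made-up example: 100 fragments on a 2000 bp gene, 5 million fragments in total.
print(fpkm(100, 2000, 5_000_000))
```

The total_fragments argument is the denominator the thread is about: a larger total gives a smaller FPKM for every gene.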

written 12 months ago by Devon Ryan

Thank you. Can you please tell me the difference between "12255658 + 0 mapped (94.60%:-nan%)" and "12955438 + 0 paired in sequencing"?

written 12 months ago by vimlakany

12955438 is the number of entries, whether aligned or not. The other number, 12255658, is how many of those entries aligned (94.60% of the total).
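
The percentages in the flagstat output are each count divided by the total number of entries. A short Python check using the numbers quoted in the question (the helper name percent_of_total is made up for illustration):

```python
# Counts copied from the flagstat output in the question.
total_entries = 12955438
counts = {
    "mapped": 12255658,
    "properly paired": 11999942,
    "singletons": 20706,
}


def percent_of_total(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


for label, count in counts.items():
    print(f"{label}: {percent_of_total(count, total_entries):.2f}%")
```

These reproduce the 94.60%, 92.62% and 0.16% that flagstat prints.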

written 12 months ago by Devon Ryan

So when calculating RPKM, would it be correct or meaningful to take 12255658 as the total number of reads mapped to the genome?
Can you please tell me the difference between "11999942 + 0 properly paired (92.62%:-nan%)" and "12234952 + 0 with itself and mate mapped"?
What is the difference between mapped reads and properly paired reads?

written 12 months ago by vimlakany

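Using 12255658 would yield values that are artificially small, since that number counts individual mapped entries (both mates of a pair, plus secondary alignments) rather than fragments, which inflates the denominator. Of the two numbers you referenced, 11999942 is the count for proper pairs, and 12234952 is that plus discordant pairs (e.g., wrong relative orientation).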
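
The flagstat counts quoted in the question can be cross-checked against each other with simple arithmetic. This is only a sketch with the thread's numbers; the variable names are mine, and "discordant" here means entries whose mate is also mapped but which are not flagged as properly paired.

```python
# Counts copied from the flagstat output in the question.
mapped = 12255658
both_mates_mapped = 12234952   # "with itself and mate mapped"
properly_paired = 11999942
singletons = 20706             # mapped, but the mate is not

# Every mapped entry either has a mapped mate or is a singleton.
assert both_mates_mapped + singletons == mapped

# Entries with a mapped mate that are not flagged as a proper pair.
discordant = both_mates_mapped - properly_paired
print("discordant entries:", discordant)
```

So the two numbers asked about differ by 235010 entries, which are the ones in discordant pairs.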

written 12 months ago by Devon Ryan

Since you asked:

  • Mapped reads are reads that found a match on the reference sequence, given the allowed mismatches / indels and all other constraints that you applied.

  • Proper pairs are pairs of reads that both map and are within the expected insert size (a property of the sequencing library that you should know, have received with the data, or have inferred from the TLEN field of the BAM file resulting from the alignment of a subset of reads).
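
As a rough illustration of inferring the insert size from TLEN, the sketch below takes SAM-formatted text lines (such as those printed by samtools view) and reports the median absolute TLEN over read-1 records whose mate is also mapped. The function name and the toy records are invented for this example, and for spliced RNA-seq alignments TLEN can span introns, so treat the result as an approximation.

```python
import statistics

UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def median_insert_size(sam_lines):
    """Hypothetical helper: median |TLEN| over primary read-1 records."""
    sizes = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: FLAG
        tlen = int(fields[8])   # column 9: TLEN
        if flag & (UNMAPPED | MATE_UNMAPPED | SECONDARY_OR_SUPPLEMENTARY):
            continue
        # Count each pair once (read 1 only); TLEN of 0 means "not available".
        if flag & FIRST_IN_PAIR and tlen != 0:
            sizes.append(abs(tlen))
    return statistics.median(sizes) if sizes else None


# Toy records, made up for illustration (11 mandatory SAM columns each).
toy_records = [
    "r1\t99\tchr1\t100\t60\t50M\t=\t250\t200\t*\t*",
    "r1\t147\tchr1\t250\t60\t50M\t=\t100\t-200\t*\t*",
    "r2\t99\tchr1\t500\t60\t50M\t=\t700\t250\t*\t*",
    "r3\t99\tchr1\t900\t60\t50M\t=\t1150\t300\t*\t*",
]
print(median_insert_size(toy_records))
```

The median is used rather than the mean so that a handful of very large TLEN values does not dominate the estimate.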

written 12 months ago by Macspider