Question

Interpreting flagstat output and normalizing reads to RPKM

21 months ago

Tonkatsu ▴ 30

Hi, I am having some trouble interpreting the output from samtools flagstat (I aligned my unmapped reads to GFP). Am I correct to assume that 11971 is the total number of reads from my alignment? Is there a difference between the "total" row and the "mapped" row? Also, what are the 21 reads in the "secondary" row?

Also, how would I go about normalizing/converting this output to RPKM?

Thanks!

[Image: samtools flagstat output]

samtools RNA-seq • 530 views

updated 10 months ago by Ram 43k • written 21 months ago by Tonkatsu ▴ 30

score 1 · Answer 1 · 2022-08-02

Look, there is no point spreading the actual question over three (or more) posts:

Interpreting flagstat output and normalizing reads to RPKM

What is retained from original vector construct/plasmid after RNA-sequencing?

Read count based on base pair differences

If your overexpressed transcript has any sequence difference from the endogenous transcripts, then you can basically map it. That means you should include the transgene sequence (the sequence of the final transcript) in the reference and then quantify against it. I would use salmon for this, and for it you would add the sequence to the reference transcriptome FASTA file. Alternatively, you could add the transgene as an extra "chromosome" to the reference genome if you prefer genome alignment, e.g. with STAR. That way you eventually get a count matrix, and normalization is then easy to calculate, e.g. in R with DESeq2::fpkm() or anything similar.
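If you just want to see what the normalization does for a single feature, here is a minimal Python sketch of the textbook RPKM definition (reads per kilobase of feature per million mapped reads). The function name `rpkm` and all numbers below are hypothetical placeholders, not values from your flagstat output. Note that the library size should be the total mapped reads of the whole sample, not only the reads that hit GFP. This is the plain definition and not necessarily identical to what DESeq2::fpkm() computes with its default settings.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads.

    RPKM = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical example values (assumptions for illustration only):
transgene_reads = 12_000            # reads assigned to the transgene
transgene_length_bp = 720           # length of the transgene transcript in bp
library_mapped_reads = 30_000_000   # mapped reads in the WHOLE sample

value = rpkm(transgene_reads, transgene_length_bp, library_mapped_reads)
print(f"RPKM: {value:.2f}")
```

For paired-end data the same formula is usually applied to fragments rather than reads (FPKM), so count read pairs once.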