Interpreting flagstat output and normalizing reads to RPKM
21 months ago
Tonkatsu ▴ 30

Hi, I am having some trouble interpreting the output from samtools flagstat (I aligned my unmapped reads to GFP). Am I correct to assume that 11971 is the total number of reads from my alignment? Is there a difference between the "total" row and the "mapped" row? Also, what are the 21 reads in the "secondary" row?

Also, how would I go about normalizing/converting this output to RPKM?

Thanks!

[Image: samtools flagstat output]

21 months ago
ATpoint 82k

Look, there is no point spreading the actual question over three (or more) posts:

Interpreting flagstat output and normalizing reads to RPKM

What is retained from original vector construct/plasmid after RNA-sequencing?

Read count based on base pair differences

If your overexpressed transcript has any sequence difference from the endogenous transcripts, then you can basically map it. That means you should include the transgene sequence (the sequence of the final transcript) in the reference and then quantify against that. I would use salmon for this; for it you would add the sequence to the reference transcriptome FASTA file. Alternatively, you could add the transgene as an extra "chromosome" to the reference genome if you prefer genome alignment, e.g. with STAR. Either way you eventually get a count matrix, and normalization is then easy to calculate, e.g. in R with DESeq2::fpkm() or anything similar.
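
For reference, the RPKM arithmetic itself is simple once you have counts. Below is a minimal sketch in Python, assuming you already have a read count for the transcript, its length in bp, and the total number of mapped reads in the library. The function name `rpkm` and the numbers are hypothetical and only there for illustration; they are not taken from the flagstat output in the question.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical values for illustration only (not from the original post):
# 500 reads on a 1000 bp transcript in a library with 10 million mapped reads.
example_value = rpkm(read_count=500,
                     transcript_length_bp=1000,
                     total_mapped_reads=10_000_000)
print(example_value)
```

Note that the denominator is meant to be the library-wide mapped read count, so the total from an alignment against GFP alone would probably not be the right number to plug in. That is one more reason to quantify against a combined reference as described above.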
