How to calculate effective length for RNA sequencing?
2
1
21 months ago
xiaoguang ▴ 50
Are there any R packages or programs that could help me calculate the **effective length** of every isoform or gene?

We know how important effective length is when calculating TPM for isoforms or genes, but we cannot get it from featureCounts. Salmon reports it, but only for isoforms.

rna-seq RNA-Seq R • 1.4k views
3
20 months ago
Gordon Smyth ★ 2.5k

If you are doing a gene-level analysis, then I recommend you simply use the gene length values that come from featureCounts.

The concept of effective length is for transcripts rather than genes and is not so clearly relevant for an analysis of gene-level counts from featureCounts. However, if you wanted to modify the gene lengths in a way similar to how kallisto and Salmon modify the transcript lengths, it would simply be gene-length minus the average read length.
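As a minimal sketch of the adjustment described above, the following Python computes gene-length minus average read length and then derives TPM from the counts. The gene names, counts, lengths and the read length of 100 are made-up values, and flooring the result at 1 is an assumption added here to avoid zero or negative lengths for very short genes.

```python
def effective_lengths(gene_lengths, mean_read_length):
    """Gene length minus average read length.

    The floor at 1.0 is an assumption of this sketch, so that genes
    shorter than a read do not get a zero or negative length.
    """
    return {
        gene: max(length - mean_read_length, 1.0)
        for gene, length in gene_lengths.items()
    }


def tpm(counts, lengths):
    """Transcripts per million from counts and (effective) lengths."""
    # Reads per base for each gene.
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    # Rescale so that the values sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical values standing in for the Length column and the
# count column of a featureCounts table.
gene_lengths = {"geneA": 2100.0, "geneB": 1100.0, "geneC": 600.0}
gene_counts = {"geneA": 400, "geneB": 300, "geneC": 100}

eff = effective_lengths(gene_lengths, mean_read_length=100.0)
gene_tpm = tpm(gene_counts, eff)

for gene in gene_counts:
    print(gene, eff[gene], round(gene_tpm[gene], 1))
```

Passing `gene_lengths` instead of `eff` to `tpm` gives the unadjusted version, which is what the first recommendation above amounts to.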

0
21 months ago

You cannot do that. The effective length requires isoform quantification (afterwards summing to get gene expression) which featureCounts cannot do. For featureCounts you will have to settle on FPKM normalization.

Why not just quantify with Salmon in the first place?

0

If we use featureCounts, we get gene-level quantification because the reads are aligned to the genome, but if we use Salmon, we only get isoform-level quantification.

However, TPM is the percentage of FPKM. Are they different?

0

To get gene-level quantification from Salmon, you simply sum the counts/TPM for all isoforms annotated as belonging to the same gene. This can, for example, easily be done with the tool tximport as described in its vignette. As an alternative, you can use IsoformSwitchAnalyzeR by first using `importIsoformExpression` and afterwards using `isoformToGeneExp`, which supports providing the gene-isoform relationship as a GTF file. You can read more about the advantages and disadvantages of such approaches here.
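The summing step can be sketched in plain Python as follows. This is an illustration of the idea, not tximport's code: the transcript IDs, the `tx2gene` mapping and all numbers are hypothetical, and the three values per transcript are assumed to correspond to the TPM, NumReads and EffectiveLength columns of a Salmon `quant.sf` file. The gene-level effective length is computed here as an abundance-weighted average of the transcript effective lengths, which is, as far as I understand, how tximport derives its gene-level length.

```python
from collections import defaultdict


def summarize_to_gene(transcripts, tx2gene):
    """Sum transcript-level TPM and counts per gene.

    transcripts: dict of tx_id -> (tpm, num_reads, effective_length)
    tx2gene:     dict of tx_id -> gene_id
    """
    tpm_sum = defaultdict(float)
    count_sum = defaultdict(float)
    weighted_len = defaultdict(float)

    for tx_id, (tx_tpm, num_reads, eff_len) in transcripts.items():
        gene = tx2gene[tx_id]
        tpm_sum[gene] += tx_tpm
        count_sum[gene] += num_reads
        weighted_len[gene] += tx_tpm * eff_len

    genes = {}
    for gene in tpm_sum:
        # The weighted average is undefined for unexpressed genes,
        # so this sketch reports None in that case.
        if tpm_sum[gene] > 0:
            gene_len = weighted_len[gene] / tpm_sum[gene]
        else:
            gene_len = None
        genes[gene] = {
            "tpm": tpm_sum[gene],
            "num_reads": count_sum[gene],
            "effective_length": gene_len,
        }
    return genes


# Hypothetical transcript-level values.
transcripts = {
    "tx1": (300000.0, 300.0, 1000.0),
    "tx2": (100000.0, 50.0, 500.0),
    "tx3": (600000.0, 600.0, 1000.0),
}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_level = summarize_to_gene(transcripts, tx2gene)
for gene, values in gene_level.items():
    print(gene, values)
```

In practice tximport or IsoformSwitchAnalyzeR should be preferred, since they also handle reading the files and matching identifiers.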

0

And no, TPM values are not percentages of FPKM (those are typically called PSI or isoform fraction (IF)). TPM is an abundance measure just like FPKM, except that it is better suited to RNA-seq data since it provides more accurate abundance measures. You can read more about RNA-seq abundance units here.
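To make the two units concrete, here is a small sketch that computes both from the same counts and lengths, using the standard definitions. The gene names and numbers are made up. Within a single sample, TPM is FPKM rescaled so that all values sum to one million, whereas the FPKM total depends on the length distribution of the expressed genes.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of feature per million mapped fragments."""
    total_fragments = sum(counts.values())
    return {
        gene: counts[gene] * 1e9 / (lengths[gene] * total_fragments)
        for gene in counts
    }


def tpm_from_fpkm(fpkm_values):
    """Rescale FPKM values so that they sum to one million."""
    total = sum(fpkm_values.values())
    return {gene: value / total * 1e6 for gene, value in fpkm_values.items()}


# Hypothetical counts and lengths (in bases) for one sample.
counts = {"geneA": 400, "geneB": 300, "geneC": 100}
lengths = {"geneA": 2000.0, "geneB": 1000.0, "geneC": 500.0}

sample_fpkm = fpkm(counts, lengths)
sample_tpm = tpm_from_fpkm(sample_fpkm)

print("FPKM total:", sum(sample_fpkm.values()))
print("TPM total: ", sum(sample_tpm.values()))
```

Because the TPM total is the same in every sample, TPM values are easier to compare across samples than FPKM values.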