How to calculate the effective gene length in R?
18 months ago
JACKY ▴ 140

I have several RNA-seq raw count datasets that I want to normalize to TPM. I extracted the transcript lengths from Ensembl. Here, for example, are the lengths for one of the datasets; as you can see, the gene names are the row names:

[Image: table of transcript lengths, with gene names as row names]

Now, according to this awesome article, it is better to use effective lengths rather than just the transcript lengths. The effective length is the gene length - X + 1, where X is the "mean of the fragment length distribution which was learned from the aligned read".

My question is how do I calculate that in R?

What exactly do they mean with this X, and is the gene length the same as the transcript length? I'm a bit confused.

r TPM normalization • 561 views
18 months ago
ATpoint 82k

You don't, at least not starting from a count matrix. If you read the linked blog carefully, you will see that calculating the effective gene length requires information, such as the fragment length distribution, that is not available from such a matrix. You need a dedicated tool that calculates it from the reads themselves; popular options are salmon, kallisto, or RSEM. That said, you would need to start from scratch with the fastq files. With only counts you cannot achieve that.
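
For illustration only, here is a minimal sketch of the arithmetic behind the formula quoted in the question, assuming the mean fragment length were known from some other source. It is written in plain Python, but the same few lines translate directly to R vectors. The gene names, lengths, counts, the value of `mean_frag`, and the choice to floor effective lengths at 1 are all hypothetical assumptions and are not taken from the linked article or from any tool.

```python
def effective_lengths(lengths, mean_fragment_length):
    """Effective length = length - mean fragment length + 1.

    Values are floored at 1 (an assumption made here) so that features
    shorter than the mean fragment length do not get a zero or negative length.
    """
    return {
        gene: max(length - mean_fragment_length + 1, 1.0)
        for gene, length in lengths.items()
    }


def tpm(counts, eff_lengths):
    """Convert raw counts to TPM using the supplied effective lengths."""
    # Reads per base of effective length for each gene.
    rate = {gene: counts[gene] / eff_lengths[gene] for gene in counts}
    total = sum(rate.values())
    # Scale so that the values sum to one million.
    return {gene: r / total * 1e6 for gene, r in rate.items()}


# Hypothetical toy data; real values would come from your own dataset.
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
counts = {"geneA": 100, "geneB": 300, "geneC": 50}
mean_frag = 200.0  # assumed; a count matrix alone does not provide this

eff = effective_lengths(lengths, mean_frag)
tpm_values = tpm(counts, eff)

for gene in lengths:
    print(gene, eff[gene], round(tpm_values[gene], 1))
```

This only shows what the formula does once X is given. A single guessed X for every sample is a rough approximation, which is why starting from the fastq files is the safer route; salmon, for example, reports a per-transcript EffectiveLength column in its quant.sf output.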
