Transcript length from Ensembl database
3.8 years ago
Dominik • 0

Hi everyone!

I am new to the bioinformatics field. I have RNA-seq data that includes transcript IDs and gene IDs (from the Ensembl database), and I want to look up the length of every transcript. Is this possible with any Python/R tools, and if so, could anyone explain simply how to do it? Thank you for any help!

PS. I want to do that because I have FPKM values, which I want to convert back to raw counts in order to perform differential expression analysis using DESeq2.
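For reference, the back-conversion mentioned here can be sketched in Python as below. It assumes the usual definition FPKM = counts * 10^9 / (transcript length in bp * total mapped fragments), so it also needs the total number of mapped fragments per sample, which is not given in this thread. The function name and the example values are hypothetical, and the result is only an estimate of the original counts.

```python
def fpkm_to_counts(fpkm, transcript_length_bp, total_mapped_fragments):
    """Invert FPKM = counts * 1e9 / (length_bp * total_mapped_fragments).

    total_mapped_fragments is the library size of the sample; it is an
    assumption here and has to come from the original alignment statistics.
    """
    return fpkm * transcript_length_bp * total_mapped_fragments / 1e9


# Hypothetical values: FPKM of 10 for a 2,000 bp transcript in a
# library of 20 million mapped fragments.
estimated = fpkm_to_counts(10.0, 2000, 20_000_000)
print(round(estimated))
```

DESeq2 expects integer counts, so the estimates would need rounding, and they will not exactly reproduce the original count table.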

RNA-seq transcriptomics • 2.9k views
3.8 years ago
GenoMax 154k

You can use BioMart, either via the web interface or via the biomaRt package.

Via Web: https://www.ensembl.org --> Human (will use this as example) --> BioMart at top --> Ensembl Genes --> Human Genes --> Select "Attributes" in left column --> Expand "Gene" (+ sign) in right column --> Select (check mark) Gene Stable ID/Transcript Stable ID/Gene name/Transcript length --> Click on "Count" button in top left corner --> 68005* as of today --> Click on "Results" --> Export all results to file --> Select format you want (CSV/TSV) --> Click "Go"

If you want to see results only for your IDs, you can limit them by uploading your IDs under Filters --> Gene section before you start selecting Attributes as described above.

* Edit: 86364 as of July 2025
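Once the TSV file is exported, a minimal Python sketch for looking up lengths could look like this. The column headers used below ("Transcript stable ID", "Transcript length (including UTRs and CDS)") are an assumption about how BioMart labels the export, so check them against the first line of your file. The function name and the IDs in the example are hypothetical placeholders.

```python
import csv
import io


def load_transcript_lengths(
    handle,
    id_column="Transcript stable ID",
    length_column="Transcript length (including UTRs and CDS)",
):
    """Read a tab-separated BioMart export into {transcript_id: length}.

    The default column names are assumptions; pass your own if the
    header line of the exported file differs.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_column]: int(row[length_column]) for row in reader}


# Hypothetical two-row export, used here instead of a real file.
example = io.StringIO(
    "Gene stable ID\tTranscript stable ID\tGene name\t"
    "Transcript length (including UTRs and CDS)\n"
    "ENSG_EXAMPLE_1\tENST_EXAMPLE_1\tGENE1\t1200\n"
    "ENSG_EXAMPLE_2\tENST_EXAMPLE_2\tGENE2\t850\n"
)
lengths = load_transcript_lengths(example)

# Look up only the transcripts present in your own data.
my_ids = ["ENST_EXAMPLE_2"]
print({tid: lengths[tid] for tid in my_ids})
```

For a real export, replace the `io.StringIO` object with `open("your_export.tsv", newline="")`.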

I am also working with Ensembl transcripts and I need the length of every transcript. Is there a downloadable file that contains all Ensembl transcript lengths?


I don't think there is a file readily available to download. You can either follow the BioMart instructions above or use seqkit (LINK) to get that information from the Ensembl cDNA file found here (human as an example).

With complete FASTA header names:

$ seqkit fx2tab --length --name Homo_sapiens.GRCh38.cdna.all.fa

OR

with just the accession numbers:

$ seqkit fx2tab --length --name -i Homo_sapiens.GRCh38.cdna.all.fa

You will get the lengths in column 2:

ENST00000573487.5       1593
ENST00000233047.9       1717
ENST00000572258.5       584
ENST00000574092.1       581
ENST00000261388.7       1812
ENST00000451578.6       1378
ENST00000572599.5       713
ENST00000577162.1       305
ENST00000573688.1       1495
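If you would rather stay in Python, the same ID/length table can be sketched without seqkit. This is a plain FASTA reader, not what seqkit does internally; the function name and the records in the example are hypothetical. The `strip_version` option is there because the cDNA headers carry a version suffix (e.g. `.5`), which you may need to drop before matching against unversioned IDs.

```python
import io


def fasta_lengths(handle, strip_version=False):
    """Return {sequence_id: length} for every record in a FASTA stream.

    The ID is the first whitespace-delimited token of each header line.
    With strip_version=True, a trailing ".N" version suffix is removed.
    """
    lengths = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            if strip_version:
                current_id = current_id.rsplit(".", 1)[0]
            lengths[current_id] = 0
        elif current_id is not None:
            # Sequence lines may be wrapped, so lengths are summed.
            lengths[current_id] += len(line)
    return lengths


# Hypothetical records standing in for the real cDNA file.
example = io.StringIO(
    ">ENST_EXAMPLE_1.2 cdna example description\n"
    "ACGTACGT\n"
    "ACG\n"
    ">ENST_EXAMPLE_2.1 cdna example description\n"
    "ACGT\n"
)
for seq_id, length in fasta_lengths(example, strip_version=True).items():
    print(seq_id, length, sep="\t")
```

For the real file, pass `open("Homo_sapiens.GRCh38.cdna.all.fa")` instead, or `gzip.open(path, "rt")` if the file is still compressed.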