Length of the Longest Available cDNAs for a Gene Set
10.4 years ago
User 1933 ▴ 340

I would like to know the cDNA lengths for a gene set. For example, for a single gene, I can go to ensembl.org, type LRP2, and see that the longest transcript is 15808 bp. Is there an R library, Python module, or SQL query that I can use to do this for many genes?

Also, as a side question: is the CDS length equal to the length of the cDNA?

I know the question might sound vague, so please let me know if you need more explanation.


If you are trying to get the total coding gene size in R/Bioconductor, have a look at "Extract total non-overlapping exon length per gene with Bioconductor".

10.4 years ago

The answer here depends a bit on what type of data you have. If you have transcript sequences, you can create a BLAST database from them, then use the blastdbcmd command to extract your sequences while labeling them with their lengths (see the -outfmt flag):

$ blastdbcmd -db ~/refs/16S/16SMicrobial -entry "all" -outfmt "%g %l" | head

would produce output like:

444303911 1492
343206245 1464
343206246 1454
343206230 1255

where the first number is the sequence identifier (the GI number, printed by %g) and the second is the length of the sequence (printed by %l). You could then match the identifiers and sort by length.
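The matching and sorting step can be sketched with standard Unix tools. This is only an illustration: the function names `sort_by_length` and `filter_ids` are made up here, and the ID list file (one identifier per line) is assumed to be something you prepare yourself.

```shell
# Sort "ID length" lines (e.g. from blastdbcmd -outfmt "%g %l")
# numerically by the second column, longest sequence first.
sort_by_length() {
    sort -k2,2nr
}

# Keep only the lines whose first column appears in an ID list file.
# $1 is a hypothetical file with one identifier per line; the
# "ID length" table is read from standard input.
filter_ids() {
    ids_file="$1"
    awk 'NR == FNR { want[$1] = 1; next } ($1 in want)' "$ids_file" -
}
```

Usage would look something like `blastdbcmd -db mydb -entry all -outfmt "%g %l" | filter_ids my_ids.txt | sort_by_length`, where `mydb` and `my_ids.txt` are placeholders.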

If you only have genomic coordinates, one way to do this would be to extract your transcripts with a command like bedtools getfasta (if you have 12-column BED format; use its -split option so that only the exon blocks are extracted) or with the gffread command distributed with Cufflinks (if you have GFF files).

Then do the blast database formatting and query as above.
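If only the lengths are needed and the coordinates are already in 12-column BED format, the spliced transcript length can also be computed without extracting any sequence, since it is the sum of the block (exon) sizes in column 11. This is a different route from the getfasta/BLAST one described above, shown as a rough sketch; `bed12_lengths` is a made-up function name, and it assumes tab-separated BED12 with the transcript name in column 4.

```shell
# Print "name <TAB> spliced length" for every BED12 record.
# Column 11 (blockSizes) is a comma-separated list of exon sizes,
# often with a trailing comma; the empty last element adds 0.
bed12_lengths() {
    awk 'BEGIN { FS = "\t"; OFS = "\t" }
         {
             n = split($11, sizes, ",")
             total = 0
             for (i = 1; i <= n; i++) total += sizes[i]
             print $4, total
         }' "$@"
}
```

It can read a file (`bed12_lengths transcripts.bed`, a placeholder name) or standard input.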


Thanks. Can you elaborate on how I can do this starting from HGNC IDs? I can get genomic coordinates from them, but I am not sure how to proceed.


I recognize that this answer depends on one's background training and ability to run command-line tools.

Download all transcripts as a FASTA file, run makeblastdb on it to create the BLAST database, then use blastdbcmd as indicated above.
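Those steps might look roughly like the sketch below. The function names and the file and database names passed to them are placeholders, not anything from this thread. The first function assumes BLAST+ is installed and that the FASTA headers can be parsed by -parse_seqids. The second is a swapped-in alternative that computes the same ID/length table with awk alone, in case BLAST+ is not available.

```shell
# Route 1 (needs BLAST+): build a nucleotide database from a transcript
# FASTA file and print "accession length" for every entry.
# $1 = FASTA file, $2 = database name (both chosen by the caller).
blast_length_table() {
    makeblastdb -in "$1" -dbtype nucl -parse_seqids -out "$2" &&
        blastdbcmd -db "$2" -entry all -outfmt "%a %l"
}

# Route 2 (awk only): print "ID length" for every FASTA record.
# The ID is the first whitespace-delimited word of the header line,
# without the leading ">"; sequence lines are summed across wraps.
fasta_lengths() {
    awk '/^>/ { if (id != "") print id, len
                id = substr($1, 2); len = 0; next }
         { len += length($0) }
         END { if (id != "") print id, len }' "$@"
}
```

Either table can then be sorted with `sort -k2,2nr` to put the longest transcripts first.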


If I want to do this analysis at the genome scale, where can I download all transcripts from? Thanks.


Isn't there a public dataset that includes the cDNA size of every gene?
