Question: Length of the Longest Available cDNAs for a Gene Set
User 1933340 wrote (5.4 years ago):

I would like to know the cDNA lengths for a set of genes. For example, for a single gene I can go to ensembl.org, search for LRP2, and see that the longest cDNA is 15808 bp. Is there an R library, Python module, or SQL query that I can use to do this for a whole gene set?

Also, as a side question: is the CDS length equal to the cDNA length?

I know the question might sound vague, so please let me know if you need more explanation.

(written 5.4 years ago by User 1933340; modified 5.4 years ago by Istvan Albert)

If you are trying to get the total coding gene size in R/Bioconductor, have a look at the thread "Extract total non-overlapping exon length per gene with Bioconductor".

(written 5.4 years ago by Irsan; modified 5.4 years ago by Istvan Albert)
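The "total non-overlapping exon length" idea pools the exons of all transcripts of a gene, merges overlapping intervals, and sums the merged lengths. A minimal Python sketch of that merge step, assuming exons arrive as hypothetical (start, end) pairs in 1-based inclusive coordinates (as in GTF/GFF files). Note that this union is always at least as long as any single transcript, so it is not the same number as the longest cDNA.

```python
from typing import Iterable, List, Tuple


def merged_exon_length(exons: Iterable[Tuple[int, int]]) -> int:
    """Total length covered by the union of (start, end) exon intervals.

    Coordinates are assumed to be 1-based and inclusive, as in GTF/GFF.
    """
    merged: List[List[int]] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or directly adjacent: extend the previous interval
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return sum(end - start + 1 for start, end in merged)


# Made-up exons from two transcripts of one gene; the first two overlap
exons = [(100, 200), (150, 250), (400, 450)]
print(merged_exon_length(exons))  # 202
```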
Istvan Albert (University Park, USA) wrote (5.4 years ago):

The answer depends a bit on what type of data you have. If you have transcript sequences, you can create a BLAST database from them and then use the blastdbcmd command to extract your sequences labeled with their lengths (see the -outfmt flag):

$ blastdbcmd -db ~/refs/16S/16SMicrobial -entry "all" -outfmt "%g %l" | head

would produce:

444303911 1492
343206245 1464
343206246 1454
343206230 1255

Here the first number is the GI identifier (the %g field; %a would print the accession instead) and the second is the length of the sequence. You can then match on the identifier and sort by length.
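The matching and sorting step can be sketched in a few lines of Python. This assumes the blastdbcmd output has been captured as text (for example redirected to a file); the sample lines are the example output shown above, and the list of wanted identifiers is hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_id_lengths(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse 'identifier length' lines as printed by blastdbcmd -outfmt "%g %l"."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        ident, length = fields
        records.append((ident, int(length)))
    return records


# The example output from above
output = """444303911 1492
343206245 1464
343206246 1454
343206230 1255"""

records = parse_id_lengths(output.splitlines())
lengths = dict(records)

# Hypothetical identifiers of interest, reported longest first
wanted = ["343206230", "444303911"]
for ident in sorted(wanted, key=lambda i: lengths[i], reverse=True):
    print(ident, lengths[ident])
# 444303911 1492
# 343206230 1255
```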

If you only have genomic coordinates, one way to do this is to extract your transcript sequences first: use bedtools getfasta (with its -split option, so that the exon blocks are joined) if you have 12-column BED files, or the gffread command distributed with Cufflinks if you have GFF files.

Then build the BLAST database and query it as above.
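If only the lengths are needed, not the sequences, a 12-column BED file already contains them: the spliced transcript length is the sum of the comma-separated block sizes in column 11. A minimal sketch, using a made-up BED12 line for illustration:

```python
def bed12_transcript_length(line: str) -> int:
    """Spliced (exonic) length of a BED12 record: the sum of blockSizes (column 11)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected a 12-column BED line")
    # blockSizes often ends with a trailing comma, so drop empty pieces
    block_sizes = [int(s) for s in fields[10].split(",") if s]
    return sum(block_sizes)


# Made-up transcript with three exons of 100, 50 and 200 bp
line = "chr1\t1000\t2000\ttx1\t0\t+\t1000\t2000\t0\t3\t100,50,200,\t0,400,800,"
print(bed12_transcript_length(line))  # 350
```

This counts UTRs and CDS together; in BED12 the coding part is bounded by thickStart and thickEnd (columns 7 and 8).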

(written and modified 5.4 years ago by Istvan Albert)

Thanks. Can you elaborate on how to do this starting from HGNC IDs? I can get genomic coordinates from them, but I am not sure how to proceed.

(written and modified 5.4 years ago by User 1933340)

I recognize that this answer depends on one's background training and ability to run command-line tools.

Download all transcripts as a FASTA file, run makeblastdb on it to create the BLAST database, then use blastdbcmd as shown above.

(written 5.4 years ago by Istvan Albert)
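If installing BLAST+ is inconvenient, the lengths can also be read straight from the transcript FASTA file. A minimal pure-Python sketch; this is a substitute for the makeblastdb/blastdbcmd route, not part of it, and it handles both plain and gzip-compressed files:

```python
import gzip
import io
from typing import Iterator, Optional, TextIO, Tuple


def fasta_lengths(handle: TextIO) -> Iterator[Tuple[str, int]]:
    """Yield (header, sequence length) for each record in a FASTA stream."""
    header: Optional[str] = None
    length = 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header, length = line[1:], 0
        elif line:
            length += len(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, length


def open_fasta(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTA file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Small in-memory example
demo = io.StringIO(">tx1 first\nACGT\nAC\n>tx2\nGGG\n")
print(list(fasta_lengths(demo)))  # [('tx1 first', 6), ('tx2', 3)]
```

For a real file, pass `open_fasta(path)` to `fasta_lengths` inside a `with` block; the first word of each header is usually the transcript ID.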

If I want to do this analysis at genome scale, where can I download all the transcripts? Thanks.

(written 5.4 years ago by User 1933340)

Isn't there a public dataset that lists the cDNA size of every gene?

(written 5.3 years ago by User 1933340)
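Once (header, length) pairs are available from any FASTA reader, picking the longest cDNA per gene is a simple grouping step. A minimal sketch, assuming Ensembl-style headers that carry a `gene_symbol:` tag (recent Ensembl cDNA FASTA files do, but check your release); for human these are gene symbols, not the numeric HGNC IDs. The headers and lengths below are made up for illustration.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed header tag, e.g. "... gene_symbol:LRP2 ..."
GENE_SYMBOL = re.compile(r"\bgene_symbol:(\S+)")


def longest_per_gene(records: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[str, int]]:
    """Map gene symbol -> (transcript ID, length) of its longest transcript.

    Records whose header lacks a gene_symbol: tag are skipped.
    """
    best: Dict[str, Tuple[str, int]] = {}
    for header, length in records:
        match = GENE_SYMBOL.search(header)
        if not match:
            continue
        gene = match.group(1)
        transcript_id = header.split()[0]
        if gene not in best or length > best[gene][1]:
            best[gene] = (transcript_id, length)
    return best


# Made-up records for illustration only
records = [
    ("txA cdna gene:G1 gene_symbol:GENE1", 1200),
    ("txB cdna gene:G1 gene_symbol:GENE1", 3400),
    ("txC cdna gene:G2 gene_symbol:GENE2", 800),
]
for gene, (tx, n) in sorted(longest_per_gene(records).items()):
    print(gene, tx, n)
# GENE1 txB 3400
# GENE2 txC 800
```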