Retrieve One Coding Sequence For Each Gene
12.0 years ago

Dear all, I want to retrieve the coding sequences (CDS) for a particular organism (for example, human), but I don't want several coding sequences per gene. In other words, I want one CDS for each gene (the canonical transcript of each protein-coding gene). I tried BioMart, but it can't do this. If there is any way to solve my problem, please let me know. I really need your help. Thanks a lot in advance. Best regards

sequence data biomart retrieval • 2.8k views
12.0 years ago
Leszek 4.2k

Usually people use the longest isoform. So get all transcripts for the given species and keep only the longest one for each gene.
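A minimal sketch of that filtering step in Python, in case it helps. It assumes the CDS sequences were exported as FASTA with headers of the form `>gene_id|transcript_id` (an assumption about the export; adjust the `split` if your headers differ). The function names `read_fasta` and `longest_cds_per_gene` are made up for this example.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_cds_per_gene(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map gene_id -> (transcript_id, sequence) of its longest CDS.

    Assumes headers look like 'gene_id|transcript_id' (hypothetical layout).
    Ties are resolved in favour of the first transcript seen.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in read_fasta(lines):
        gene_id, transcript_id = header.split("|")[:2]
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (transcript_id, seq)
    return best


# Usage with a file on disk (file name is a placeholder):
# with open("cds.fa") as fh:
#     for gene, (tx, seq) in longest_cds_per_gene(fh).items():
#         print(f">{gene}|{tx}\n{seq}")
```

Note that the longest CDS is only a common convention and is not necessarily the transcript a given database labels as canonical.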


Dear Leszek, thanks for your help, but I am not familiar with Perl or similar programming languages. If possible, please help me a bit more with this. Thanks again for your help. Regards


Dear Leszek, I want to ask you about neighbour-joining trees (PHYLIP version). The outgroup that I chose to be the root always ends up inside the ingroup. Why does this happen, and what should I do to get the outgroup for the tree?

12.0 years ago
Bioinfosm ▴ 620

You can download the refFlat file from UCSC and sum up the exon coordinates to get a coding length for each transcript. As @Leszek mentioned, a common way is to take a union of all exons for a gene and generate 'the longest isoform'. Some scripting would be needed to generate such a file.

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ -> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refFlat.txt.gz
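For the scripting part, here is a rough Python sketch. It relies on the refFlat column order (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) and computes the coding length of each transcript as the overlap of its exons with the CDS interval. Note that it picks the transcript with the longest CDS per gene name rather than building an exon union, and the function names are invented for this example.

```python
from typing import Dict, Iterable, List, Tuple


def cds_length(fields: List[str]) -> int:
    """Sum the parts of each exon that fall inside [cdsStart, cdsEnd)."""
    cds_start, cds_end = int(fields[6]), int(fields[7])
    # exonStarts / exonEnds are comma-separated with a trailing comma.
    starts = [int(x) for x in fields[9].rstrip(",").split(",")]
    ends = [int(x) for x in fields[10].rstrip(",").split(",")]
    total = 0
    for start, end in zip(starts, ends):
        total += max(0, min(end, cds_end) - max(start, cds_start))
    return total


def longest_transcript_per_gene(lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Map gene name -> (transcript name, CDS length) for the longest CDS.

    Transcripts with no coding region (CDS length 0) are skipped.
    Genes are keyed by name only, so identically named genes at
    different loci would be merged (a simplification in this sketch).
    """
    best: Dict[str, Tuple[str, int]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip malformed or empty lines
        gene, transcript = fields[0], fields[1]
        length = cds_length(fields)
        if length == 0:
            continue
        if gene not in best or length > best[gene][1]:
            best[gene] = (transcript, length)
    return best


# Usage with the downloaded file:
# import gzip
# with gzip.open("refFlat.txt.gz", "rt") as fh:
#     for gene, (tx, n) in longest_transcript_per_gene(fh).items():
#         print(gene, tx, n, sep="\t")
```

The resulting transcript IDs can then be used to pull the matching CDS sequences from whichever sequence file you have for the same annotation.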
