Question: Obtaining all CDS sequences (i.e. all spliced exon variants) from UCSC
Max wrote, 7.0 years ago:

In order to estimate dN/dS for various genes, I need the entire coding sequence (CDS) of each transcript. I have been working with the CDS exon sequences that the UCSC Table Browser provides for the human reference genome. One problem I'm facing is that if I try to concatenate them into a single sequence for PAML, HyPhy, etc., I have to deal with the fact that each exon may start in a different reading frame.
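
The per-exon frame problem largely disappears once the CDS exons of one transcript are joined in transcript order: the frame of each exon is then just its offset from the start of the CDS, modulo 3. A minimal Python sketch, assuming the exon sequences of a single transcript are already in 5'->3' order (minus-strand exons already reverse-complemented); `join_cds_exons` is a hypothetical helper, not a UCSC tool:

```python
def join_cds_exons(exons):
    """Concatenate the CDS exons of ONE transcript (assumed to be in 5'->3'
    transcript order) and report, for each exon, the codon position
    (0, 1 or 2) of its first base."""
    positions = []
    offset = 0
    for seq in exons:
        positions.append(offset % 3)  # where this exon starts within a codon
        offset += len(seq)
    cds = "".join(exons).upper()
    if len(cds) % 3 != 0:
        raise ValueError(f"CDS length {len(cds)} is not a multiple of 3")
    # Once joined, the whole CDS reads in a single frame starting at base 0.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return cds, positions, codons


if __name__ == "__main__":
    cds, positions, codons = join_cds_exons(["ATGAA", "GCCT", "TAA"])
    print(cds)        # ATGAAGCCTTAA
    print(positions)  # [0, 2, 0]
    print(codons)     # ['ATG', 'AAG', 'CCT', 'TAA']
```

Note that these positions are "codon position of the first base"; GFF-style phase counts the bases to skip instead, i.e. (3 - position) % 3.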

Therefore, I need to know whether there is an efficient way to extract the entire CDS with the exons already concatenated, so that the whole sequence is in a single frame 0 (codons start at the first base and the length is a multiple of 3). I don't see a CDS option as such (although the tables provide coordinates for cdsStart/cdsEnd). In other words, I need the complete CDS of every splice variant, so that each CDS can be read from start to end in a single frame.
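
The transcript tables behind the Table Browser (e.g. refGene or knownGene) use UCSC's genePred layout, which stores cdsStart/cdsEnd together with comma-separated exonStarts/exonEnds, all 0-based and half-open on the + strand. That is enough to rebuild the spliced CDS of every splice variant yourself. A minimal sketch, assuming you already have the chromosome sequence as a string (e.g. read from an hg19 FASTA file) and the genePred fields of one transcript; the function names are hypothetical:

```python
# Complement table used to reverse-complement minus-strand transcripts.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_coords(field):
    """Parse a UCSC comma-separated coordinate list such as '100,250,'."""
    return [int(x) for x in field.split(",") if x]


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def spliced_cds(chrom_seq, strand, cds_start, cds_end, exon_starts, exon_ends):
    """Build the spliced CDS of one transcript from genePred-style fields.

    Coordinates are 0-based, half-open, on the + strand of chrom_seq,
    with exons listed in ascending genomic order.
    """
    pieces = []
    for start, end in zip(parse_coords(exon_starts), parse_coords(exon_ends)):
        # Clip each exon to the coding region; UTR-only exons drop out.
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo < hi:
            pieces.append(chrom_seq[lo:hi])
    cds = "".join(pieces).upper()
    # Minus-strand genes are read on the reverse complement.
    return revcomp(cds) if strand == "-" else cds


if __name__ == "__main__":
    # Toy chromosome: 5'UTR "CC", exon 1 CDS "ATGAA", intron "GTAAG",
    # exon 2 CDS "GCCTTAA", 3'UTR "GG".
    chrom = "CCATGAAGTAAGGCCTTAAGG"
    print(spliced_cds(chrom, "+", 2, 19, "0,12,", "7,21,"))  # ATGAAGCCTTAA
```

Running it once per table row gives one CDS per splice variant, which can then be written out as FASTA for PAML or HyPhy.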

I seem to remember that UCSC could return the complete CDS for each splice variant, not just the list of exons, but I don't see this option listed. The closest I have found is restricting the returned exons to those that fall within the coding sequence.
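
However the CDS is obtained, translating it in frame 0 is a quick sanity check before passing it to PAML or HyPhy: a correctly spliced, complete CDS should have a length divisible by 3 and no stop codon before the last codon. A minimal sketch, assuming nuclear genes and the standard genetic code (NCBI table 1); `check_single_frame` is a hypothetical helper:

```python
# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def check_single_frame(cds):
    """Translate a CDS in frame 0 and list problems suggesting the exons were
    joined incorrectly (length not a multiple of 3, internal stop codon)."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3:
        problems.append("length not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Unknown codons (e.g. containing N) translate to X.
    protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
    if "*" in protein[:-1]:
        problems.append("internal stop codon")
    return protein, problems


if __name__ == "__main__":
    print(check_single_frame("ATGAAGCCTTAA"))  # ('MKP*', [])
```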

Tags: cds, ucsc
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 7.0 years ago:

I'm not sure whether UCSC will allow you to curl all the mRNAs, but the following script seems to work:

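# Download the gzipped file and keep its first (tab-separated) column.
curl -s "" |\
gunzip -c |\
cut -f 1 |\
while read -r F
do
    # Fetch each record and strip the <PRE>/<TT> HTML tags from the page.
    curl -s "${F}&db=hg19" |\
        sed 's%<[/]*\(PRE\|TT\)>%%g'
done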
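
If you prefer to post-process the downloaded pages in Python rather than with sed, the sketch below removes the same <PRE>/<TT> tags and collects the records into a dictionary. It assumes the stripped output is FASTA, which depends on the URL used in the loop (elided above); the helper names are hypothetical:

```python
import re

# Same pattern as the sed command: drop <PRE>, </PRE>, <TT> and </TT>.
TAG_RE = re.compile(r"</*(?:PRE|TT)>")


def strip_tags(page):
    return TAG_RE.sub("", page)


def read_fasta(text):
    """Parse FASTA text into {first word of header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


if __name__ == "__main__":
    page = "<PRE><TT>>seq1 demo\nATGAAG\nCCTTAA\n</TT></PRE>"
    print(read_fasta(strip_tags(page)))  # {'seq1': 'ATGAAGCCTTAA'}
```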

Max replied, 7.0 years ago:

Is the mRNA on UCSC the primary (unspliced) transcript, or the spliced sequence with the introns removed?
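
This thread does not settle whether the returned sequences include UTRs. If they turn out to be spliced mRNAs with UTRs, the CDS can be cut out by converting the genomic cdsStart/cdsEnd into offsets along the transcript. A minimal sketch, assuming genePred-style coordinates (0-based, half-open, exons in ascending genomic order, given here as integer lists) and an mRNA whose length equals the summed exon lengths; `cds_from_mrna` is hypothetical:

```python
def cds_from_mrna(mrna, strand, cds_start, cds_end, exon_starts, exon_ends):
    """Cut the CDS out of a spliced mRNA (5'->3', UTRs included) using
    genePred-style genomic coordinates."""
    tx_len = 0
    first = last = None
    for start, end in zip(exon_starts, exon_ends):
        # Offsets are counted along the + strand of the genome.
        if start <= cds_start < end:
            first = tx_len + (cds_start - start)
        if start < cds_end <= end:
            last = tx_len + (cds_end - start)
        tx_len += end - start
    if first is None or last is None or len(mrna) != tx_len:
        raise ValueError("coordinates do not match this mRNA (or it is non-coding)")
    if strand == "-":
        # The mRNA runs 5'->3' on the minus strand, so mirror the offsets.
        first, last = tx_len - last, tx_len - first
    return mrna[first:last].upper()


if __name__ == "__main__":
    mrna = "CCATGAAGCCTTAAGG"  # toy transcript: "CC" 5'UTR, CDS, "GG" 3'UTR
    print(cds_from_mrna(mrna, "+", 2, 19, [0, 12], [7, 21]))  # ATGAAGCCTTAA
```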