Get all the translation start sites for the human genome
9.0 years ago
Ming Tommy Tang ★ 3.9k

Hi there,

I want to get the positions of all translation start sites for human genes. Note that these are different from transcription start sites (TSSs): protein translation starts at the ATG codon.

For TSSs, the GENCODE GFF file and the refGene file from UCSC have the information.

My question is: can I scan each gene's sequence from the TSS to the TES (transcription end site), look for the first ATG, and assume it is the translation start site?

Alternative (non-ATG) start codons are rare in eukaryotic genomes, but they may still exist.

What are your suggestions?

Thanks,
Ming

translation gene • 4.7k views
9.0 years ago
mark.ziemann ★ 1.9k

Hi Ming. Ensembl GTF files have the position of the "CDS" (coding sequence) in addition to "gene" and "transcript". It should be straightforward to extract the start of the first CDS segment of each protein-coding transcript (often, but not always, in exon 1). Just be careful to account for the orientation of genes on the minus strand, where the translation start is the end coordinate of the CDS.
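As a rough illustration of the CDS approach, here is a minimal Python sketch. It assumes a standard 9-column, tab-separated GTF with 1-based inclusive coordinates and a `transcript_id "..."` attribute; the function name `cds_start_sites` is made up for this example and is not part of any library. For each transcript it keeps the smallest CDS start on the plus strand and the largest CDS end on the minus strand.

```python
import re
from typing import Dict, Iterable, Tuple

# Pulls the transcript_id value out of the GTF attribute column.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def cds_start_sites(gtf_lines: Iterable[str]) -> Dict[str, Tuple[str, int, str]]:
    """Map transcript_id -> (chrom, 1-based position of first coding base, strand)."""
    sites: Dict[str, Tuple[str, int, str]] = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # only CDS records carry the coding coordinates
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        tx = match.group(1)
        # Plus strand: translation starts at the lowest CDS coordinate.
        # Minus strand: translation starts at the highest CDS coordinate.
        pos = start if strand == "+" else end
        if tx in sites:
            prev = sites[tx][1]
            pos = min(prev, pos) if strand == "+" else max(prev, pos)
        sites[tx] = (chrom, pos, strand)
    return sites
```

To run it on a real annotation, something like `with open("annotation.gtf") as fh: sites = cds_start_sites(fh)` should work, where `annotation.gtf` is a placeholder path. Transcripts annotated with an incomplete 5' CDS would still report a position, so those may need separate filtering.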

Thanks, Mark! By the way, I found your blog very helpful :)

Actually, the GENCODE GTF file has a start_codon feature. Even nicer for me :)
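For anyone wanting to pull those records out, here is a small Python sketch that converts start_codon lines to BED. It assumes the usual 9-column GTF layout with a `transcript_id "..."` attribute; the function name `start_codons_to_bed` is hypothetical. Note that a start codon split across two exons appears as more than one start_codon record, and this sketch emits one BED line per record rather than merging them.

```python
import re
from typing import Iterable, Iterator


def start_codons_to_bed(gtf_lines: Iterable[str]) -> Iterator[str]:
    """Yield one BED6 line per start_codon record in a GTF."""
    tx_pattern = re.compile(r'transcript_id "([^"]+)"')
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "start_codon":
            continue
        match = tx_pattern.search(fields[8])
        name = match.group(1) if match else "."
        # GTF is 1-based inclusive; BED is 0-based half-open,
        # so only the start coordinate shifts by one.
        bed_start = int(fields[3]) - 1
        yield "\t".join([fields[0], str(bed_start), fields[4], name, "0", fields[6]])
```

Usage could look like `for bed_line in start_codons_to_bed(open("gencode.gtf")): print(bed_line)`, with `gencode.gtf` standing in for the real (decompressed) annotation file.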

9.0 years ago
michael.ante ★ 3.8k
Hi Ming,

You can also use Ensembl's BioMart and UCSC's Table Browser to get this information. In BioMart, for instance, you select the transcript start alongside the gene and transcript IDs.

Cheers,
Michael

I want the translation start sites, which are different from the transcription start sites.
