Question: How can one separate noncoding from coding UCSC and Ensembl transcripts?
Asked 6.3 years ago by biorepine (Spain):

Dear Biostars,

Do you know how to separate noncoding from coding UCSC and Ensembl transcripts? For the RefSeq database, I generally use the NR_* prefix to identify noncoding transcripts and the NM_* prefix to identify protein-coding ones.

Thanks in advance.

Tags: ensembl, refseq, code, ucsc
(Question written 6.3 years ago by biorepine; modified by Pierre Lindenbaum.)
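The NM_/NR_ rule from the question can be written as a small classifier. This is a minimal sketch that only looks at accession prefixes. It also treats the predicted XM_/XR_ records the same way as their curated NM_/NR_ counterparts, which is an assumption you may want to drop if you only work with curated RefSeq.

```python
# Minimal sketch: classify RefSeq accessions by their prefix.
# NM_ (curated) and XM_ (predicted) are mRNAs, i.e. protein-coding transcripts;
# NR_ (curated) and XR_ (predicted) are noncoding RNAs.
# Anything else (e.g. NP_ proteins, NC_ chromosomes) is reported as "other".

CODING_PREFIXES = ("NM_", "XM_")
NONCODING_PREFIXES = ("NR_", "XR_")


def refseq_class(accession: str) -> str:
    """Return 'coding', 'noncoding' or 'other' for a RefSeq accession."""
    acc = accession.strip()
    if acc.startswith(CODING_PREFIXES):
        return "coding"
    if acc.startswith(NONCODING_PREFIXES):
        return "noncoding"
    return "other"


if __name__ == "__main__":
    # NM_001177589 appears in this thread; the other two are placeholders.
    for acc in ["NM_001177589", "NR_000000", "NP_000000"]:
        print(acc, refseq_class(acc))
```

Version suffixes such as `.2` do not matter here, because only the prefix is inspected.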

What is your input? A list of knownGene identifiers? A list of ENSGxxxxxxx IDs?

(Reply by Pierre Lindenbaum, 6.3 years ago)

Yes: ENS* IDs in the case of Ensembl and ucsc.* IDs in the case of UCSC.

(Reply by biorepine, 6.3 years ago)

ucsc.*? Can you give an example, please?

(Reply by Pierre Lindenbaum, 6.3 years ago)
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 6.3 years ago:

For UCSC knownGene, you can select the noncoding transcripts as those having cdsStart == cdsEnd, i.e. a zero-length coding region:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select name,chrom,cdsStart,cdsEnd from knownGene where cdsStart=cdsEnd limit 10'
+------------+-------+----------+--------+
| name       | chrom | cdsStart | cdsEnd |
+------------+-------+----------+--------+
| uc001aaa.3 | chr1  |    11873 |  11873 |
| uc010nxr.1 | chr1  |    11873 |  11873 |
| uc009vis.3 | chr1  |    14361 |  14361 |
| uc009vit.3 | chr1  |    14361 |  14361 |
| uc009viu.3 | chr1  |    14361 |  14361 |
| uc001aae.4 | chr1  |    14361 |  14361 |
| uc001aah.4 | chr1  |    14361 |  14361 |
| uc009vir.3 | chr1  |    14361 |  14361 |
| uc009viq.3 | chr1  |    14361 |  14361 |
| uc001aac.4 | chr1  |    14361 |  14361 |
+------------+-------+----------+--------+

So if I change the condition to cdsStart != cdsEnd, will it print only coding genes? Thanks.

(Reply by biorepine, 4.5 years ago)
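Under the convention used in the answer above (a noncoding knownGene transcript has cdsStart equal to cdsEnd), the complementary test cdsStart != cdsEnd picks the transcripts that have an annotated coding region. The sketch below applies both tests to a local tab-separated knownGene dump instead of querying the MySQL server. The column positions assume the standard genePred layout, and the two sample rows are hypothetical toy data.

```python
import csv

# Column positions in the genePred layout that knownGene follows (assumed):
# 0 name, 1 chrom, 2 strand, 3 txStart, 4 txEnd, 5 cdsStart, 6 cdsEnd, ...
NAME, CDS_START, CDS_END = 0, 5, 6


def split_by_cds(lines):
    """Split tab-separated genePred rows into (coding, noncoding) name lists."""
    coding, noncoding = [], []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        if int(row[CDS_START]) == int(row[CDS_END]):
            noncoding.append(row[NAME])  # zero-length CDS: no annotated ORF
        else:
            coding.append(row[NAME])     # cdsStart != cdsEnd: CDS annotated
    return coding, noncoding


# Two hypothetical rows (toy coordinates, truncated after cdsEnd).
SAMPLE = (
    "tx_noncoding\tchr1\t+\t100\t500\t100\t100\n"
    "tx_coding\tchr1\t-\t1000\t2000\t1100\t1900\n"
)

if __name__ == "__main__":
    coding, noncoding = split_by_cds(SAMPLE.splitlines())
    print("coding:", coding)
    print("noncoding:", noncoding)
```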
Answer by Ashutosh Pandey (Philadelphia), 6.3 years ago:

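You can download Ensembl annotation from BioMart (http://useast.ensembl.org/biomart/martview/) and select the Gene Biotype attribute, which tells you whether a given gene is protein-coding or noncoding; there is also a transcript-level biotype attribute that gives the same information for each individual transcript.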

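Once a BioMart table is exported, splitting transcripts is a simple filter. A protein_coding gene can also have transcripts that are not themselves protein-coding (for example retained_intron transcripts), so filtering on the transcript-level biotype is the safer choice when you want to split transcripts. In this minimal sketch the column headers "Transcript stable ID" and "Transcript type" are assumptions about the export format (rename them to match your file), and the sample rows are hypothetical.

```python
import csv

# Assumed header names in a tab-separated BioMart export; adjust if yours differ.
ID_COL = "Transcript stable ID"
TYPE_COL = "Transcript type"


def split_by_biotype(lines, coding_type="protein_coding"):
    """Return (coding, other) transcript IDs from a BioMart TSV export."""
    coding, other = [], []
    for record in csv.DictReader(lines, delimiter="\t"):
        if record[TYPE_COL] == coding_type:
            coding.append(record[ID_COL])
        else:
            other.append(record[ID_COL])
    return coding, other


# Hypothetical export: header line plus three illustrative rows.
SAMPLE = [
    "Transcript stable ID\tTranscript type",
    "ENST_example_1\tprotein_coding",
    "ENST_example_2\tlincRNA",
    "ENST_example_3\tretained_intron",
]

if __name__ == "__main__":
    coding, other = split_by_biotype(SAMPLE)
    print("protein_coding:", coding)
    print("other biotypes:", other)
```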

Thanks, but do you have any idea regarding UCSC transcripts?

(Reply by biorepine, 6.3 years ago)
You can input UCSC IDs into BioMart.

There's a help video on BioMart as well.

(Reply by Emily_Ensembl, 6.3 years ago)

Using Ensembl BioMart, is it possible to find the biotype of genes on the opposite (antisense) strand, in particular whether they are coding or noncoding?

(Reply by hbr7210, 5.8 years ago)

I'm afraid I don't understand your question. Are you looking to find out whether there's a gene on the opposite strand of your gene of interest, and what its biotype is? If so, there isn't a way to do that using BioMart. That would be a job for the Perl API.

(Reply by Emily_Ensembl, 5.8 years ago)
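If you already have gene coordinates, strand and biotype in a local table (for example exported from BioMart), the antisense lookup itself is a strand-aware overlap test. This Python sketch is an alternative to the Perl API route mentioned above, not a replacement for it. Several points are assumptions: 1-based inclusive coordinates, strand encoded as 1/-1, and "noncoding" simplified to "any biotype other than protein_coding". The genes are hypothetical.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int    # 1-based, inclusive (assumed Ensembl-style coordinates)
    end: int      # 1-based, inclusive
    strand: int   # 1 (forward) or -1 (reverse)
    biotype: str


def antisense_overlaps(target: Gene, genes: List[Gene],
                       noncoding_only: bool = True) -> List[Gene]:
    """Genes on the opposite strand of `target` whose span overlaps it."""
    hits = []
    for g in genes:
        if g.gene_id == target.gene_id or g.chrom != target.chrom:
            continue
        if g.strand == target.strand:
            continue  # same strand: not antisense
        if g.start > target.end or target.start > g.end:
            continue  # closed intervals do not overlap
        if noncoding_only and g.biotype == "protein_coding":
            continue  # simplification: "noncoding" = anything not protein_coding
        hits.append(g)
    return hits


# Hypothetical genes for illustration.
GENES = [
    Gene("geneA", "chr1", 100, 500, 1, "protein_coding"),
    Gene("geneB", "chr1", 400, 900, -1, "lincRNA"),
    Gene("geneC", "chr1", 200, 300, 1, "lincRNA"),
    Gene("geneD", "chr1", 450, 600, -1, "protein_coding"),
    Gene("geneE", "chr2", 100, 500, -1, "lincRNA"),
]

if __name__ == "__main__":
    target = GENES[0]
    print([g.gene_id for g in antisense_overlaps(target, GENES)])
```

Passing `noncoding_only=False` also returns antisense protein-coding genes such as geneD.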

Sorry that my question was not clear, but you got it right: yes, I am indeed interested in looking at the noncoding genes on the opposite strand of my gene of interest. I will look into the Perl API. Thanks again.

(Reply by hbr7210, 5.8 years ago)

There's an online course here: http://www.ebi.ac.uk/training/online/course/ensembl-filmed-api-workshop

(Reply by Emily_Ensembl, 5.8 years ago)

Great! Thanks so much. Currently I am trying to get the reverse-strand information from the BLAT output; if I can't, I will have to switch to the Perl API.

(Reply by hbr7210, 5.8 years ago)

I think hundreds of Ensembl lincRNA annotations are wrong. (They should be intergenic, so in principle they should not overlap any known coding transcript, regardless of strand.)

For example:

chr8    33998976    34060498    NM_001177589_Gm3985    0    -    chr8    33998977    34060498    lincRNA_ENSMUSG00000079070_ENSMUST00000132101_Gm3985    0    -
chr8    33998976    34060498    NM_001177589_Gm3985    0    -    chr8    34000947    34052954    lincRNA_ENSMUSG00000079070_ENSMUST00000180220_Gm3985    0    -
chr8    48265402    48437702    proteinCoding_ENSMUSG00000038143_8_Stox2    0    -    chr8    48379626    48531716    lincRNA_ENSMUSG00000097922_ENSMUST00000181417_AC102862.2    0    -

(Reply by biorepine, 6.3 years ago)
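The check described here (does a lincRNA overlap any coding transcript, on either strand?) amounts to an interval intersection that ignores strand. Below is a minimal Python sketch, assuming BED-style half-open coordinates like the example rows above. The coordinates are copied from those rows, and "toy_lincRNA" is a hypothetical non-overlapping control. For genome-wide files, a dedicated tool such as bedtools intersect (which ignores strand unless asked to enforce it) will scale much better than this linear scan.

```python
from collections import defaultdict

# Intervals are (chrom, start, end, name) in BED half-open coordinates.
CODING = [
    ("chr8", 33998976, 34060498, "NM_001177589_Gm3985"),
    ("chr8", 48265402, 48437702, "proteinCoding_ENSMUSG00000038143_8_Stox2"),
]
LINCRNAS = [
    ("chr8", 33998977, 34060498, "lincRNA_ENSMUSG00000079070_ENSMUST00000132101_Gm3985"),
    ("chr8", 34000947, 34052954, "lincRNA_ENSMUSG00000079070_ENSMUST00000180220_Gm3985"),
    ("chr8", 48379626, 48531716, "lincRNA_ENSMUSG00000097922_ENSMUST00000181417_AC102862.2"),
    ("chr8", 1000, 2000, "toy_lincRNA"),  # hypothetical control
]


def lincrnas_overlapping_coding(lincrnas, coding):
    """Names of lincRNA intervals that overlap any coding interval, ignoring strand."""
    by_chrom = defaultdict(list)
    for chrom, start, end, _name in coding:
        by_chrom[chrom].append((start, end))
    flagged = []
    for chrom, start, end, name in lincrnas:
        # Half-open [start, end) and [s, e) overlap iff s < end and start < e.
        if any(s < end and start < e for s, e in by_chrom[chrom]):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    for name in lincrnas_overlapping_coding(LINCRNAS, CODING):
        print(name)
```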

I'm afraid you've got that wrong. lincRNAs can be anywhere in the genome and can overlap coding genes in either direction.

See the Wikipedia article on lincRNAs.

(Reply by Emily_Ensembl, 6.3 years ago)

Please see the wiki again:

"Long intergenic non-coding RNAs (lincRNA): 'Intergenic' refers to long non-coding RNAs that are transcribed from non-coding DNA sequences between protein-coding genes."

(Reply by biorepine, 6.3 years ago)

Whoops, yes. I googled lincRNA for a definition and didn't notice that the wiki page wasn't actually called lincRNA.

The Ensembl definition can be found here:

http://www.ensembl.org/info/docs/genebuild/ncrna.html

We include RNAs that overlap other genes by less than 35%.

(Reply by Emily_Ensembl, 6.3 years ago)
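The arithmetic behind an "overlaps other genes by less than 35%" rule can be sketched as follows. The reply does not say which length the 35% is measured against, so this sketch assumes it is the fraction of the RNA's own length covered by each other gene individually. That reading is an assumption, and the linked Ensembl page is the authority on the exact rule. Coordinates are treated as half-open.

```python
# Assumption: "overlap by <35%" is read as the fraction of the RNA's own
# length covered by each other gene individually; check the Ensembl docs
# for the exact rule. This only illustrates the arithmetic.


def overlap_fraction(rna_start, rna_end, gene_start, gene_end):
    """Fraction of the RNA span [rna_start, rna_end) covered by [gene_start, gene_end)."""
    overlap = max(0, min(rna_end, gene_end) - max(rna_start, gene_start))
    return overlap / (rna_end - rna_start)


def passes_threshold(rna, genes, max_fraction=0.35):
    """True if every gene overlaps the RNA by less than max_fraction of its length."""
    return all(overlap_fraction(*rna, *gene) < max_fraction for gene in genes)


if __name__ == "__main__":
    rna = (0, 1000)                              # hypothetical RNA span
    print(overlap_fraction(*rna, 700, 2000))     # 300 of 1000 bases covered
    print(passes_threshold(rna, [(700, 2000)]))  # below the 35% cutoff
    print(passes_threshold(rna, [(500, 2000)]))  # 50% overlap, above the cutoff
```

Applying the same arithmetic to the first pair of example rows earlier in the thread, `overlap_fraction(33998977, 34060498, 33998976, 34060498)` returns 1.0, i.e. complete overlap. Note that the coding record in that pair is a RefSeq (NM_) transcript, not an Ensembl gene.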

The wiki is right. The original definition came from here: http://www.ncbi.nlm.nih.gov/pubmed/19182780. Maybe you Ensembl folks need to change the name from lincRNA to lncRNA. :)

(Reply by biorepine, 6.3 years ago)