Question: UCSC genes annotation of long non-coding RNAs in human
5.0 years ago by
sasa_k0
Austria
sasa_k0 wrote:

Dear everyone!

I am desperately trying to find a way to get a list of all the lncRNAs annotated in UCSC (UCSC Genes).

Each lncRNA gene in UCSC databases is marked as a lncRNA, but to my knowledge there is no separate table/file available for download.

Surely plenty of people working in the lncRNA field have faced this and have a simple answer, or could maybe even share the annotation file?

Thank you very very much!!

alexandra

ADD COMMENTlink modified 2.6 years ago by Shicheng Guo7.5k • written 5.0 years ago by sasa_k0

Why not just download the GTF file and then use awk to keep only the entries annotated as lncRNAs? For example: awk '{if($2=="lincRNA") print $0}' original.gtf > filtered.gtf. That works for the Ensembl annotation, where the biotype is in column 2, but you can always use grep if UCSC doesn't use the same format.

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by Devon Ryan90k
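
For annotations that keep the biotype in the attribute column rather than in column 2 (GENCODE GTFs do this with gene_type), the same idea can be sketched by matching on column 9 instead. The two records below are made-up toy data and the file names are placeholders, so treat this as an illustration rather than a ready-made pipeline.

```shell
# Toy GTF in GENCODE style (made-up records, for illustration only).
printf 'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "TOY1"; gene_type "lincRNA";\n' > toy.gtf
printf 'chr1\tHAVANA\tgene\t2000\t2900\t.\t-\t.\tgene_id "TOY2"; gene_type "protein_coding";\n' >> toy.gtf

# Keep records whose attribute column (field 9) carries gene_type "lincRNA".
awk -F'\t' '$9 ~ /gene_type "lincRNA"/' toy.gtf > toy.lincRNA.gtf

cat toy.lincRNA.gtf
```

Matching on the full key-value pair avoids accidental hits on gene names that merely contain the string lincRNA, which a plain grep could pick up.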

Dear Devon Ryan,

Thanks a lot for your reply!

The problem is that the GTF file I get from UCSC doesn't have much information inside. (You are right that Ensembl gives a nice annotation and I used grep for GENCODE annotation files to subdivide them into snoRNAs etc). 

However, when I download from the UCSC Table Browser (group: Genes and Gene Prediction Tracks; track: UCSC Genes; table: knownGene), the output looks like this:

chr1    hg19_knownGene  exon    11874   12227   0.000000        +       .       gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1    hg19_knownGene  exon    12613   12721   0.000000        +       .       gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1    hg19_knownGene  exon    13221   14409   0.000000        +       .       gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";

Do you maybe know if I should download a different table from the UCSC Browser?

Thanks!

Alexandra

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by sasa_k0
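
The GTF export shown above keeps only coordinates and IDs, but the knownGene table itself (genePred format) also has CDS columns. In that format non-coding transcripts have cdsStart equal to cdsEnd, so one rough way to pull out non-coding entries is to compare columns 6 and 7. This is only a sketch under assumptions: it assumes the tab-separated column order name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds; it selects every non-coding transcript, not only those UCSC labels as lncRNA; and the 200 nt spliced-length cutoff is the usual working definition of "long", not a UCSC rule. The rows and file names are toy placeholders.

```shell
# Toy knownGene-style rows (genePred columns; values are illustrative only):
# name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds
printf 'toyNC.1\tchr1\t+\t11873\t14409\t11873\t11873\t3\t11873,12612,13220,\t12227,12721,14409,\n' > toy_knownGene.txt
printf 'toyPC.1\tchr1\t+\t69090\t70008\t69090\t70008\t1\t69090,\t70008,\n' >> toy_knownGene.txt
printf 'toySmall.1\tchr1\t-\t30365\t30503\t30365\t30365\t1\t30365,\t30503,\n' >> toy_knownGene.txt

# Keep transcripts with an empty CDS (cdsStart == cdsEnd) whose spliced
# length exceeds 200 nt; print name, chrom, strand and spliced length.
awk -F'\t' -v OFS='\t' '$6 == $7 {
    n = split($9, s, ","); split($10, e, ",")
    len = 0
    for (i = 1; i <= n; i++) if (s[i] != "") len += e[i] - s[i]
    if (len > 200) print $1, $2, $3, len
}' toy_knownGene.txt > toy_noncoding.txt

cat toy_noncoding.txt
```

On the toy rows only toyNC.1 is kept: toyPC.1 has a CDS and toySmall.1 is shorter than the cutoff. The exon lists end with a trailing comma, which is why empty elements are skipped when summing.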

I don't know if UCSC has one available for the current human annotation (hg38). You can download the track for hg19, if that's what you're using (it's the "lincRNA Transcripts" track).

ADD REPLYlink written 5.0 years ago by Devon Ryan90k

Yes, I use hg19, but the "lincRNA Transcripts" track is built from the Cabili et al., 2011 lncRNAs.

It is not "tidy" and differs from lncRNAs that are shown by UCSC Genes track.

And I actually thought that UCSC does some validation prior to including transcripts into UCSC genes. 

Sorry for so many details. :)

Alexandra

 

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by sasa_k0

Yet another reason to use the Ensembl annotation :)

ADD REPLYlink written 5.0 years ago by Devon Ryan90k

Are you specifically interested in UCSC annotation? 

ADD REPLYlink written 5.0 years ago by PoGibas4.8k

Yes, I would like to check my list of lncRNAs against all public annotations. I have seen examples where lncRNAs have different exon models in the RefSeq, UCSC and GENCODE annotations, or are missing from one and present in another.

That is why I would like to get the UCSC lncRNA annotation. But I almost gave up, and am thinking about just using the Cabili list instead, although it has quite some artifacts.

ADD REPLYlink written 5.0 years ago by sasa_k0
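
A check like this can be sketched with comm once each annotation is reduced to one identifier per line. The IDs and file names below are hypothetical placeholders. Since UCSC, RefSeq and GENCODE use different ID systems, matching by name is an assumption that often will not hold, and a coordinate-based comparison (for example with bedtools) may be needed in practice.

```shell
# Hypothetical ID lists, one identifier per line (placeholders, not real data).
printf 'lncA\nlncB\nlncC\n' > my_lncRNAs.txt
printf 'lncB\nlncC\nlncD\n' > annotation_ids.txt

# comm needs sorted input.
sort -u my_lncRNAs.txt > mine.sorted
sort -u annotation_ids.txt > annot.sorted

# -23: lines only in the first file; -12: lines present in both.
comm -23 mine.sorted annot.sorted > missing_from_annotation.txt
comm -12 mine.sorted annot.sorted > shared.txt

cat missing_from_annotation.txt
```

Running the same two comm calls against each annotation in turn gives a per-annotation list of what is shared and what is missing.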
2.6 years ago by
Shicheng Guo7.5k
Shicheng Guo7.5k wrote:

Go to https://www.gencodegenes.org/releases/19.html

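wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.long_noncoding_RNAs.gtf.gz
# the download is a gzip-compressed file, not a tar archive
gunzip gencode.v19.long_noncoding_RNAs.gtf.gz
# skip the 5 header lines; write chrom, 0-based start, end and gene_id as tab-separated BED
awk 'BEGIN{OFS="\t"} NR>5 {print $1,$4-1,$5,$10}' gencode.v19.long_noncoding_RNAs.gtf > lncRNA.hg19.bed
# strip the quotes and semicolon left around the gene_id
perl -p -i -e "s/[\";]//g" lncRNA.hg19.bed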
ADD COMMENTlink written 2.6 years ago by Shicheng Guo7.5k

I hope OP either found the data or stopped searching by now ;-)

ADD REPLYlink written 2.6 years ago by WouterDeCoster39k

;-) 2.5 years ago. I really hope the OP found it. Otherwise, what an awful day!

ADD REPLYlink written 2.6 years ago by Shicheng Guo7.5k