GTF/GFF for non-coding RNA
5.5 years ago
pm2012 ▴ 100

Hello

Can anyone tell me where I can find a GTF/GFF file for all the long non-coding RNAs in human, preferably with Ensembl annotations?

Thanks

GFF GTF Ensembl Annotation long non-coding • 7.0k views

If you are looking for Ensembl annotations, why not get them from Ensembl?


I didn't find one there with only the long non-coding RNA annotation.


Download the full GTF and then filter by "gene_biotype". Each line contains it, so you can do it with a simple grep command.

Available gene biotypes: http://www.gencodegenes.org/gencode_biotypes.html

Non-coding info: http://useast.ensembl.org/info/genome/genebuild/ncrna.html
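
For anyone who prefers a script to grep, the filtering described above can be sketched in Python. This is a minimal illustration, not an official Ensembl or GENCODE tool: the function name `filter_gtf_by_biotype` and the example lines are made up for this post. It assumes the Ensembl-style attribute `gene_biotype "..."` in column 9; GENCODE GTFs call the same attribute `gene_type`, so the key is a parameter.

```python
import re


def filter_gtf_by_biotype(lines, biotypes, key="gene_biotype"):
    """Yield header lines plus feature lines whose biotype is in `biotypes`.

    `key` is the attribute name: "gene_biotype" in Ensembl GTFs,
    "gene_type" in GENCODE GTFs.
    """
    pattern = re.compile(r"\b" + re.escape(key) + r' "([^"]+)"')
    for line in lines:
        if line.startswith("#"):
            # Keep header/comment lines so the output is still a valid GTF.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed lines
        match = pattern.search(fields[8])
        if match and match.group(1) in biotypes:
            yield line


# Hypothetical example lines (not real annotation records).
example = [
    "#!genome-build GRCh38\n",
    '1\thavana\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "A"; gene_biotype "lincRNA";\n',
    '1\thavana\tgene\t300\t400\t.\t-\t.\tgene_id "G2"; gene_name "B"; gene_biotype "protein_coding";\n',
]
kept = list(filter_gtf_by_biotype(example, {"lincRNA"}))
```

For a real file, the same function can be fed an open file handle (for a gzipped download, `gzip.open(path, "rt")`), writing each yielded line to a new GTF.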


Hi, Igor,

I followed your instructions and downloaded the hg19 GTF, but there is no "gene_biotype" field in it (screen capture link: https://drive.google.com/file/d/1OkpcDF_u2-yzAKlg8s46vIVOi8pc4AZ1/view?usp=sharing)


The GTF you have is not from GENCODE or Ensembl. GTF files from other sources may not have a gene_biotype field.
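
A quick way to see what a given GTF actually contains is to list the attribute keys in column 9. The sketch below is only an illustration: `attribute_keys` is a made-up helper, the example line is invented, and the parsing assumes attribute values do not themselves contain semicolons.

```python
def attribute_keys(lines, max_features=1000):
    """Return the set of attribute keys seen in the first `max_features` feature lines."""
    keys = set()
    seen = 0
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines carry no attributes
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed lines
        # Attributes look like: key "value"; key "value";
        # Assumption: values contain no ';' characters.
        for part in fields[8].split(";"):
            part = part.strip()
            if part:
                keys.add(part.split(None, 1)[0])
        seen += 1
        if seen >= max_features:
            break
    return keys


# Hypothetical line from a GTF that only carries gene and transcript IDs.
minimal_gtf = [
    'chr1\tsource\texon\t100\t200\t.\t+\t.\tgene_id "X1"; transcript_id "X1";\n',
]
keys = attribute_keys(minimal_gtf)
has_biotype = "gene_biotype" in keys or "gene_type" in keys
```

If neither `gene_biotype` nor `gene_type` shows up, the file cannot be filtered by biotype and a GENCODE or Ensembl GTF is needed instead.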


Would it be valid to use a biotype-specific GTF file for quantification after the reads have been aligned to the reference genome?


You probably want to use the full GTF. You should probably be using the full transcriptome for normalization anyway. I wouldn't throw away useful information unless you are removing just some problematic biotypes.

5.5 years ago

You could try this one: http://www.lncipedia.org/download (I haven't used it myself, but will in a few weeks).

5.5 years ago
igor 12k

Some options:

5.5 years ago

The GTF of all Ensembl genes, including all coding and non-coding biotypes, is on the FTP site.

5.5 years ago
Denise CS ★ 5.2k

The advantage of the GENCODE link provided by @igor is that you can get a separate GTF containing only long non-coding RNAs. That GTF is a subset of the file in the link provided by @Emily_Ensembl. If you go with the latter, you need to separate the long non-coding RNAs from the rest (whether protein coding or not). If that's your choice, focus on the biotype 'lincRNA'. Check Annotation of ncRNAs for more details on those biotypes. You can check the number of lincRNAs found in the current human assembly and gene annotation versions from Ensembl.
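
To check how many genes of each biotype a downloaded GTF holds, a tally over the `gene` rows is enough. This is a rough sketch with an invented helper name (`count_genes_by_biotype`) and invented example lines; it assumes the Ensembl-style `gene_biotype` attribute (use `key="gene_type"` for GENCODE files). Biotype labels can differ between annotation releases, so counting what is actually present is safer than assuming a label exists.

```python
import re
from collections import Counter


def count_genes_by_biotype(lines, key="gene_biotype"):
    """Count rows whose feature type is 'gene', grouped by biotype."""
    pattern = re.compile(r"\b" + re.escape(key) + r' "([^"]+)"')
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 is the feature type; only count each gene once.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = pattern.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical example lines (not real annotation records).
example_gtf = [
    '1\thavana\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_biotype "lincRNA";\n',
    '1\thavana\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_biotype "lincRNA";\n',
    '1\thavana\tgene\t300\t400\t.\t-\t.\tgene_id "G2"; gene_biotype "protein_coding";\n',
    '2\thavana\tgene\t500\t900\t.\t+\t.\tgene_id "G3"; gene_biotype "lincRNA";\n',
]
biotype_counts = count_genes_by_biotype(example_gtf)
```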
