Which annotation database is better for analyzing human RNA-Seq data, including lncRNA?
7.8 years ago
tunl ▴ 80

We are analyzing human RNA-Seq data (hg19/GRCh37) and want to study lncRNAs as well as other classes of RNA.

Among UCSC, GENCODE, NONCODE, and Ensembl, which annotation database is better for this purpose?

Any suggestions would be really appreciated.

Some comparisons among them would be even more helpful.

Thank you very much!

RNA-Seq lncRNA Annotation database • 2.0k views

GENCODE = Ensembl. So from 4 options, you are down to 3 now :). The advantage of the GENCODE = Ensembl gene set is that it includes the manual annotation of noncoding genes by the HAVANA (Human and Vertebrate Analysis and Annotation) team. Check their annotation guidelines, especially pages 23-27.


Thank you so much for your information!

It helps a lot.

7.8 years ago

GENCODE has more lncRNAs than UCSC.
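
To check a claim like this on the annotation files you actually plan to use, you can tally gene records by biotype. The following is a minimal sketch, not a definitive tool: it assumes a GTF whose attribute column carries a `gene_type` key (GENCODE style) or a `gene_biotype` key (Ensembl style), and the example records, gene IDs, and the function name `count_gene_types` are made up for illustration. Note that the biotype labels used for lncRNAs differ between releases (older releases split them into labels such as `lincRNA` and `antisense`), so inspect the tally before deciding which labels to sum.

```python
import re
from collections import Counter

# Matches key "value" pairs in the GTF attribute column (column 9).
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def count_gene_types(gtf_lines, type_keys=("gene_type", "gene_biotype")):
    """Count 'gene' records per biotype in an iterable of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only gene-level records, so each gene is counted once.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(_ATTR_RE.findall(fields[8]))
        for key in type_keys:
            if key in attrs:
                counts[attrs[key]] += 1
                break
    return counts


# Made-up records for illustration only (not real annotation entries).
example = [
    "##description: toy example",
    "\t".join(["chr1", "HAVANA", "gene", "100", "900", ".", "+", ".",
               'gene_id "G1"; gene_type "lincRNA"; gene_name "A";']),
    "\t".join(["chr1", "HAVANA", "transcript", "100", "900", ".", "+", ".",
               'gene_id "G1"; transcript_id "T1"; gene_type "lincRNA";']),
    "\t".join(["chr1", "HAVANA", "gene", "2000", "5000", ".", "-", ".",
               'gene_id "G2"; gene_type "protein_coding"; gene_name "B";']),
    "\t".join(["chr2", "ENSEMBL", "gene", "300", "800", ".", "+", ".",
               'gene_id "G3"; gene_type "antisense"; gene_name "C";']),
]

counts = count_gene_types(example)
print(counts)
```

For a real, gzip-compressed annotation file you could pass an open text handle instead of the list, e.g. `gzip.open(path, "rt")`, where `path` is wherever you saved the download.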

NONCODE has more lncRNAs than GENCODE, but I've found that the majority of them are not expressed. So, I would recommend using GENCODE for most human and mouse RNA-Seq experiments.
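
One way to see how many annotated lncRNAs are actually expressed in your own data is to compute the fraction that pass a simple expression filter. This is only a sketch under stated assumptions: the input is a hypothetical mapping from gene ID to per-sample TPM values, and the cutoffs (`min_tpm=1.0` in at least `min_samples=2` samples) are arbitrary illustrative choices, not values recommended in this thread.

```python
def fraction_expressed(tpm_by_gene, min_tpm=1.0, min_samples=2):
    """Fraction of genes with TPM >= min_tpm in at least min_samples samples."""
    if not tpm_by_gene:
        return 0.0
    expressed = sum(
        1
        for values in tpm_by_gene.values()
        if sum(v >= min_tpm for v in values) >= min_samples
    )
    return expressed / len(tpm_by_gene)


# Toy TPM values for four hypothetical lncRNA genes across three samples.
toy = {
    "lncA": [0.0, 0.1, 0.0],
    "lncB": [5.2, 3.1, 0.4],
    "lncC": [1.0, 2.0, 9.5],
    "lncD": [0.0, 1.5, 0.2],
}

frac = fraction_expressed(toy)
print(f"{frac:.0%} of annotated genes pass the filter")
```

Running the same function on quantifications made against each annotation gives a rough, data-specific comparison of how much of each gene set is detectable.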

If you have a specific application, there might be disease-/tissue-specific lncRNAs. For example, MiTranscriptome has lncRNAs assembled from cancer RNA-Seq experiments: http://www.mitranscriptome.org/

You could also use Cufflinks to assemble transcripts if you want to discover novel lncRNAs in a species that has a genome sequence but whose transcriptome hasn't been as well characterized as those of mouse or human.


Thank you very much for your advice!

This really helps!

I’ll go with GENCODE then.
