Bulk RNA-seq with genome but without gene annotation
7 weeks ago
Diego ▴ 10

Hi all,

I need to analyse RNA-seq data from two species that have a genome available. The problem is that the genomes do not have any GTF/GFF file so there is no information about genes. I was wondering how to analyse the data.

My first thought is to build a genome-guided transcriptome assembly with Trinity and then quantify with Kallisto. However, that still leaves me without gene IDs. Which tool would you use to annotate the genome? Trinotate? Is there a preferred way?

Thanks in advance,


GTF without analysis RNA-seq • 312 views

Choice of gene annotation software is going to depend on the organisms you are studying. Eukaryotes? Prokaryotes?

7 weeks ago

I don't think genome annotation is easy at all. In fact, I think it is an extremely hard problem.

Another option is:

  • run GMAP with GFF3 output (provided you have a transcript set to map; otherwise create one with Trinity)
  • visualize the transcripts on the genome to check accuracy
  • map the reads to the genome, e.g. with STAR or HISAT2
  • count reads per feature with featureCounts
  • test for differential expression with DESeq2 etc.

MAKER is also very good, but harder to set up and use than GMAP.
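One practical gap between the GMAP and featureCounts steps above is getting the GFF3 into a shape that gives one count per gene. The sketch below is only an illustration: it walks each exon up through its `Parent` attributes to the top-level feature and emits rows in featureCounts' SAF layout (GeneID, Chr, Start, End, Strand). The feature IDs in `EXAMPLE_GFF3` are made up, and the code assumes each exon has a single `Parent`; check it against your real GMAP output before relying on it.

```python
def parse_attributes(field):
    """Turn a GFF3 attribute column ('ID=a;Parent=b') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gff3_to_saf_rows(gff3_text):
    """Return (GeneID, Chr, Start, End, Strand) tuples, one per exon."""
    parent_of = {}  # feature ID -> parent ID (e.g. mRNA -> gene)
    exons = []
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        chrom, _, ftype, start, end, _, strand, _, attr_field = cols
        attrs = parse_attributes(attr_field)
        if ftype == "exon":
            if "Parent" in attrs:
                exons.append((attrs["Parent"], chrom, int(start), int(end), strand))
        elif "ID" in attrs and "Parent" in attrs:
            parent_of[attrs["ID"]] = attrs["Parent"]

    rows = []
    for parent, chrom, start, end, strand in exons:
        gene = parent
        # Climb from the transcript to the top-level feature (the gene).
        while gene in parent_of:
            gene = parent_of[gene]
        rows.append((gene, chrom, start, end, strand))
    return rows


def format_saf(rows):
    """Render rows as tab-separated SAF text with the expected header."""
    lines = ["GeneID\tChr\tStart\tEnd\tStrand"]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical three-level GFF3 (gene -> mRNA -> exon) for illustration only.
EXAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "chr1\tgmap\tgene\t100\t500\t.\t+\t.\tID=tx1.path1;Name=tx1",
    "chr1\tgmap\tmRNA\t100\t500\t.\t+\t.\tID=tx1.mrna1;Name=tx1;Parent=tx1.path1",
    "chr1\tgmap\texon\t100\t200\t.\t+\t.\tID=tx1.mrna1.exon1;Parent=tx1.mrna1",
    "chr1\tgmap\texon\t300\t500\t.\t+\t.\tID=tx1.mrna1.exon2;Parent=tx1.mrna1",
])

if __name__ == "__main__":
    print(format_saf(gff3_to_saf_rows(EXAMPLE_GFF3)), end="")
```

The resulting file can be passed to featureCounts with `-F SAF` in place of a GTF. Overlapping transcripts that GMAP places at the same locus would still be counted as separate genes here, so the visual check in the list above remains worthwhile.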

7 weeks ago
ahmad mousavi ▴ 710


This is not hard, but it can be a long process. To get at least minimal information about your transcriptome, I have used the following steps:

1- Trinity (to get the assembly file)

2- RSEM analysis for finding gene expression

3- Using edgeR or DESeq2 for finding DEGs.

4- Use Trinotate to annotate your Trinity assembly against known databases (at this point you can try only UniProt and Pfam, not nr). It uses blastx or blastp against the Swiss-Prot entries, not all of TrEMBL. When I analyzed plants, I used only the Viridiplantae entries rather than all proteins as further information for my Trinity dataset.

5- Retrieve GO information for your UniProt IDs from uniprot.org and then try the WEGO website for GO enrichment.

By the way, MAKER is another option, but I think it would be time-consuming for you and not worth running for a single dataset.
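On the gene ID question from the original post: Trinity transcript names already carry a gene-level grouping, because an ID such as `TRINITY_DN1000_c115_g5_i1` is isoform `i1` of "gene" `TRINITY_DN1000_c115_g5`. RSEM can use that grouping directly, but if you quantify with Kallisto you need to collapse transcripts to genes yourself. A minimal sketch of the idea, with made-up counts and hypothetical function names (in practice an R package such as tximport is the more usual route into edgeR or DESeq2):

```python
import re
from collections import defaultdict

# Trinity isoform IDs end in "_i<number>"; removing that suffix gives the gene ID.
_ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def trinity_gene_id(transcript_id):
    """Map a Trinity transcript ID to its Trinity 'gene' ID."""
    return _ISOFORM_SUFFIX.sub("", transcript_id)


def sum_to_gene_level(transcript_counts):
    """Sum per-transcript estimated counts into per-gene totals."""
    gene_counts = defaultdict(float)
    for transcript_id, count in transcript_counts.items():
        gene_counts[trinity_gene_id(transcript_id)] += count
    return dict(gene_counts)


if __name__ == "__main__":
    # Hypothetical estimated counts for three assembled transcripts.
    example = {
        "TRINITY_DN1_c0_g1_i1": 10.0,
        "TRINITY_DN1_c0_g1_i2": 5.0,
        "TRINITY_DN2_c0_g1_i1": 3.0,
    }
    for gene, total in sorted(sum_to_gene_level(example).items()):
        print(gene, total)
```

These Trinity "genes" are assembly clusters, not annotated genes, so the Trinotate hits from step 4 are what attach a name or function to them.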

Hope it works for you.

