7 weeks ago
I need to analyse RNA-seq data from two species that have a genome available. The problem is that the genomes do not have any GTF/GFF file so there is no information about genes. I was wondering how to analyse the data.

Intuitively, I would run Trinity in genome-guided mode to assemble a transcriptome and then quantify with Kallisto. However, I would still have no gene IDs. Which tool would you use to annotate the genome? Trinotate? Is there a preferred way?

Choice of gene annotation software is going to depend on the organisms you are studying. Eukaryotes? Prokaryotes?

7 weeks ago

I don't think genome annotation is easy at all; in fact, I think it is an extremely hard problem.

Another option is:

  • run GMAP with GFF3 output (this needs a transcript set to map; if you don't have one, create it with Trinity)
  • visualize the transcripts on the genome to check their accuracy
  • map the reads to the genome, e.g. with STAR or HISAT2
  • count reads per gene with featureCounts
  • test for differential expression with DESeq2 etc.
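One way to connect the GFF3 step to the counting step is to flatten the exons into a SAF table, which featureCounts also accepts as annotation. The following is only a minimal sketch: it assumes the GFF3 uses the usual gene > mRNA > exon hierarchy linked through ID and Parent attributes, and the example records (g1, t1, chr1 and so on) are made up for illustration, not real GMAP output.

```python
def parse_attributes(field):
    """Turn a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_saf(gff3_lines):
    """Return one (GeneID, Chr, Start, End, Strand) tuple per exon."""
    records = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) == 9:
            records.append(cols)

    # First pass: remember which gene each mRNA belongs to.
    mrna_to_gene = {}
    for cols in records:
        if cols[2] == "mRNA":
            attrs = parse_attributes(cols[8])
            if "ID" in attrs:
                mrna_to_gene[attrs["ID"]] = attrs.get("Parent", attrs["ID"])

    # Second pass: label every exon with the gene of its parent mRNA.
    rows, seen = [], set()
    for cols in records:
        if cols[2] != "exon":
            continue
        parents = parse_attributes(cols[8]).get("Parent")
        if parents is None:
            continue
        for parent in parents.split(","):  # an exon may list several parents
            gene = mrna_to_gene.get(parent, parent)
            row = (gene, cols[0], int(cols[3]), int(cols[4]), cols[6])
            if row not in seen:  # drop exact duplicates, keep file order
                seen.add(row)
                rows.append(row)
    return rows


def format_saf(rows):
    """Render the rows as SAF text with the standard five-column header."""
    lines = ["GeneID\tChr\tStart\tEnd\tStrand"]
    for gene, chrom, start, end, strand in rows:
        lines.append(f"{gene}\t{chrom}\t{start}\t{end}\t{strand}")
    return "\n".join(lines) + "\n"


# Made-up GFF3 records, for illustration only.
EXAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "demo", "gene", "100", "500", ".", "+", ".", "ID=g1"]),
    "\t".join(["chr1", "demo", "mRNA", "100", "500", ".", "+", ".", "ID=t1;Parent=g1"]),
    "\t".join(["chr1", "demo", "exon", "100", "200", ".", "+", ".", "ID=e1;Parent=t1"]),
    "\t".join(["chr1", "demo", "exon", "400", "500", ".", "+", ".", "ID=e2;Parent=t1"]),
    "\t".join(["chr2", "demo", "gene", "50", "150", ".", "-", ".", "ID=g2"]),
    "\t".join(["chr2", "demo", "mRNA", "50", "150", ".", "-", ".", "ID=t2;Parent=g2"]),
    "\t".join(["chr2", "demo", "exon", "50", "150", ".", "-", ".", "ID=e3;Parent=t2"]),
])

if __name__ == "__main__":
    print(format_saf(gff3_to_saf(EXAMPLE_GFF3.splitlines())), end="")
```

The resulting file can be given to featureCounts with -F SAF together with -a. Keep in mind that when every mapped transcript gets its own gene record, the "gene" IDs are really transcript loci, so overlapping loci may still need to be merged by hand.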

MAKER is also very good, but harder to set up and use than GMAP.

7 weeks ago
ahmad mousavi ▴ 710


This is not hard, but it might be a long process. To get at least minimal information about your transcriptome, I have used the following steps:

1- Trinity (to get the assembly file)

2- RSEM to estimate gene expression

3- edgeR or DESeq2 to find DEGs.

4- Use Trinotate to annotate your Trinity assembly against known databases (at this point you can try with only UniProt and Pfam, not nr). It uses blastx or blastp against the Swiss-Prot entries, not all of TrEMBL. When I analyzed plants, I used only the Viridiplantae entries, not all proteins, as further information for my Trinity dataset.

5- Retrieve the GO information for your UniProt IDs from uniprot.org and then try the WEGO website for GO enrichment.
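On the gene ID question, Trinity contig names already carry a gene-level ID: a name like TRINITY_DN1000_c115_g5_i1 is isoform i1 of "gene" TRINITY_DN1000_c115_g5. Here is a minimal sketch of building a gene-to-transcript map from the assembly FASTA headers, assuming that naming scheme. The example headers are made up, and Trinity ships its own helper script for this, so take it as an illustration of the idea rather than a replacement.

```python
import re

# Trinity-style names end in _i<N> for the isoform; everything before it is the gene.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_id_from_transcript(transcript_id):
    """Strip the trailing isoform suffix to get the Trinity 'gene' ID."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def gene_trans_map(fasta_lines):
    """Return (gene_id, transcript_id) pairs in the order seen in the FASTA."""
    pairs = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line
        fields = line[1:].split()
        if not fields:
            continue  # empty header, nothing to map
        transcript = fields[0]  # the ID is the first word of the header
        pairs.append((gene_id_from_transcript(transcript), transcript))
    return pairs


def transcripts_per_gene(pairs):
    """Group transcript IDs under their gene ID."""
    grouped = {}
    for gene, transcript in pairs:
        grouped.setdefault(gene, []).append(transcript)
    return grouped


# Made-up headers and sequences, for illustration only.
EXAMPLE_FASTA = "\n".join([
    ">TRINITY_DN1000_c115_g5_i1 len=247",
    "ACGTACGT",
    ">TRINITY_DN1000_c115_g5_i2 len=300",
    "ACGTACGT",
    ">TRINITY_DN42_c0_g1_i1 len=500",
    "ACGTACGT",
])

if __name__ == "__main__":
    for gene, transcript in gene_trans_map(EXAMPLE_FASTA.splitlines()):
        print(f"{gene}\t{transcript}")
```

Printed as gene<TAB>transcript, one pair per line, this is the layout that RSEM expects for its transcript-to-gene map, so the expression values from step 2 can be summarised per Trinity gene. The same IDs can then be joined to the Trinotate annotation from step 4.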

By the way, MAKER is another option, but I think it would be time-consuming for you and not worth running for a single dataset.

