how to find gene ontology for genes using RNA-seq data
2.7 years ago
evelyn ▴ 220

Hi,

I have RNA-seq data from multiple samples and a GFF gene annotation file. I want to find Gene Ontology (GO) terms for the genes. Can you please suggest a way?

Thank you!

rna-seq • 1.2k views

It helps to do a Google search to find prior threads for common requests such as this. Add site:biostars.org after your keywords (in this case, gene ontology analysis) to limit the search to Biostars. You will find multiple results.

2.7 years ago
MaxF ▴ 120

It's unclear where you are in the processing pipeline.

First you need to align your reads and get feature counts (i.e., the number of reads per gene) from your RNA-seq data. Next, you perform differential expression analysis using something like limma or DESeq2. Once you've done that, you can take the genes that are differentially expressed in your condition of interest and plug them into one of a number of web-based GO enrichment tools (I like pantherdb, but metascape and genecodis are also nice).
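As a rough illustration of the last step, here is a minimal Python sketch that turns a differential expression table into an ID list with one gene per line, ready to paste into a web tool. The rows are made up, the `gene_id` column name and both cutoffs are assumptions, and `log2FoldChange`/`padj` follow the column names DESeq2 uses in its results; adjust all of these to your own export.

```python
import csv
import io

# Made-up rows shaped like an exported results table (illustration only).
RESULTS_CSV = """gene_id,log2FoldChange,padj
geneA,2.5,0.001
geneB,0.2,0.8
geneC,-1.8,0.03
geneD,3.1,NA
"""


def significant_genes(handle, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return IDs of genes passing both the adjusted p-value and fold-change cutoffs."""
    selected = []
    for row in csv.DictReader(handle):
        try:
            padj = float(row["padj"])
            lfc = float(row["log2FoldChange"])
        except ValueError:
            # Values such as "NA" cannot be parsed as numbers; skip those genes.
            continue
        if padj < padj_cutoff and abs(lfc) >= lfc_cutoff:
            selected.append(row["gene_id"])
    return selected


gene_list = significant_genes(io.StringIO(RESULTS_CSV))
print("\n".join(gene_list))  # one ID per line
```

With a real file, you would pass `open("results.csv", newline="")` (a hypothetical file name) instead of the in-memory sample.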


Thank you! I have a list of genes that I want to find gene ontology terms for. I looked at pantherdb. Can I plug in a list of genes and get results for all of them together, or do I need to plug in one gene at a time?


There's a box on the homepage where you enter IDs (usually one per line). Your best bet is something like Ensembl or Entrez IDs. It will accept gene symbols/names as well, but they might map to something you don't expect, so be careful.

Let's say your list is 100 genes. The site will then find GO terms that are overrepresented in your gene list. So, if (on average) 1 in 100 human genes relates to inflammation, but in your gene list there are 20 genes related to inflammation, you have a 20-fold enrichment over the background (20% observed versus 1% expected; it's actually more complicated, but this is the general concept).
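To make the arithmetic concrete, here is a small Python sketch of the fold-enrichment ratio together with a one-sided hypergeometric p-value, which is one common way to test overrepresentation. The background size of 20,000 genes (so 200 annotated genes at 1 in 100) is an assumption for illustration, and the test and multiple-testing correction a given web tool applies may differ.

```python
from math import comb


def fold_enrichment(hits, list_size, annotated, background_size):
    """Observed fraction in the list divided by the fraction in the background."""
    return (hits * background_size) / (list_size * annotated)


def hypergeom_pvalue(hits, list_size, annotated, background_size):
    """P(X >= hits) when list_size genes are drawn without replacement."""
    total = comb(background_size, list_size)
    upper = min(list_size, annotated)
    tail = sum(
        comb(annotated, k) * comb(background_size - annotated, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Assumed numbers: 20,000 background genes, 200 of them annotated to the term,
# and a 100-gene list containing 20 annotated genes.
fold = fold_enrichment(20, 100, 200, 20000)
pvalue = hypergeom_pvalue(20, 100, 200, 20000)
print(f"fold enrichment: {fold:.1f}")
print(f"one-sided p-value: {pvalue:.3g}")
```

In a real analysis this is repeated for every GO term, so the p-values need correcting for multiple testing.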


Thank you, I have a list of 1500 genes. These came from a different genome for the same species whose reference genome is included in PANTHER, so I changed the gene IDs to match the available reference genome. I uploaded the gene list at http://www.pantherdb.org and chose the file type (ID list), the organism, and "Functional classification viewed in gene list". I got a list of results, but I do not see a GO term column in it. Can you please help me interpret the result?


You want the "Statistical overrepresentation test".

The option you picked breaks down the gene list to show you the GO terms that are represented in the list, but doesn't tell you which ones are enriched/overrepresented.


Thank you, I uploaded a plain text file with the gene list, selected the reference list available in the database, and chose "GO biological process complete". It returned results and reports that 250 genes are not mapped. But the gene counts across the categories do not add up to the total number of genes in the list I uploaded. I also want to report gene ontology terms for individual genes. How can I do that?


You would not expect the gene counts across the categories to add up to the total number of genes you input. This is because a gene can map to multiple GO terms (GO terms are hierarchical and highly redundant). You can explore all of the parent-child terms related to a GO term of interest at http://amigo.geneontology.org/amigo

A gene that is related to "defense response to virus" is also going to pop up in "immune system process", "response to other organism", and many others.
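A toy Python sketch of why this happens: each direct annotation implies all of its ancestor terms, so one gene is counted under several categories. The hierarchy below is heavily simplified and purely illustrative (the real GO graph has many more levels and edges), and the two genes are made up.

```python
from collections import Counter

# Simplified toy hierarchy, child -> parents (not the real GO graph).
PARENTS = {
    "defense response to virus": ["immune system process", "response to other organism"],
    "response to other organism": ["response to stimulus"],
    "immune system process": ["biological process"],
    "response to stimulus": ["biological process"],
}

# Made-up direct annotations for two hypothetical genes.
DIRECT = {
    "geneA": ["defense response to virus"],
    "geneB": ["immune system process"],
}


def with_ancestors(term, parents):
    """Return the term plus every term reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


counts = Counter()
for gene, terms in DIRECT.items():
    implied = set()
    for term in terms:
        implied |= with_ancestors(term, PARENTS)
    counts.update(implied)  # each gene counts once per implied term

for term, n in sorted(counts.items()):
    print(f"{n}\t{term}")
print("genes:", len(DIRECT), "sum of per-term counts:", sum(counts.values()))
```

Two genes produce a per-term total well above two, which is the same effect as the category counts not adding up to the size of the uploaded list.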

If you want to see all the GO terms associated with a particular gene, you can enter it into the "Search" box at the top left of the pantherdb website. Once you're on the gene page, there's a section ("Gene ontology database annotations") that you can expand to see the (many) linked GO terms.

Here's the page for Interferon Beta as an example: http://www.pantherdb.org/genes/gene.do?acc=HUMAN%7CHGNC%3D5434%7CUniProtKB%3DP01574


Thank you, I can see the results. I have a question: do I need to do this for each gene separately, or can I plug in a list of genes and get the GO terms for each gene?


If you want to do this for a lot of genes, your best bet is to learn how to use one of the R packages that interacts with a GO database.

I haven't used this one, but it looks promising.

If you don't know anything about R, it's not that difficult to pick up. Download R and RStudio, then start learning with the swirl package. It's very user friendly.
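If you would rather not start with R, the same bulk lookup (GO terms for every gene in a list) can be sketched in plain Python by reading a GO annotation file in GAF format, where column 3 is the gene symbol and column 5 is the GO ID. This is only a sketch: the rows below are dummy data built in memory, the gene names are hypothetical, and it assumes your gene IDs match the symbols used in the annotation file.

```python
import io
from collections import defaultdict


def _row(symbol, go_id, aspect):
    """Build one dummy GAF 2.x line (17 tab-separated columns, most left empty)."""
    cols = [""] * 17
    cols[0] = "UniProtKB"  # column 1: source database
    cols[2] = symbol       # column 3: gene symbol
    cols[4] = go_id        # column 5: GO ID
    cols[8] = aspect       # column 9: aspect (P, F or C)
    return "\t".join(cols)


# Dummy annotations for illustration only; not real gene-to-term assignments.
SAMPLE_GAF = "\n".join([
    "!gaf-version: 2.2",
    _row("geneA", "GO:0000001", "P"),
    _row("geneA", "GO:0000002", "F"),
    _row("geneB", "GO:0000003", "C"),
])


def go_terms_by_gene(handle, wanted):
    """Map each wanted gene symbol to the sorted GO IDs annotated to it."""
    found = defaultdict(set)
    wanted = set(wanted)
    for line in handle:
        if line.startswith("!"):  # header and comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9 and cols[2] in wanted:
            found[cols[2]].add(cols[4])
    # Genes with no annotation get an empty list, so unmapped IDs stay visible.
    return {gene: sorted(found[gene]) for gene in sorted(wanted)}


result = go_terms_by_gene(io.StringIO(SAMPLE_GAF), ["geneA", "geneC"])
for gene, terms in result.items():
    print(gene, ",".join(terms) if terms else "(no annotation found)")
```

With a real annotation file you would pass an open file handle instead of the in-memory sample.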