Question: How to find Gene Ontology terms for genes using RNA-seq data
evelyn100 wrote (4 weeks ago):

Hi,

I have RNA-seq data from multiple samples, and I have the gene annotation GFF file. I want to find Gene Ontology (GO) terms for the genes. Can you please suggest a way?

Thank you!

rna-seq • 155 views
modified 4 weeks ago by MaxF70 • written 4 weeks ago by evelyn100

It helps to do a Google search for prior threads on common requests such as this one. Add site:biostars.org after your keywords (in this case, gene ontology analysis) to limit the search to Biostars. You will find multiple results.

modified 4 weeks ago • written 4 weeks ago by genomax87k

MaxF70 wrote (4 weeks ago):

It's unclear where you are in the processing pipeline.

First you need to align your reads and get feature counts (i.e., the number of reads per gene) from your RNA-seq data. Next, you perform differential expression analysis using something like limma or DESeq2. Once you've done that, you can take the genes that are differentially expressed in your condition of interest and plug them into one of several web-based GO enrichment tools (I like pantherdb, but metascape and genecodis are also nice).
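As a rough illustration of the "take the differentially expressed genes" step, here is a minimal Python sketch that filters a results table down to a list of IDs. It assumes the results were exported to CSV with DESeq2-style columns (log2FoldChange, padj) and a gene_id column; the gene_id column name, the example values and both cutoffs are assumptions for illustration, not recommendations.

```python
import csv
import io

# Hypothetical DESeq2-style results exported to CSV; all values are made up.
EXAMPLE_CSV = """gene_id,log2FoldChange,padj
GENE0001,2.5,0.001
GENE0002,0.2,0.80
GENE0003,-1.8,0.01
GENE0004,3.1,NA
"""


def select_de_genes(handle, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return IDs of genes passing both cutoffs (cutoffs are assumptions)."""
    selected = []
    for row in csv.DictReader(handle):
        try:
            padj = float(row["padj"])
            lfc = float(row["log2FoldChange"])
        except ValueError:
            continue  # skip rows where a value is missing (e.g. "NA")
        if padj < padj_cutoff and abs(lfc) >= lfc_cutoff:
            selected.append(row["gene_id"])
    return selected


de_genes = select_de_genes(io.StringIO(EXAMPLE_CSV))
print("\n".join(de_genes))  # one ID per line, ready to paste into a web tool
```

With a real file you would pass an open file handle instead of the io.StringIO example.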

written 4 weeks ago by MaxF70

Thank you! I have a list of genes that I want to find gene ontology terms for. I looked at pantherdb. Can I plug in a list of genes and get results for all of them together, or do I need to enter one gene at a time?

written 4 weeks ago by evelyn100

There's a box on the homepage where you enter IDs (usually one per line). Your best bet is something like Ensembl or Entrez IDs. It will accept gene symbols/names as well, but they might map to something you don't expect, so be careful.

Let's say your list is 100 genes. The site will then find GO terms that are overrepresented in your gene list. So, if (on average) 1 in 100 human genes relates to inflammation, you would expect about 1 such gene in your list; if there are instead 20 genes related to inflammation, you have a 20-fold enrichment over the background (it's actually more complicated, but this is the general concept).
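To make the arithmetic concrete, here is a minimal Python sketch of the idea. With 1 in 100 background genes annotated and 20 of 100 list genes annotated, the ratio of the two fractions is 20. The background size of 20,000 genes is an assumption made up for the example, and the one-sided hypergeometric test shown is just one common way to attach a p-value; the exact test and corrections a given web tool applies may differ.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Fraction of annotated genes in the list divided by the fraction in the background."""
    return (k / n) / (K / N)


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation (one-sided hypergeometric test)."""
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


N = 20000  # background genes (assumed size, for illustration only)
K = 200    # background genes with the annotation: 1 in 100
n = 100    # genes in the submitted list
k = 20     # list genes with the annotation

fold = fold_enrichment(k, n, K, N)
p_value = hypergeom_pvalue(k, n, K, N)
print(f"fold enrichment: {fold:.1f}")
print(f"one-sided p-value: {p_value:.3g}")
```

Real tools also correct for testing many GO terms at once, which this sketch leaves out.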

written 4 weeks ago by MaxF70

Thank you. I have a list of 1500 genes. They come from a different genome assembly of the same species whose reference genome is included in PANTHER, so I converted the gene IDs to those of the available reference genome. I uploaded the gene list at http://www.pantherdb.org and chose the file type (ID list), the organism, and "Functional classification viewed in gene list". I got a list of results, but I do not see a GO term column in it. Can you please help me interpret the result?

written 4 weeks ago by evelyn100

You want the "Statistical overrepresentation test".

The option you picked breaks down the gene list to show you the GO terms that are represented in the list, but doesn't tell you which ones are enriched/overrepresented.

written 4 weeks ago by MaxF70

Thank you. I uploaded a plain text file with the gene list, selected the reference list available in the database, and chose "GO biological process complete". It returned a result and reported that 250 genes were not mapped. However, the gene counts across the categories do not add up to the total number of genes in the list I uploaded. I also want to report gene ontology terms for individual genes. How can I do that?

modified 4 weeks ago • written 4 weeks ago by evelyn100

You would not expect the total number of genes in each category to equal the total number of genes input. This is because genes can map to multiple GO terms (GO terms are highly redundant). You can explore all of the parent-child terms related to a GO term of interest at http://amigo.geneontology.org/amigo

A gene that is related to "defense response to virus" is also going to pop up in "immune system process" and "response to other biological organism" and many others.
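Here is a small Python sketch of why the per-term counts add up to more than the number of input genes. The parent-child map below is a toy, hand-written hierarchy using the term names from this thread; it is hypothetical and much simpler than the real ontology, and the gene names are made up.

```python
from collections import Counter

# Toy child -> parents map (hypothetical; the real GO graph is far larger).
PARENTS = {
    "defense response to virus": [
        "immune system process",
        "response to other biological organism",
    ],
    "immune system process": ["biological process"],
    "response to other biological organism": ["biological process"],
    "biological process": [],
}


def with_ancestors(term):
    """Return the term together with every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


# Made-up direct annotations for two genes.
direct = {
    "GENE_A": ["defense response to virus"],
    "GENE_B": ["immune system process"],
}

counts = Counter()
for gene, terms in direct.items():
    propagated = set()
    for term in terms:
        propagated |= with_ancestors(term)
    counts.update(propagated)  # each gene is counted once per term

for term, n_genes in sorted(counts.items()):
    print(f"{term}: {n_genes}")
print("sum of per-term counts:", sum(counts.values()), "for", len(direct), "genes")
```

Two genes produce six gene-term pairs here, because each gene is counted under its direct term and under every ancestor of that term.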

If you want to see all the GO terms associated with a particular gene, you can enter it into the "Search" box at the top left of the pantherdb website. Once you're on the gene page, there's a section ("Gene ontology database annotations") you can expand that shows you the (many) linked GO terms.

Here's the page for Interferon Beta as an example: http://www.pantherdb.org/genes/gene.do?acc=HUMAN%7CHGNC%3D5434%7CUniProtKB%3DP01574

written 4 weeks ago by MaxF70

Thank you, I can see the results. I have a question: do I need to do this for each gene separately, or can I plug in a list of genes and get the GO terms for each gene?

written 4 weeks ago by evelyn100

If you want to do this for a lot of genes, your best bet is to learn how to use one of the R packages that interacts with a GO database.

I haven't used this one, but it looks promising.

If you don't know any R, it's not that difficult to pick up. Download R and RStudio, then start learning with the swirl package. It's very user-friendly.
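To show the general shape of a batch lookup (many genes in, GO terms per gene out), here is a minimal sketch in Python rather than R, so it is not the package route suggested above. It assumes you have a GO annotation file in the tab-separated GAF layout, where lines starting with "!" are comments, the third column is the gene symbol and the fifth column is the GO ID. The gene names and the tiny in-memory file are made up, and the helper names are hypothetical.

```python
import io
from collections import defaultdict


def make_gaf_line(symbol, go_id):
    """Build a made-up 17-column GAF-style line; unused columns are left empty."""
    cols = [""] * 17
    cols[2] = symbol  # column 3: gene symbol
    cols[4] = go_id   # column 5: GO ID
    return "\t".join(cols)


# Tiny in-memory stand-in for a real annotation file (hypothetical content).
EXAMPLE_GAF = "\n".join([
    "!gaf-version: 2.2",
    make_gaf_line("GENE_A", "GO:0051607"),
    make_gaf_line("GENE_A", "GO:0002376"),
    make_gaf_line("GENE_B", "GO:0002376"),
]) + "\n"


def load_gene_to_go(handle, id_column=2, go_column=4):
    """Map each gene identifier to the set of GO IDs annotated to it."""
    gene_to_go = defaultdict(set)
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= max(id_column, go_column):
            continue  # skip malformed lines
        gene_to_go[cols[id_column]].add(cols[go_column])
    return gene_to_go


gene_to_go = load_gene_to_go(io.StringIO(EXAMPLE_GAF))
genes_of_interest = ["GENE_A", "GENE_B", "GENE_X"]
report = {g: sorted(gene_to_go.get(g, set())) for g in genes_of_interest}

for gene, terms in report.items():
    print(gene, ",".join(terms) if terms else "no annotation found", sep="\t")
```

Genes with no entry (GENE_X here) are reported explicitly, which makes unmapped IDs easy to spot.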

modified 4 weeks ago • written 4 weeks ago by MaxF70