Question: Clustering Genes in a Genome based on GO Terms
2.7 years ago by
anniejeanforster0 wrote:

Hi there,

I'm an undergraduate who's just getting introduced to bioinformatics so please bear with me.

I'm trying to find correlations between the number and length of invariant regions (IRs) in the Anopheles gambiae genome and the function of the genes that contain them, to see if there are any interesting links. I have a list of all the genes in An. gambiae with all their GO terms. I also have a much smaller list of the genes that contain IRs, with information on IR lengths, sequences etc., and their GO terms as well. I want to cluster the genes in both lists based on their GO terms and find which "clusters" have longer IRs or a greater number of IRs, while making sure this is not just because a functional group is more represented within the genome (hence needing to look at the whole genome as well). Basically, I'm just trying to make the stats sound.

I have never used R, although I have downloaded it. I have tried DAVID, but I don't really understand the output or how to use it in my analysis. I've tried to use GoSlim to get general GO terms per gene, but can't seem to get it to work because my files are too big. I've got GiTools running a hierarchical analysis, but it's taking a long time and I'm not sure I set it up correctly either.

Is there any other way to get a basic, broad description of a gene's function without using GO terms? The main problem I'm having is that all genes have multiple terms, so I can't put each gene into a single category for analysis. That is why I turned to clustering, and why I'm getting very confused and frustrated!

Any help would be dearly appreciated. I'm doing an internship this summer and, as you may be able to tell, I am out of my depth! An answer put as simply as possible would be much appreciated. Thank you very much.

2.7 years ago by
natasha.sernova3.4k wrote:

See these posts for an introduction to gene clustering:

What Is The Definition Of Gene Cluster

an introduction to GO annotations

Use and misuse of the gene ontology annotations

http://www.nature.com.sci-hub.cc/nrg/journal/v9/n7/full/nrg2363.html#

Some introduction to gene ontology:

https://en.wikipedia.org/wiki/Gene_ontology

Molecular Function Ontology Guidelines

http://geneontology.org/page/molecular-function-ontology-guidelines

Ontology Documentation

http://geneontology.org/page/ontology-documentation

2.7 years ago by
EagleEye6.2k wrote:

Gene annotation retrieval or clustering/enrichment tool:

Gene Set Clustering based on Functional annotation (GeneSCF)

Download: http://genescf.kandurilab.org/downloads.php


Direct answer to question:

Have you tried GeneSCF? If you have your own GO term annotation, please use GeneSCF v1.0, which accepts user-defined annotation for clustering.

Follow these instructions to use your annotation as input:

A: GO enrichment analysis using a Text file with all the genes and GO ids associat
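
To make the idea behind enrichment on your own annotation concrete, here is a minimal sketch in Python. It is not GeneSCF's implementation; it only illustrates the usual one-sided hypergeometric test, and it assumes you have already loaded your whole-genome gene-to-GO mapping into a dictionary. The names `gene_to_terms`, `ir_genes`, `go_enrichment` and the toy gene/term IDs are made up for illustration. A gene with several GO terms is simply counted once under each of its terms, so you do not need to force every gene into a single category.

```python
from collections import defaultdict
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def go_enrichment(gene_to_terms, ir_genes):
    """Return {term: (k, K, p_value)} for every term seen among the IR genes.

    gene_to_terms: dict mapping every annotated gene in the genome to a set
                   of GO terms (this is the background).
    ir_genes:      iterable of genes that contain invariant regions.
    """
    background = set(gene_to_terms)
    subset = background & set(ir_genes)  # ignore IR genes with no annotation
    N, n = len(background), len(subset)

    # Invert the mapping: a gene is listed under each of its terms.
    term_to_genes = defaultdict(set)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            term_to_genes[term].add(gene)

    results = {}
    for term, genes in term_to_genes.items():
        k = len(genes & subset)  # IR genes carrying this term
        if k == 0:
            continue
        K = len(genes)           # genes in the genome carrying this term
        results[term] = (k, K, hypergeom_upper_tail(k, K, n, N))
    return results


# Toy data, invented purely for illustration.
gene_to_terms = {
    "g1": {"GO:A", "GO:B"},
    "g2": {"GO:A"},
    "g3": {"GO:B"},
    "g4": {"GO:C"},
    "g5": {"GO:C"},
    "g6": {"GO:C"},
}
ir_genes = ["g1", "g2"]

ranked = sorted(go_enrichment(gene_to_terms, ir_genes).items(),
                key=lambda item: item[1][2])
for term, (k, K, p) in ranked:
    print(term, k, K, round(p, 3))
```

Because one test is run per GO term, the raw p-values should be corrected for multiple testing (for example with Benjamini-Hochberg) before you interpret them.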


Alternate easy solution:

Or else, if you want to make the clusters easily without much struggle, use version 1.1. All you need to do is give your list of genes as input, and GeneSCF will take care of the annotation (it uses updated information from geneontology). The detailed steps are here:

A: Collecting Genes with similar GO term

From the tutorial, use '-org=Pfalciparum' (Plasmodium falciparum) as the organism instead of '-org=mgi'.

Example: this assumes you have Entrez GeneIDs (gid) as your input list; if you have gene symbols, use '-t=sym' instead. Adjust the background number of genes accordingly.

Single step process,

./geneSCF -m=update -i=INPUTgene.list -t=gid -db=GO_BP -o=/ExistingOUTPUTfolder/ -org=Pfalciparum --plot=yes --background=15000

Two step process,

Step1:

./prepare_database -db=GO_all -org=Pfalciparum

Step2:

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_BP -o=/ExistingOUTPUTfolder/ -org=Pfalciparum --plot=yes --background=15000
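
The value 15000 passed to --background in the commands above is only an example. One plausible way to pick your own number is to count the distinct genes in your whole-genome annotation list, as in the rough Python sketch below. Two things here are assumptions rather than facts about GeneSCF: that your annotation is a tab-separated text file with the gene ID in the first column, and that the number of annotated genes is the background you want (check the GeneSCF documentation for how it defines the background). The function name `count_background_genes` and the gene and GO IDs in the example are placeholders.

```python
import io


def count_background_genes(handle):
    """Count distinct gene IDs in the first column of a tab-separated file."""
    genes = set()
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        gene_id = line.split("\t")[0]
        genes.add(gene_id)
    return len(genes)


# Stand-in for an annotation file with one gene/GO pair per line.
# With a real file you would use: open("your_annotation.txt")
example = io.StringIO(
    "geneA\tGO:0000001\n"
    "geneA\tGO:0000002\n"
    "geneB\tGO:0000001\n"
    "\n"
    "geneC\tGO:0000003\n"
)

background = count_background_genes(example)
print(f"--background={background}")
```

A gene that appears on several lines (one per GO term) is counted only once, which is the point of collecting the IDs in a set.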


Detailed documentation and Test dataset tutorials:

http://genescf.kandurilab.org/documentation.php

[Image: the difference between V1.0 and V1.1]


Good Luck with your analysis.
