Grouping gene names programmatically
3.8 years ago

I have a GFF3 file which I have imported into R, and I have created a column called "gene_name" that holds the gene name (or annotation). I would like to group these by the type of gene/protein. For example, I have many different transposases, which could all be grouped as "transposase" (i.e. a new column with the value "transposase" for each of those rows). Another example would be to group all virulence genes into a group called "virulence". Since I can have several thousand different gene names, doing this manually in R is difficult. Is there a tool or function that can do this automatically?

Example data (very simplified, with only two categories; the original data may have 100+ different groups and 1000+ genes):

gene_name                           gene_group
IS3 family transposase IS629        transposase
IS3 family transposase ISSen4       transposase
IS3 family transposase IS2          transposase
Aerobactin synthase                 virulence
Ferric aerobactin receptor          virulence

I appreciate any input!
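As a baseline that needs no dedicated tool, the mapping in the example table can be written as a small rule table of keyword patterns, where the first pattern that matches a gene name decides its group. The Python sketch below assumes a hand-written rule list (the patterns, the "other" fallback and the extra unmatched name "hypothetical protein" are illustrative assumptions, not part of the original data). In R the same idea maps onto grepl() applied to the gene_name column.

```python
import re

# Hypothetical rule table: (regex pattern, group). Order matters: the first
# pattern found in a gene name decides its group.
RULES = [
    (r"transposase", "transposase"),
    (r"aerobactin", "virulence"),
]


def assign_group(gene_name, rules=RULES, default="other"):
    """Return the group of the first rule whose pattern occurs in gene_name."""
    for pattern, group in rules:
        if re.search(pattern, gene_name, flags=re.IGNORECASE):
            return group
    # No rule matched: fall back to a catch-all group.
    return default


gene_names = [
    "IS3 family transposase IS629",
    "IS3 family transposase ISSen4",
    "IS3 family transposase IS2",
    "Aerobactin synthase",
    "Ferric aerobactin receptor",
    "hypothetical protein",  # made-up name that matches no rule
]

for name in gene_names:
    print(f"{name:<35} {assign_group(name)}")
```

With 100+ groups, the rule list becomes the thing to maintain; keeping it in a separate two-column file (pattern, group) makes it easier to review than hard-coding it.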

Tags: R, gene

It is not clear from the question whether the grouping information is already in the data. If it is not, the question is really about how to obtain that information. If the grouping information is already in the table, then this is simply an R programming question: look, for example, at the group_by function of the dplyr package, or at the data.table package.
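When the gene_group column already exists, grouping is a plain aggregation; in dplyr that would be group_by() followed by a summary such as a count. A minimal Python sketch of the same aggregation, using only the standard library and the rows from the example table in the question, counts genes per group and collects each group's members:

```python
from collections import Counter

# (gene_name, gene_group) rows taken from the example table in the question.
rows = [
    ("IS3 family transposase IS629", "transposase"),
    ("IS3 family transposase ISSen4", "transposase"),
    ("IS3 family transposase IS2", "transposase"),
    ("Aerobactin synthase", "virulence"),
    ("Ferric aerobactin receptor", "virulence"),
]

# Number of genes per group (analogous to grouping by gene_group and counting).
counts = Counter(group for _, group in rows)

# Member gene names per group, in their original row order.
members = {}
for name, group in rows:
    members.setdefault(group, []).append(name)

for group, n in sorted(counts.items()):
    print(group, n, members[group])
```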

3.8 years ago

Hi,

I think you're looking for the Gene Ontology: http://geneontology.org/

In R, this can be done with the biomartr package, among others.

António
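A Gene Ontology-based approach replaces free-text name matching with controlled terms. Assuming GO term annotations have already been retrieved for each gene, collapsing them into coarse groups could look like the Python sketch below. The gene IDs, their annotations and the term-to-group mapping are all hypothetical placeholders; the mapping is checked in priority order, so a gene with several mapped terms gets the group of the first one listed.

```python
# Hypothetical GO annotations per gene (term names); illustrative values only.
go_annotations = {
    "geneA": ["transposase activity", "DNA binding"],
    "geneB": ["siderophore biosynthetic process"],
    "geneC": ["DNA binding"],
}

# Hypothetical mapping from GO term names to coarse groups, in priority order.
TERM_TO_GROUP = [
    ("transposase activity", "transposase"),
    ("siderophore biosynthetic process", "virulence"),
]


def group_from_go(terms, mapping=TERM_TO_GROUP, default="other"):
    """Return the group of the highest-priority mapped term present in terms."""
    term_set = set(terms)
    for term, group in mapping:
        if term in term_set:
            return group
    # None of the gene's terms is mapped to a group.
    return default


gene_groups = {gene: group_from_go(terms) for gene, terms in go_annotations.items()}
print(gene_groups)
```

A flat term list like this ignores the GO hierarchy; in practice, mapping genes onto a reduced GO slim subset is a common way to get a manageable number of broad categories.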

