Question: How to correctly format WEGO's input with Perl
gravatar for Santiago Montero-Mendieta
4.0 years ago by

Hello! ~ I did a de-novo transcriptome assembly using trinity and then annotated transcripts using trinotate. I now want to take a look at the distribution of gene ontology (GO) categories among transcripts. I wanna use WEGO for this (although software suggestions are welcome!). I'm working / struggling with Perl to correctly format my input file for WEGO.

I have a three-column tab delimited file like this:

TRINITY_DN150378_c0_g6_i1   GO:0005905^cellular_component^coated pit`GO:0016021^cellular_component^integral component of membrane`GO:0005509^molecular_function^calcium ion binding`GO:0006897^biological_process^endocytosis   GO:0005515^molecular_function^protein binding`GO:0005509^molecular_function^calcium ion binding
TRINITY_DN151939_c1_g1_i4   GO:0016021^cellular_component^integral component of membrane`GO:0016020^cellular_component^membrane`GO:0005634^cellular_component^nucleus`GO:0001558^biological_process^regulation of cell growth`GO:0030856^biological_process^regulation of epithelial cell differentiation   .

And I want my file to look like this:

TRINITY_DN150378_c0_g6_i1   GO:0005905 GO:0016021 GO:0005509 GO:0006897 GO:0005515 GO:0005509
TRINITY_DN151939_c1_g1_i4   GO:0016021 GO:0016020 GO:0005634 GO:0001558 GO:0030856

This is what I wrote so far:

#! /usr/bin/perl -w

use strict;

my @array;
my $input_file = $ARGV[0] or die ;  
open (IN, $input_file) or die "Couldn't open file $input_file: $!\n";

while (<IN>) {      
    my @column = split /\t+/, $_;
    my @GO1 = split /`/ , $column[1];
    foreach my $GO_name (@GO1) {
        my @GO_id = split /\^/ , $GO_name;
        push @array, "$GO_id[0]" if ($GO_id[0] !~ m/^\./);  
foreach my $line ( @array ) {
    print $line, "\n";

close IN;

This produces a list of GO categories like so:


But this just includes the first of the two GO's columns and does not match transcript ID.

I would really appreciate if someone can help to complete this script (or to write it in a different way).

Thanks in advance! ☺

ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by Santiago Montero-Mendieta130
gravatar for Israel Barrantes
4.0 years ago by
Israel Barrantes790 wrote:

I adapted the suggestions given here:

for a script I made to pass the ANNOT output of blast2go, to WEGO. Although be aware that the WEGO server has not updated its GO database since 2009, If I recall correctly.

ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by Israel Barrantes790
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1136 users visited in the last hour