Bioperl - Parsing a tblastn report for gene presence/absence using bioperl
0
0
Entering edit mode
9.1 years ago

Hi, any advice is appreciated!

I am trying to write a Perl program which uses Bioperl to parse a tblastn report of gene protein query sequences against a nucleotide genome sequence. I am using a sequence length of 80% and a fraction identical measure of 80% relative to the query as cut-offs. The program reports '1' if present, '0' if absent next to gene name in the output. Also I used tiling methods in the program as the nucleotide sequences are made up of draft contig sequences.

However, when I do a positive control and do a tblastn of the protein sequences against the same genome sequence that the proteins have originated from, many genes are reported as absent ("gene" 0 is the ouput). Any help as to this problem?

Here is the program:

#!/usr/bin/perl
use strict;
use Bio::SearchIO;
use Bio::Search::Tiling::MapTiling;

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => $ARGV[0]);

while( my $result = $in->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    my $tiling =
        Bio::Search::Tiling::MapTiling->new($hit);
    if( $tiling->frac_identical('query') >= 0.8 && (100 / $result->query_length) * $tiling->length('query') >= 80) {
        print $result->query_name, " 1", "\n"; }
    else {
        print $result->query_name, " 0", "\n";}
    }
}

Sample output of positive control run (all genes should have a 1 next to them in the positive control):

yaaF 0
UTI89_C0033 0
dapB 1
UTI89_C0035 1
carA 1
carB 1
UTI89_C0038 1
UTI89_C0038 0
UTI89_C0039 0
gene bioperl blast tblastn • 2.2k views
ADD COMMENT

Login before adding your answer.

Traffic: 1353 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6