Question: Bioperl - Parsing a tblastn report for gene presence/absence using bioperl
gravatar for matthew.howes
3.9 years ago by
European Union
matthew.howes0 wrote:

Hi, any advice is appreciated!

I am trying to write a Perl program which uses Bioperl to parse a tblastn report of gene protein query sequences against a nucleotide genome sequence. I am using a sequence length of 80% and a fraction identical measure of 80% relative to the query as cut-offs. The program reports '1' if present, '0' if absent next to gene name in the output. Also I used tiling methods in the program as the nucleotide sequences are made up of draft contig sequences.

However, when i do a positive control and do a tblastn of the protein sequences against the same genome sequence that the proteins have originated from, many genes are reported as absent ("gene" 0 is the ouput). Any help as to this problem?

Here is the program:

use strict;
use Bio::SearchIO;
use Bio::Search::Tiling::MapTiling;

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => $ARGV[0]);

while( my $result = $in->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    my $tiling =
    if( $tiling->frac_identical('query') >= 0.8 && (100 / $result->query_length) * $tiling->length('query') >= 80) {
        print $result->query_name, " 1", "\n"; }
    else {
        print $result->query_name, " 0", "\n";}

Sample output of positive control run (all genes should have a 1 next to them in the positive control):

yaaF 0
UTI89_C0033 0
dapB 1
UTI89_C0035 1
carA 1
carB 1
UTI89_C0038 1
UTI89_C0038 0
UTI89_C0039 0


blast tblastn bioperl gene • 1.3k views
ADD COMMENTlink modified 3.9 years ago by RamRS20k • written 3.9 years ago by matthew.howes0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 759 users visited in the last hour