Question: Counting Gaps In A Column Of Alignment
gravatar for sksweety24
7.3 years ago by
sksweety2410 wrote:

Hi ,

I am a new bioperl user. I made a script to count the nucleotide composition in each column of the alignment. I am able to get the count for the nucleotides, but slice doesn't allow the count of gaps as it seems to exclude them. I would greatly appreciate any suggestions to solve this.

#!/bin/perl -w
use strict;
use warnings;

use List::Util 'max';
use Bio::SimpleAlign; 
use Bio::Align::AlignI;
use Bio::AlignIO;
use Bio::SeqIO;

my $in= Bio::AlignIO->new( -file => "seq.fst", -format => "fasta");

my $align = $in->next_aln();

print "column\tA's\tT'st\C's\tG's\n";

for (my $i = 1; $i <= $align->length; $i++) {
    my %count;
    my $seqs = $align->slice($i,$i);
    my $gap_char = $seqs->gap_char();       
    my $count_A=0;
    my $count_C=0;
    my $count_T=0;
    my $count_G=0;
    my $count_N=0;
    my $count_gap=0;
    foreach my $seq ($seqs->each_seq) {
        my $col=$seq->seq;
        if ($col eq 'A') {
        } elsif ($col eq 'C'){
        } elsif ($col eq 'T'){
        } elsif ($col eq 'G'){
        } elsif ( $col eq 'N'){
        } elsif ($col =~ m/^\Q$gap_char\E$/){
        $count{$seq->seq} += 1;

I get a warning like this: MSG: Slice [5-5] of sequence [3/1-6] contains no residues. Sequence excluded from the new alignment". and doesn't seem to count the gaps in the column.

alignment • 2.6k views
ADD COMMENTlink modified 7.2 years ago by SimonCB765150 • written 7.3 years ago by sksweety2410

Do you mean that the gap counting itself is not working or the result is not being printed out? You don't appear to be printing out the gap count anywhere in your code, so if it's not being printed out I would imagine that's the problem.

ADD REPLYlink written 7.3 years ago by SimonCB765150
gravatar for SimonCB765
7.2 years ago by
SimonCB765150 wrote:

My Perl/BioPerl is a bit rusty, but I'll take a stab at answering this. The warning seems pretty straightforward. If you check the docs for slice it says that "Sequences with no residues in the slice are excluded from the new alignment and a warning is printed", which I imagine explains the warnings that you are getting. Assuming that slice works in the way that I'm understanding it to, then every sequence with a gap in it would be excluded from your alignment slice, and hence you would get 0 gaps.

ADD COMMENTlink written 7.2 years ago by SimonCB765150
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 846 users visited in the last hour