Question: Detecting Polymorphic Sites In Multiple Sequence Aligned Files
3
gravatar for ngsgene
7.4 years ago by
ngsgene350
United States
ngsgene350 wrote:

I have used ClustalW2 to get a multiple sequence aligned file (15 sequences for protein coding region). I can see the alleles that are different at specific positions for example my clustalw2 alignment is such.

seq2            AAAAAT 6
seq3            AAAAAT 6
seq1            AAAAAA 6
seq4            TTAAAA 6
                ::***:

How can I make a table of the different alleles that exist at one position. For example at position 3,4,5 I have all A's but at position 6 I have two A's and two T's

Is there a tool to make a table like this:

pos  1 6 
seq2 A T
seq3 A T
seq1 A A
seq4 T A

Thanks!

Edited: I was looking for a tool to make the job easier (and be assured of results for a large dataset) guess there is nothing out there except writing your own script.

clustalw multiple • 4.0k views
ADD COMMENTlink modified 7.4 years ago by raunakms1.1k • written 7.4 years ago by ngsgene350
11
gravatar for raunakms
7.4 years ago by
raunakms1.1k
Vancouver, BC, Canada
raunakms1.1k wrote:

Here is a short script which uses the Bioperl module Bio::AlignIO. This script will read your alignment file (alignment_file.aln) and parse each column of the alignment at a time and then extracts the aligned nucleotide sequence from every alignment creating a tab separated table. The table is printed in an output file (table_ouput.txt).

The table will look like this (note: the number in the table indicates the column number):

1    A    A    A    T
2    A    A    A    T
3    A    A    A    A
4    A    A    A    A
5    A    A    A    A
6    T    T    A    A

Then you can write another script to parse the table and do any type of calculation you want to do !!!

Using Sequence Logo can also give you a rough idea of your over all sequence. Here is the sequence logo for your alignment above:

alt text

Here is the script to parse the alignment:

#!usr/bin/perl/ -w
use strict;
use warnings;

use Bio::AlignIO;
use Bio::LocatableSeq;

my $align_file = 'alignment_file.aln';
my $out_file = 'table_ouput.txt';

my $str = Bio::AlignIO->new('-file' => $align_file);
my $aln = $str->next_aln();

my $seq1 = $aln->get_seq_by_pos(1);
my $seq2 = $aln->get_seq_by_pos(2);
my $seq3 = $aln->get_seq_by_pos(3);
my $seq4 = $aln->get_seq_by_pos(4);

open(OUTPUT, ">$out_file") or die "error";

for (my $col = 1; $col<=$aln->length; $col++) 
{
    my $char_seq1 = $seq1->subseq($col, $col);
    my $char_seq2 = $seq2->subseq($col, $col);
    my $char_seq3 = $seq3->subseq($col, $col);
    my $char_seq4 = $seq4->subseq($col, $col);

    print OUTPUT $column, "\t";
    print OUTPUT $char_seq1, "\t";
    print OUTPUT $char_seq2, "\t";
    print OUTPUT $char_seq3, "\t";
    print OUTPUT $char_seq4, "\n";

}

close OUTPUT;
exit;
ADD COMMENTlink modified 5.5 years ago • written 7.4 years ago by raunakms1.1k

Thanks Raunak, I can always transpose the output file to get the results as intended (rows to columns) using Unix. I looked into sequence logo as Niek de Klein mentioned, and I liked the visualization.

ADD REPLYlink written 7.4 years ago by ngsgene350
3
gravatar for Niek De Klein
7.4 years ago by
Niek De Klein2.5k
Netherlands
Niek De Klein2.5k wrote:

I don't know if you want to use the table for further processing, or you only want to use it for visualisation. In the latter case you'll probably want to make a sequence logo. This webserver makes a logo out of a multiple sequence alignment (see the examples if you don't know what a sequence logo is). It takes fasta, clustalw and flat format (see this for explanation of input).

ADD COMMENTlink written 7.4 years ago by Niek De Klein2.5k

From this table I have to detect which site was derived (from the ancestor) and plot them to see freqeuency.. though its interesting to see what a sequence logo is. Thanks for your response.

ADD REPLYlink written 7.4 years ago by ngsgene350
0
gravatar for User 2005
7.4 years ago by
User 200570
User 200570 wrote:

I would suggest MySQL if your table is less than 100MB. Contact in PM if you need more info :)

ADD COMMENTlink written 7.4 years ago by User 200570

I fail to see why I would need a table in MySQL, I am referring to multiple sequence alignments to get polymorphic sites, a table for reference not an RDBMS table

ADD REPLYlink written 7.4 years ago by ngsgene350
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 505 users visited in the last hour