Question: How can I extract position information from a multiple alignment file?
haewon0 wrote:

Hi

I have generated simple multiple alignment files using MAFFT.

mafft --clustalout mafft_input.txt > mafft_output.txt

And below is the output file

[Screenshot of the MAFFT output in Clustal format, taken 2020-02-13]

Is there any way I can get the positions of the matching and deleted regions?

Thanks in advance.

modified 4 days ago • written 4 days ago by haewon0

I don't think MAFFT works well for pairwise alignment. I think you could try a dedicated local or global alignment tool, or BLAST. And if you have a set of reads and a reference, you can try bowtie2.

But if you want to parse your current output, hopefully there's a tool out there for it. If you'd rather write your own script, a messy approach is to concatenate the second column of every line that starts with reference into one string (str1), do the same for the lines that start with read (str2), and then compare the two strings position by position: at each position, check whether the character is - or a base (a, t, c, g, ...).
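A rough sketch of that approach in plain Python is below. It assumes the two rows are literally named reference and read (the screenshot isn't reproduced here, so those names and the tiny example alignment are made up), and that the file follows the usual Clustal layout: a header line starting with CLUSTAL, blocks of "name sequence" lines, and an indented conservation line under each block.

```python
from collections import defaultdict


def read_clustal_rows(text):
    """Concatenate the sequence chunks of each named row in Clustal-style text."""
    rows = defaultdict(str)
    for line in text.splitlines():
        # Skip blank lines, the header, and the indented conservation line.
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) >= 2:
            rows[fields[0]] += fields[1]  # second column is the sequence chunk
    return dict(rows)


def classify_columns(ref, read):
    """Yield (reference_position, state) for each alignment column.

    Positions are 1-based and count only reference bases, so an insertion
    is reported at the last reference base before it.
    """
    ref_pos = 0
    for r, q in zip(ref, read):
        if r == "-" and q == "-":
            continue  # gap in both rows carries no information
        if r != "-":
            ref_pos += 1
        if r == "-":
            state = "insertion"
        elif q == "-":
            state = "deletion"
        elif r.lower() == q.lower():
            state = "match"
        else:
            state = "mismatch"
        yield ref_pos, state


# Made-up example; row names "reference" and "read" are assumptions.
example = """CLUSTAL format alignment by MAFFT (made-up example)

reference       acgtacgtac
read            acg--cgtac
                ***  *****

reference       ggta
read            ggca
                ** *
"""

rows = read_clustal_rows(example)
for pos, state in classify_columns(rows["reference"], rows["read"]):
    print(pos, state)
```

For a real file, something like read_clustal_rows(open("mafft_output.txt").read()) should work as long as the row names contain no spaces.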

modified 4 days ago • written 4 days ago by Fatima340

Thanks for your suggestion. I tried bowtie2 and bwa, but neither of them could identify the gap, even with the gap opening and gap penalties set to 0; they clipped the sequence instead. I also tried blastn, but it gave 2 results, each matching only the 5' or the 3' end.

written 4 days ago by haewon0

You could look at Biopython: "Parsing A Clustal Alignment File". Do you need the position information as numeric ranges, as in NNN - NNN - Match, XXX-XXXX - Deletion?
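Something along these lines might produce that kind of output. It is only a sketch: it assumes Biopython is installed, that its Clustal parser accepts the MAFFT header, and that the two records are named reference and read (hypothetical IDs). Coordinates are 1-based positions on the reference; columns where the reference has a gap (insertions in the read) are skipped, so they do not split a Match range.

```python
from itertools import groupby


def load_pair(path, ref_id="reference", read_id="read"):
    """Read a Clustal file with Biopython and return the two gapped strings.

    ref_id and read_id are assumed record names; adjust them to your file.
    """
    from Bio import AlignIO  # third-party: pip install biopython

    alignment = AlignIO.read(path, "clustal")
    by_id = {record.id: str(record.seq) for record in alignment}
    return by_id[ref_id], by_id[read_id]


def summarize(ref, read):
    """Collapse alignment columns into (start, end, state) reference ranges."""
    labelled = []
    pos = 0
    for r, q in zip(ref, read):
        if r == "-":
            continue  # insertion in the read: no reference coordinate
        pos += 1
        if q == "-":
            state = "Deletion"
        elif r.upper() == q.upper():
            state = "Match"
        else:
            state = "Mismatch"
        labelled.append((pos, state))

    ranges = []
    # groupby merges consecutive positions that share the same state.
    for state, group in groupby(labelled, key=lambda item: item[1]):
        positions = [p for p, _ in group]
        ranges.append((positions[0], positions[-1], state))
    return ranges


# Made-up gapped strings standing in for load_pair("mafft_output.txt").
for start, end, state in summarize("ACGTACGTAC", "ACG--CGTAC"):
    print(f"{start} - {end} - {state}")
# prints:
# 1 - 3 - Match
# 4 - 5 - Deletion
# 6 - 10 - Match
```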

written 4 days ago by genomax78k

Yes, that's exactly what I'm looking for. I'll take a look at Biopython. Thanks for the suggestion.

written 4 days ago by haewon0