Tools for pairwise sequence alignment
0
0
Entering edit mode
3 months ago
pooryamb • 0

Dear All,

I need to align two DNA sequences, and I need to know the mapping between the query and target positions. For example, the 100th nucleotide in the query might align with the 150th nucleotide of the target. I want to know this mapping for all nucleotides. Is there any tool for this?

pairwise_sequence_alignment • 420 views
0
Entering edit mode

You can use one of the EMBOSS pair-wise alignment tools. They all have web interfaces here. You probably want the local alignment tools if your sequences are not of equal length or similar.

0
Entering edit mode

Actually, I need to align many gene pairs, so I need a programmatic interface. Besides, I need a program that gives the mapped positions so that for a specific position in the query, I can easily get the aligned position of nucleotide in the hit.

1
Entering edit mode

How about using minimap2, and output in PAF format with cigar tags? For example, using the SARS-CoV-2 genome with some random nucleotides I pulled out...

minimap2 -x map-ont -t 4 -c subject.fa query.fa > alignment.paf
cat alignment.paf


Output:

NC_045512.2 262 0   262 +   NC_045512.2 29903   350 630 262 280 60  NM:i:18 ms:i:512    AS:i:484    nn:i:0  tp:A:P  cm:i:40 s1:i:241    s2:i:0  de:f:0.0038 rl:i:0  cg:Z:92M18D170M

• Alignment beginning (column 8): position 350, but 0-based so actually == position 351 of the subject
• Alignment end (column 9): position 630 of the subject
• Matching cigar string (last column): 92M18D170M == first 92 nt match, 18 nt deletion, 170 nt match

https://github.com/lh3/miniasm/blob/master/PAF.md

You'd need to programmatically use the start and end positions in conjunction with the cigar string to figure out the alignment position of the nth nucleotide.

0
Entering edit mode

Thanks, Very good suggestion.

0
Entering edit mode

I need to align many gene pairs

You left out that vital piece of information in original post. In any case EMBOSS tools can also be run from the command line.