I am running pairwise alignments with ./water from the EMBOSS package.
I want the FASTA identifiers of all sequences that align with a similarity of > 60%. I can find no option to set a threshold for the identity, the similarity, or the alignment score. I can parse the output, but maybe you have some tips here, or I have overlooked something? Additionally, the output does not contain the full FASTA names of the matches, only a part of them, like:
# Aligned_sequences: 2
# 1: 344TS28_contig130193
# 2: 1648TS28_contig03869
# Matrix: EDNAFULL
# Gap_penalty: 5.0
# Extend_penalty: 0.5
#
# Length: 732
# Identity: 287/732 (39.2%)
# Similarity: 287/732 (39.2%)
# Gaps: 424/732 (57.9%)
# Score: 765.5
fasta identifier in sequence:
Is this normal, or a bug (maybe it will be correct when I replace the | characters)? Thank you in advance!
Edit: If I increase the gap costs, the output contains only alignments with a higher similarity, but I can't find any fixed score threshold.
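Since I found no threshold option, filtering by parsing seems to be the fallback. Here is a minimal sketch of what I have in mind, assuming the report headers look like the block quoted above (the regular expressions and the function name `hits_above` are my own and not part of EMBOSS):

```python
import re

# Patterns for the "# 1:", "# 2:" and "# Similarity:" header lines.
# Assumption: the report uses the header layout quoted in this post.
NAME_RE = re.compile(r"^# ([12]): (\S+)")
SIM_RE = re.compile(r"^# Similarity:\s+\d+/\d+\s+\(\s*([\d.]+)%\)")


def hits_above(report_text, min_similarity=60.0):
    """Return (id1, id2, similarity) for alignments above the threshold."""
    hits = []
    names = {}
    for line in report_text.splitlines():
        m = NAME_RE.match(line)
        if m:
            # Remember the most recent sequence names for this alignment.
            names[m.group(1)] = m.group(2)
            continue
        m = SIM_RE.match(line)
        if m:
            similarity = float(m.group(1))
            if similarity > min_similarity:
                hits.append((names.get("1"), names.get("2"), similarity))
    return hits
```

It would be called on the text of the report file, for example `hits_above(open("water.out").read())`, where `water.out` is just a placeholder name for the output file.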
Edit2: The problems with the FASTA identifiers go away when I replace all | characters with _ (sed -i 's/|/_/g' seq.fasta).
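The same replacement could also be done in Python if sed is not available. This is only a sketch; the function name `sanitize_headers` is made up for illustration, and unlike the sed call it touches only the header lines starting with `>`:

```python
def sanitize_headers(fasta_text, bad="|", repl="_"):
    """Replace `bad` with `repl` in FASTA header lines only."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Only identifiers are changed; sequence lines stay untouched.
            line = line.replace(bad, repl)
        out.append(line)
    return "\n".join(out) + "\n"
```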