Question

Why use NAST (Nearest Alignment Space Termination) for aligning 16S rRNA sequences when thousands of aligners are already available?


7.9 years ago

lakhujanivijay 5.8k

Why use NAST (Nearest Alignment Space Termination) for aligning 16S rRNA sequences when thousands of aligners are already available? What advantage does NAST offer?

NAST qiime alignment • 3.1k views

updated 7.9 years ago by natasha.sernova ★ 4.0k • written 7.9 years ago by lakhujanivijay 5.8k

score 2 · Answer 1 · 2016-05-12

Have a look at these papers. The introduction of the PyNAST paper will tell you a lot. NAST itself is not very popular now.

NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes

http://nar.oxfordjournals.org/content/34/suppl_2/W394.full

PyNAST: a flexible tool for aligning sequences to a template alignment

"Results: The availability of PyNAST will make the popular NAST algorithm more portable and thereby applicable to datasets orders of magnitude larger by allowing users to install PyNAST on their own hardware. Additionally because users can align to arbitrary template alignments, a feature not available via the original NAST web interface, the NAST algorithm will be readily applicable to novel tasks outside of microbial community analysis."

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2804299/

PyNAST: Python Nearest Alignment Space Termination tool

http://biocore.github.io/pynast/

The PyNAST paper is cited by 441 articles:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2804299/citedby/

score 0 · Answer 2 · 2016-05-12

Hi Natasha

I think the answer lies in the text pasted below, but I am unable to fully understand it. Could you please shed some light?

Candidate sequences are not permitted to introduce new gap characters into the template database, so the algorithm introduces local mis-alignments to preserve the existing template sequence.

Source: http://qiime.org/scripts/align_seqs.html

Thanks

score 0 · Answer 3 · 2016-05-12

As far as I understand, PyNAST gives you one of two outcomes. Either you get a full-length alignment containing some local "mis-alignments", because new gaps are not allowed in the template, or you get a shorter alignment that meets the minimum length you specify and tolerates a certain fraction of mismatches (set through the minimum percent identity you specify). The program searches the matching template sequences for an alignment that satisfies these parameters. I am not sure the result is unique, especially if the sequence is quite long, and it may also depend on how the percent identity is rounded. The default is probably to report the first match that satisfies these parameters, but I may be wrong. The authors say the following:

"The set of matching template sequences will be searched for a match that meets these requirements, with preference given to the sequence length. By default, the minimum sequence length is 150 and the minimum percent id is 75%. The minimum sequence length is much too long for typical pyrosequencing reads, but was chosen for compatibility with the original NAST tool."

Alignment with PyNAST:

The default alignment method is PyNAST, a python implementation of the NAST alignment algorithm. The NAST algorithm aligns each provided sequence (the “candidate” sequence) to the best-matching sequence in a pre-aligned database of sequences (the “template” sequence). Candidate sequences are not permitted to introduce new gap characters into the template database, so the algorithm introduces local mis-alignments to preserve the existing template sequence. The quality thresholds are the minimum requirements for matching between a candidate sequence and a template sequence. The set of matching template sequences will be searched for a match that meets these requirements, with preference given to the sequence length. By default, the minimum sequence length is 150 and the minimum percent id is 75%. The minimum sequence length is much too long for typical pyrosequencing reads, but was chosen for compatibility with the original NAST tool.
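To make the "no new gap characters in the template" rule more concrete, here is a minimal Python sketch of the idea. It is not PyNAST's actual code: the function name `fit_to_template` is hypothetical, the pairwise alignment between the candidate and the ungapped template is assumed to be computed elsewhere, and the tie-breaking and error handling are my own simplifications. Whenever the candidate has an insertion relative to the template, the sketch removes the nearest gap from the candidate row instead of adding a column, which shifts the bases in between and produces the local mis-alignment mentioned in the text.

```python
def fit_to_template(template_aligned, template_pw, candidate_pw, gap="-"):
    """Place a candidate into the template's column space without adding columns.

    template_aligned: the template's row in the reference alignment (with gaps).
    template_pw, candidate_pw: a precomputed pairwise alignment of the ungapped
    template against the candidate (two equal-length gapped strings).
    All names here are illustrative; this is a simplified sketch only.
    """
    if len(template_pw) != len(candidate_pw):
        raise ValueError("pairwise alignment rows must have equal length")

    row = []    # candidate characters, one per column
    extra = []  # True marks a column that does not exist in the template
    k = 0       # current position in the pairwise alignment
    for t in template_aligned:
        if t == gap:
            # Existing template gap column: candidate gets a gap here for now.
            row.append(gap)
            extra.append(False)
            continue
        # Candidate insertions that precede this template base need extra columns.
        while template_pw[k] == gap:
            row.append(candidate_pw[k])
            extra.append(True)
            k += 1
        row.append(candidate_pw[k])
        extra.append(False)
        k += 1
    # Anything left over is an insertion after the last template base.
    while k < len(template_pw):
        row.append(candidate_pw[k])
        extra.append(True)
        k += 1

    # Remove each extra column by deleting the nearest gap in the candidate row.
    while any(extra):
        i = extra.index(True)
        gaps = [j for j, c in enumerate(row) if c == gap and not extra[j]]
        if not gaps:
            raise ValueError("no gap left to absorb the insertion")
        j = min(gaps, key=lambda g: abs(g - i))
        del row[j]    # bases between j and i shift by one column
        del extra[i]  # the extra column disappears
    return "".join(row)


# Template row in the reference alignment: AC-GTAC
# Candidate has one extra T relative to the template.
print(fit_to_template("AC-GTAC", "ACGT-AC", "ACGTTAC"))  # prints ACGTTAC
```

In this toy case the output has the same length as the template row, but the candidate's G and first T now sit one column to the left of where a gap-introducing aligner would put them. That is the trade-off: every candidate ends up in the same fixed column space as the template alignment, at the cost of small local mis-alignments near insertions.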

The following command can be used for aligning sequences using the PyNAST method, where we supply the program with a FASTA file of unaligned sequences (i.e. the resulting FASTA file from pick_rep_set.py) and a FASTA file of pre-aligned sequences (this is the template file, which is typically the Greengenes core set, available from http://greengenes.lbl.gov/); the results will be written to the directory “pynast_aligned_defaults/”:

align_seqs.py -i $PWD/unaligned.fna -t $PWD/core_set_aligned.fasta.imputed -o $PWD/pynast_aligned_defaults/

Alternatively, one could change the minimum sequence length (“-e”) requirement and minimum sequence identity (“-p”), using the following command:

align_seqs.py -i $PWD/unaligned.fna -t $PWD/core_set_aligned.fasta.imputed -o $PWD/pynast_aligned/ -e 500 -p 95.0
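
As a rough illustration of what the minimum length (“-e”) and minimum percent identity (“-p”) thresholds mean, here is a small Python sketch. The helper names `percent_identity` and `passes_thresholds` are hypothetical, and the exact way PyNAST measures length and identity may differ; here I assume that length is the number of non-gap characters in the candidate and that identity is counted only over columns where both sequences have a base.

```python
def percent_identity(candidate_row, template_row, gap="-"):
    """Percent of matching bases over columns where neither row has a gap."""
    if len(candidate_row) != len(template_row):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for c, t in zip(candidate_row, template_row):
        if c == gap or t == gap:
            continue  # skip columns with a gap in either row
        compared += 1
        if c.upper() == t.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passes_thresholds(candidate_row, template_row, min_len=150, min_pct_id=75.0):
    """Check a candidate/template pair against both quality thresholds.

    The defaults mirror the values quoted above (150 and 75%).
    """
    ungapped_len = sum(1 for c in candidate_row if c != "-")
    return (ungapped_len >= min_len
            and percent_identity(candidate_row, template_row) >= min_pct_id)


# Toy example: 7 of 8 compared bases match, i.e. 87.5% identity.
cand, tmpl = "ACGTACGT", "ACGTACGA"
print(percent_identity(cand, tmpl))                               # 87.5
print(passes_thresholds(cand, tmpl, min_len=8, min_pct_id=75.0))  # True
print(passes_thresholds(cand, tmpl, min_len=8, min_pct_id=95.0))  # False
```

With the defaults, the toy pair would fail on length alone, which is the same reason the quoted text warns that a minimum length of 150 is too long for typical pyrosequencing reads.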