Question: Is there a succinct method for isolating sequences that are phylogenetically similar to a distinct reference clade?
Daniel (Cardiff University) wrote, 5.6 years ago:

This is a bit outside my wheelhouse, so apologies if some of my terminology is incorrect.

I have a number of reference rbcL gene sequences for a taxonomic family, and I want to extract the sequences from an NGS (454) dataset that are phylogenetically similar to them, for downstream analysis. I have done this with BLAST before, but I'm sure there are more accurate methods that account for non-functional mutations and sequencing errors.

My current best option, as I see it, is to build one massive alignment of all the NGS reads together with the reference sequences, build a tree from it, visually identify the relevant clades and pull out their sequences. However, this will be computationally intensive and time-consuming to repeat for other taxa or NGS datasets in the future.

Does anyone know of a tool or package that works on these principles? I imagine something exists that operates similarly to BLAST, e.g. $tool -i NGS.fasta -r reference_seqs.fasta ...
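For illustration, the crudest version of the $tool interface imagined above is a k-mer screen: keep every read that shares enough short words with the pooled reference set. The Python sketch below is a hypothetical stand-in, not an existing tool. The names read_fasta, kmers and screen, the defaults k=8 and a 50% shared-k-mer cutoff, and the plus-strand-only comparison are all assumptions chosen for illustration, not values tuned for rbcL or 454 data. Like BLAST, it measures similarity only, not phylogenetic placement.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word of each header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen(reads: Iterable[Tuple[str, str]],
           refs: Iterable[Tuple[str, str]],
           k: int = 8,
           min_shared: float = 0.5) -> List[str]:
    """Return IDs of reads for which at least `min_shared` of their distinct
    k-mers occur somewhere in the pooled reference k-mers (plus strand only;
    hypothetical defaults)."""
    ref_kmers: Set[str] = set()
    for _, seq in refs:
        ref_kmers |= kmers(seq, k)
    hits = []
    for name, seq in reads:
        read_kmers = kmers(seq, k)
        if read_kmers and len(read_kmers & ref_kmers) / len(read_kmers) >= min_shared:
            hits.append(name)
    return hits
```

Usage would be along the lines of opening NGS.fasta and reference_seqs.fasta and calling screen(read_fasta(reads_handle), read_fasta(refs_handle)). Pooling all reference k-mers keeps the screen fast, but it cannot say which reference or clade a read is closest to.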


Daniel (Cardiff University) answered, 5.5 years ago:

The best way I've found to do this (that I'm aware of at the moment) is closed-reference OTU picking with USEARCH/UPARSE, using a reference FASTA set for the taxa I'm interested in.

All of the NGS reads are aligned to the 'closed reference' set in my all_reference.fasta using UPARSE-REF, UPARSE's maximum-parsimony-based algorithm; then I parse out those that are on my list of interest.
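To make the closed-reference logic concrete, here is a toy Python sketch. It deliberately swaps UPARSE-REF's parsimony model for a plain Needleman-Wunsch global alignment: each read is assigned to its most identical reference and kept only if that identity reaches a cutoff and the reference is on the list of interest. Reads that fail either test are simply dropped, which is what makes it "closed". The 0.97 cutoff (a commonly used OTU threshold), the scoring values and the function names are assumptions for illustration, not taken from the post. It also assumes reads and references span the same amplicon region, because end gaps are penalised like any other gap.

```python
from typing import Iterable, List, Optional, Set, Tuple


def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> float:
    """Needleman-Wunsch global alignment with a linear gap penalty;
    returns matching columns / total alignment columns."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, counting columns and matches.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return matches / columns if columns else 0.0


def closed_reference(reads: Iterable[Tuple[str, str]],
                     refs: Iterable[Tuple[str, str]],
                     interest: Set[str],
                     min_identity: float = 0.97) -> List[Tuple[str, str, float]]:
    """Assign each read to its most identical reference; keep it only if the
    identity reaches `min_identity` and that reference is in `interest`."""
    refs = list(refs)
    kept = []
    for read_id, read_seq in reads:
        best_ref: Optional[str] = None
        best_identity = 0.0
        for ref_id, ref_seq in refs:
            identity = global_identity(read_seq, ref_seq)
            if identity > best_identity:
                best_ref, best_identity = ref_id, identity
        if best_ref in interest and best_identity >= min_identity:
            kept.append((read_id, best_ref, best_identity))
    return kept
```

The nested loops cost roughly reads × references × length², so this is only for following the logic on a handful of sequences; for a real 454 run the USEARCH command in this answer is the practical route.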

More info at


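# Map all reads against the closed reference set with UPARSE-REF (plus strand only)
usearch -uparse_ref input.fas -db global_ref.fas -strand plus -fastaout ref_matched.fasta
# fasta_pull: extract the records listed in ref_taxa.lst (the taxa of interest)
fasta_pull ref_matched.fasta ref_taxa.lst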
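The fasta_pull step is not explained in the post. If no such script is at hand, a rough stand-in might look like the Python sketch below. It assumes, hypothetically, that ref_taxa.lst holds one identifier per line and that an identifier matches the first word of a FASTA header; if the names of interest sit elsewhere in the headers of ref_matched.fasta, the matching rule would need adjusting. load_ids, pull_records and the script name pull.py are hypothetical.

```python
import sys
from typing import Iterable, Iterator, Set


def load_ids(lines: Iterable[str]) -> Set[str]:
    """One identifier per line (first word); blank lines and '#' comments skipped."""
    return {line.split()[0] for line in lines
            if line.strip() and not line.lstrip().startswith("#")}


def pull_records(fasta_lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield every line of the FASTA records whose header ID is in `wanted`."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # Hypothetical matching rule: the first word after '>' is the ID.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python pull.py ref_matched.fasta ref_taxa.lst > of_interest.fasta
    with open(sys.argv[2]) as ids, open(sys.argv[1]) as fasta:
        sys.stdout.writelines(pull_records(fasta, load_ids(ids)))
```

Streaming line by line keeps memory flat regardless of how large the matched FASTA file is.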

Powered by Biostar version 2.3.0