Is there a succinct method of isolating sequences phylogenetically similar to a distinct reference clade?
9.4 years ago
Daniel ★ 3.9k

This is a bit outside of my wheelhouse so I apologize if some terminology is incorrect.

I have a number of reference RBCL gene sequences for a taxonomic family, and I want to extract sequences from an NGS dataset (454) which are phylogenetically similar, for downstream analysis. I have done this with BLAST previously, but I'm sure there are more accurate methods that account for non-functional mutations and sequencing errors.

What I consider my current best option is to take all the NGS reads, create a massive alignment with the reference sequences, visually identify the relevant clades from a tree, and pull out the relevant sequences. However, this is going to be computationally intensive and time-consuming if we want to repeat it for different taxa or NGS sets in the future.

Does anyone know of a tool/package that works on these principles? I imagine something exists that operates similarly to BLAST, e.g. $tool -i NGS.fasta -r reference_seqs.fasta .......


alignment amplicon blast • 2.1k views
9.3 years ago
Daniel ★ 3.9k

I've found the best way to do this (that I'm aware of at the moment) is to perform closed reference OTU picking using USEARCH/UPARSE with a reference fasta set for the taxa that I'm interested in.

All of the NGS sequences are aligned against the 'closed reference' in my all_reference.fasta using the UPARSE maximum-parsimony-based algorithm, then I parse out those which are on my interest list.

More info at

usearch -uparse_ref input.fas -db global_ref.fas -strand plus -fastaout ref_matched.fasta
fasta_pull ref_matched.fasta ref_taxa.lst
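The post does not say what fasta_pull is, so the following is only a rough Python sketch of what such a step might do, not the script actually used. It assumes that each header in ref_matched.fasta begins with an identifier and that ref_taxa.lst holds one identifier of interest per line; both assumptions, and the function names read_fasta, load_ids and pull_records, are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def load_ids(lines: Iterable[str]) -> Set[str]:
    """Read one identifier per line, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def pull_records(fasta_lines: Iterable[str], wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep records whose first header token is in the wanted set.

    Assumption: the first whitespace-delimited token of each header is
    the identifier listed in the interest list.
    """
    for header, seq in read_fasta(fasta_lines):
        tokens = header.split()
        if tokens and tokens[0] in wanted:
            yield header, seq


# Small in-memory demonstration with made-up records.
demo_fasta = """>read1 sample=A
ACGT
ACGT
>read2
GGCC
>read3
TTAA
""".splitlines()
demo_ids = ["read1", "read3"]

selected = list(pull_records(demo_fasta, load_ids(demo_ids)))
for header, seq in selected:
    print(f">{header}\n{seq}")
```

For real files, the same functions can be fed open file handles, e.g. pull_records(open("ref_matched.fasta"), load_ids(open("ref_taxa.lst"))). If the USEARCH output labels carry the identifier somewhere other than the first token, the matching rule in pull_records would need adjusting.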
