Question: Need Suggestions For A Greedy Algorithm For Thoroughly Assembling Very Short Reads

JacobS wrote (5.6 years ago, Cleveland, Ohio):

I am looking for a painless method for conducting a very small assembly of short sequences based on exact identity. Simply put, I have an NGS sample that I believe is contaminated with a common sequence. I scanned a few million reads and determined the top 50 most abundant kmers of length 25nt. Browsing these top 50 kmers, it is clear that they are mostly staggered windows of a single sequence, and I would like to assemble these 50 kmers by overlapping identity.
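
A minimal sketch of the counting step described above. It assumes the reads are already loaded as plain strings (FASTQ parsing is omitted), and `top_kmers` is a hypothetical helper name. K-mers are counted on the read strand only, so a k-mer and its reverse complement are tallied separately.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def top_kmers(reads: Iterable[str], k: int = 25, n: int = 50) -> list[tuple[str, int]]:
    """Return the n most abundant exact k-mers across all reads as (kmer, count) pairs."""
    counts: Counter[str] = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of width k along the read, one base at a time.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip windows containing ambiguous bases
                counts[kmer] += 1
    return counts.most_common(n)
```

For example, `top_kmers(reads, k=25, n=50)` gives the 50 most frequent 25-mers. Memory grows with the number of distinct k-mers, which can become large for a few million reads.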

Short of writing a Perl script, does anyone know of a simple way to do this? Thanks!

Torst wrote (5.6 years ago, Australia):

So you have 50 sequences, each of 25bp length, and you believe them to be highly overlapping with virtually 100% identity representing a parent sequence of about 75bp or so?
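
The "about 75bp" figure follows from simple arithmetic, assuming for illustration that the n = 50 k-mers of length k = 25 are consecutive windows, each shifted by s = 1 base along the parent sequence (s is a symbol introduced here, not from the thread):

```latex
L = k + (n - 1)\,s = 25 + (50 - 1)\cdot 1 = 74~\text{bp}
```

Larger shifts between windows would imply a proportionally longer parent sequence.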

The simplest thing to do is a multiple sequence alignment (MSA) of the 50 sequences; the consensus sequence will be your contaminant. This is a "poor man's" de novo assembly, but it fits your situation well.

To do the MSA you can use clustal-omega:

clustalo -i kmers.fasta > kmers.aln

To get the consensus, you can use 'cons' from EMBOSS:

cons -plurality 0 -sequence kmers.aln -outseq contaminant.fasta
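
To show what "consensus" means here, the following is a simplified column-wise majority vote over the aligned sequences in FASTA form (the format clustalo writes by default, with `-` gaps). This is not EMBOSS `cons`'s exact method: `cons` scores columns with a substitution matrix and the `-plurality` threshold, so results can differ in ambiguous columns. `parse_fasta` and `majority_consensus` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter


def parse_fasta(text: str) -> list[str]:
    """Return the sequences from FASTA-formatted text, joining wrapped lines."""
    seqs: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        elif line:
            current.append(line.upper())
    if current:
        seqs.append("".join(current))
    return seqs


def majority_consensus(aligned: list[str], gap: str = "-") -> str:
    """Column-wise majority vote over equal-length aligned sequences; gaps are not counted."""
    if not aligned:
        return ""
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("aligned sequences must all have the same length")
    consensus = []
    for col in range(width):
        bases = Counter(s[col] for s in aligned if s[col] != gap)
        if bases:  # a column that is all gaps contributes nothing
            consensus.append(bases.most_common(1)[0][0])
    return "".join(consensus)
```

With staggered exact k-mers every non-gap column agrees, so the vote simply stitches the windows back together.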

JacobS replied:

Hi @Torst, thanks for the detailed explanation! While it certainly solves the problem, I am more interested in finding a simple assembler for this. I would actually like to run it on the top 500 kmers, which will likely come from about 10 reference sequences and would hopefully assemble into 10 different contigs. Furthermore, the reads may come from either strand, so some top kmers could be reverse complements of others, and I would want to assemble while considering every possible orientation. Am I wrong in assuming it would be tedious to do this with clustal-omega?
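
For reference, a greedy exact-overlap assembler that handles both strands can be sketched in a few dozen lines of Python. This is a minimal brute-force illustration, not CAP3's algorithm and not code from this thread; all function names are hypothetical. It assumes error-free k-mers over A/C/G/T/N, repeatedly merges the pair with the longest exact suffix/prefix overlap (trying every strand combination), and collapses sequences that are reverse complements or contained in a longer contig. It will mis-join repeats, and each merge scans all pairs, so it is comfortable for ~50 k-mers but will be slow for 500 without an overlap index.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Pick one fixed orientation so a sequence and its reverse complement collapse together."""
    return min(seq, revcomp(seq))


def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Longest exact suffix of a equal to a prefix of b, or 0 if shorter than min_overlap.

    Full-length matches are skipped because contained sequences are removed beforehand.
    """
    for length in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def remove_contained(seqs: list[str]) -> list[str]:
    """Deduplicate by orientation, then drop any sequence found inside a longer one on either strand."""
    unique = sorted({canonical(s) for s in seqs})
    return [s for s in unique
            if not any(len(t) > len(s) and (s in t or revcomp(s) in t) for t in unique)]


def greedy_assemble(kmers: list[str], min_overlap: int = 15) -> list[str]:
    """Greedily merge the pair with the longest exact overlap until no overlap >= min_overlap remains."""
    contigs = remove_contained([k.upper() for k in kmers])
    while True:
        best = None  # (overlap length, index i, index j, merged sequence)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                # Try a or revcomp(a) followed by b or revcomp(b).
                for left in (a, revcomp(a)):
                    for right in (b, revcomp(b)):
                        ov = suffix_prefix_overlap(left, right, min_overlap)
                        if ov and (best is None or ov > best[0]):
                            best = (ov, i, j, left + right[ov:])
        if best is None:
            break
        _, i, j, merged = best
        rest = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs = remove_contained(rest + [merged])
    return sorted(contigs, key=len, reverse=True)
```

Contigs come back longest first, each in an arbitrary but fixed orientation. With 25-mers, a `min_overlap` around 20 keeps chance joins unlikely while still chaining windows that are a few bases apart.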

Torst replied:

CAP3 would do a good job, but a few of its parameters will need tweaking for your situation:

http://seq.cs.iastate.edu/
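
As a hedged illustration of the kind of tweaking meant here: with 25-nt k-mers, CAP3's overlap-length cutoff has to sit below the read length. The Python wrapper below assumes that `-o` sets that cutoff (documented as needing to exceed 15, default 40), that `-p` sets the overlap percent-identity cutoff (above 65, default 90), and that contigs are written to `<input>.cap.contigs`. Check all three against the usage text of your cap3 binary. `write_fasta`, `cap3_command` and `run_cap3` are hypothetical names.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def write_fasta(seqs: list[str], path: str | Path) -> None:
    """Write sequences as FASTA records named kmer1, kmer2, ..."""
    with open(path, "w") as fh:
        for i, seq in enumerate(seqs, start=1):
            fh.write(f">kmer{i}\n{seq}\n")


def cap3_command(fasta: str | Path, min_overlap: int = 16, min_identity: int = 100) -> list[str]:
    """Build a CAP3 command line with cutoffs suited to 25-nt exact k-mers.

    ASSUMPTION: -o is the overlap-length cutoff and -p the percent-identity cutoff.
    """
    return ["cap3", str(fasta), "-o", str(min_overlap), "-p", str(min_identity)]


def run_cap3(fasta: str | Path) -> Path:
    """Run CAP3 and return the (assumed) path of the contig file it writes next to the input."""
    # CAP3 prints its alignment report to stdout; discard it here.
    subprocess.run(cap3_command(fasta), check=True, stdout=subprocess.DEVNULL)
    return Path(f"{fasta}.cap.contigs")
```

Because CAP3 only compares reads as given, it also considers reverse-complement overlaps itself, so the k-mers can be written out in whatever orientation they were counted.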