Question

Need Suggestions For A Greedy Algorithm For Thoroughly Assembling Very Short Reads

1


11.1 years ago

JacobS ▴ 990

I am looking for a painless method for conducting a very small assembly of short sequences based on exact identity. Simply put, I have an NGS sample that I believe is contaminated with a common sequence. I scanned a few million reads and determined the top 50 most abundant kmers of length 25nt. Browsing these top 50 kmers, it is clear that they are mostly staggered windows of a single sequence, and I would like to assemble these 50 kmers by overlapping identity.

Short of writing a Perl script, does anyone know of a simple way to do this? Thanks!
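For anyone wanting to reproduce the k-mer tally described above, here is a minimal Python sketch. The function names `top_kmers` and `fastq_sequences` are made up for illustration, and the reader assumes a plain, uncompressed four-line-per-record FASTQ file; it is not the method actually used in the question.

```python
from collections import Counter
from itertools import islice


def fastq_sequences(path):
    """Yield sequence lines from a FASTQ file (assumes 4 lines per record)."""
    with open(path) as handle:
        # The sequence is the 2nd line of every 4-line record.
        for line in islice(handle, 1, None, 4):
            yield line.strip()


def top_kmers(reads, k=25, n=50):
    """Return the n most abundant k-mers as (kmer, count) pairs."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip windows containing ambiguous bases
                counts[kmer] += 1
    return counts.most_common(n)


# Hypothetical usage: top_kmers(fastq_sequences("sample.fastq"), k=25, n=50)
```

Note that this counts each strand separately, so a k-mer and its reverse complement show up as two entries.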

assembly • 2.1k views

ADD COMMENT • link updated 11.1 years ago by Torst ▴ 980 • written 11.1 years ago by JacobS ▴ 990

score 2 · Answer 1 · 2013-10-21

2


11.1 years ago

Torst ▴ 980

So you have 50 sequences, each of 25bp length, and you believe them to be highly overlapping with virtually 100% identity representing a parent sequence of about 75bp or so?

The simplest thing to do is a multiple sequence alignment (MSA) of the 50 sequences. The consensus sequence will be your contaminant sequence. This is a "poor man's" de novo assembly but fits your situation well.

To do the MSA you can use clustal-omega:

clustalo -i kmers.fasta > kmers.aln

To get the consensus, you can use 'cons' from EMBOSS:

cons -plurality 0 -sequence kmers.aln -outseq contaminant.fasta
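To illustrate what the consensus step does conceptually, here is a small Python sketch of a column-wise plurality vote over aligned sequences. It is not how EMBOSS 'cons' is implemented; the function name `plurality_consensus` is hypothetical, the input is assumed to be equal-length aligned rows with '-' as the gap character, and ties are not handled in any special way.

```python
from collections import Counter


def plurality_consensus(aligned, gap="-"):
    """Return the most frequent non-gap character of each alignment column."""
    consensus = []
    for column in zip(*aligned):
        counts = Counter(ch for ch in column if ch != gap)
        if counts:  # skip columns that contain only gaps
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Three staggered windows of one parent sequence, already aligned.
rows = [
    "ACGTAC----",
    "--GTACGG--",
    "----ACGGTT",
]
print(plurality_consensus(rows))
```

With perfectly overlapping k-mers every column agrees, so the vote simply stitches the staggered windows back together.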

ADD COMMENT • link 11.1 years ago by Torst ▴ 980

0


Hi @Torst, thanks for your descriptive explanation! While it certainly solves the problem, I should explain that I am more interested in finding a simple assembler for this task. I would actually like to use such an assembler on the top 500 kmers, which likely derive from about 10 reference seqs and would hopefully assemble into 10 different contigs. Furthermore, the reads may come from different strands, so some of my top kmers could be reverse complements of other kmers, and I would want to assemble while considering every possible orientation. Am I wrong in assuming it would be tedious to complete such a task using clustal-omega?
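The requirements in this comment (exact-identity overlaps, both strands, several resulting contigs) could be sketched in Python roughly as below. This is only an illustrative greedy merge, not an existing tool: the names `revcomp`, `overlap` and `greedy_assemble` and the default `min_overlap=15` are assumptions, it is quadratic in the number of sequences (fine for a few hundred kmers), and it will happily mis-join sequences that share a repeat longer than `min_overlap`.

```python
def revcomp(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def overlap(a, b, min_len):
    """Length of the longest suffix of a that exactly matches a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(kmers, min_overlap=15):
    """Repeatedly merge the pair with the longest exact overlap, on either strand."""
    # Collapse each sequence and its reverse complement into one canonical form.
    contigs = list({min(s, revcomp(s)) for s in (k.upper() for k in kmers)})
    while True:
        # Drop contigs fully contained in a longer (or equal) one on either strand.
        kept = []
        for c in sorted(contigs, key=len, reverse=True):
            if not any(c in d or revcomp(c) in d for d in kept):
                kept.append(c)
        contigs = kept

        # Find the best suffix/prefix overlap over all pairs and orientations.
        best = (0, None, None, None)  # (overlap length, i, j, merged sequence)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                for a_or in (a, revcomp(a)):
                    for b_or in (b, revcomp(b)):
                        n = overlap(a_or, b_or, min_overlap)
                        if n > best[0]:
                            best = (n, i, j, a_or + b_or[n:])

        if best[0] == 0:
            return contigs  # nothing left to merge; longest contig first
        _, i, j, merged = best
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs.append(merged)


# Hypothetical usage with a list of 25-mers:
# contigs = greedy_assemble(top_500_kmers, min_overlap=15)
```

Each contig comes back in an arbitrary orientation, so compare against both the expected sequence and its reverse complement when checking results.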

ADD REPLY • link 11.1 years ago by JacobS ▴ 990

1


CAP3 would do a good job, but it will need a few parameters tweaked for your situation:

http://seq.cs.iastate.edu/

ADD REPLY • link 11.1 years ago by Torst ▴ 980