Need Suggestions For A Greedy Algorithm For Thoroughly Assembling Very Short Reads
11.1 years ago
JacobS

I am looking for a painless method for conducting a very small assembly of short sequences based on exact identity. Simply put, I have an NGS sample that I believe is contaminated with a common sequence. I scanned a few million reads and determined the 50 most abundant kmers of length 25 nt. Browsing these 50 kmers, it is clear that they are mostly staggered windows of a single sequence, and I would like to assemble them back into that sequence by their exact overlaps.
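
For reference, the counting step described above might look roughly like the Python sketch below. It assumes the reads are already loaded as plain strings (FASTQ parsing is left out), counts each kmer on the strand it appears on without merging reverse complements, and skips kmers containing an `N`. The name `top_kmers` is hypothetical. Keeping every distinct 25-mer from a few million reads in a Python `Counter` takes a lot of memory, so a dedicated k-mer counter such as Jellyfish is more practical at that scale.

```python
from collections import Counter


def top_kmers(reads, k=25, n=50):
    """Return the n most abundant k-mers as (kmer, count) pairs.

    Counts are strand-specific: a k-mer and its reverse complement are
    counted separately. K-mers containing an ambiguous 'N' are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Toy reads for illustration; real input would come from a FASTQ parser.
    reads = ["ACGTACGTAC", "CGTACGTACG", "TTTTTTTTTT"]
    for kmer, count in top_kmers(reads, k=5, n=3):
        print(kmer, count)
```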

Short of writing a Perl script, does anyone know of a simple way to do this? Thanks!
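
If a short script turns out to be acceptable after all, a greedy exact-overlap assembly of a few dozen kmers takes only a few lines. The sketch below is in Python rather than Perl, and the names `greedy_assemble` and `overlap` are hypothetical. It removes duplicate and contained sequences, then repeatedly merges the pair with the longest exact suffix/prefix overlap until no overlap of at least `min_overlap` bases remains. It only looks at the forward strand and does nothing about repeats or sequencing errors, so treat it as a toy rather than a real assembler.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that exactly matches a prefix of `b`.

    Only overlaps of at least `min_len` bases count; returns 0 otherwise.
    """
    for olen in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-olen:] == b[:olen]:
            return olen
    return 0


def greedy_assemble(seqs, min_overlap=10):
    """Greedily merge sequences by their longest exact overlaps (forward strand only)."""
    # Drop exact duplicates, keeping the first occurrence of each sequence.
    seqs = list(dict.fromkeys(seqs))
    # Drop sequences that are fully contained in a longer one.
    contigs = [s for s in seqs if not any(s != t and s in t for t in seqs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlap >= min_overlap is left
        merged = contigs[best_i] + contigs[best_j][best_len:]
        # Replace the pair with the merged contig and drop anything it now contains.
        contigs = [s for k, s in enumerate(contigs)
                   if k not in (best_i, best_j) and s not in merged]
        contigs.append(merged)


if __name__ == "__main__":
    # All 5-mers of ACGTTGCAAGGC, in shuffled order.
    kmers = ["GCAAG", "ACGTT", "TTGCA", "AAGGC", "CGTTG", "CAAGG", "GTTGC", "TGCAA"]
    print(greedy_assemble(kmers, min_overlap=4))  # ['ACGTTGCAAGGC']
```

Each round compares every pair of contigs, which is fine for 50 kmers but scales poorly to much larger inputs. The `min_overlap` value is a guess to be tuned; with 25-mers, something well above chance-level overlaps (say 15 or more) seems a sensible starting point.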

11.1 years ago
Torst

So you have 50 sequences, each of 25bp length, and you believe them to be highly overlapping with virtually 100% identity representing a parent sequence of about 75bp or so?

The simplest thing to do is a multiple sequence alignment (MSA) of the 50 sequences. The consensus sequence of that alignment will be your contaminant sequence. This is a "poor man's" de novo assembly, but it fits your situation well.

To do the MSA you can use clustal-omega:

clustalo -i kmers.fasta > kmers.aln

To get the consensus, you can use 'cons' from EMBOSS:

cons -plurality 0 -sequence kmers.aln -outseq contaminant.fasta
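
To make the consensus step concrete, here is a rough Python sketch of a column-wise plurality vote over an alignment. It is a simplification and not what `cons` does internally (as far as I know, EMBOSS `cons` weighs residues with a scoring matrix and applies its own plurality threshold). The sketch ignores gaps, takes the most common base in each column, and drops columns that contain only gaps. It assumes the aligned sequences are equal-length strings with `-` as the gap character, as in an aligned FASTA file; `plurality_consensus` is a hypothetical name.

```python
from collections import Counter


def plurality_consensus(aligned, gap="-"):
    """Column-wise majority vote over equal-length aligned sequences.

    Gaps are ignored when voting; columns with no bases at all are dropped.
    Ties go to the base seen first in the column.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("expected one or more aligned sequences of equal length")
    consensus = []
    for column in zip(*aligned):
        bases = [c.upper() for c in column if c != gap]
        if bases:
            consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    # Three staggered 5-mers as they might appear in an MSA of overlapping kmers.
    print(plurality_consensus(["ACGTT--", "-CGTTG-", "--GTTGC"]))  # ACGTTGC
```

Ignoring gaps matters here: at the ends of a staggered alignment most rows are gaps, and a vote that counted gaps would trim the real sequence.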

Hi @Torst, thanks for the detailed explanation! While it certainly solves the problem, I should explain that I am more interested in finding a simple assembler for this kind of task. I would actually like to run such an assembler on the top 500 kmers, which will likely come from about 10 reference sequences and would hopefully assemble into 10 different contigs. Furthermore, the reads may come from either strand, so some top kmers could be reverse complements of other kmers, and I would want the assembly to consider every possible orientation. Am I wrong in assuming it would be tedious to do this with Clustal Omega?
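
Handling both strands mostly means trying every pair of sequences in every relative orientation. The Python sketch below applies that to a greedy exact-overlap assembly: it first collapses each kmer and its reverse complement to one canonical copy (the lexicographically smaller of the two), drops sequences contained in another on either strand, and then at each step tries both orientations of both sequences before merging the best exact overlap. The names `revcomp`, `overlap` and `greedy_assemble_both_strands` are hypothetical, and the code assumes uppercase, error-free sequences with unambiguous overlaps; a real assembler such as CAP3 copes with mismatches and repeats far better.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string (other characters are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap(a, b, min_len):
    """Longest suffix of `a` equal to a prefix of `b`, if at least `min_len`; else 0."""
    for olen in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-olen:] == b[:olen]:
            return olen
    return 0


def greedy_assemble_both_strands(seqs, min_overlap=10):
    """Greedy exact-overlap assembly that tries both orientations of every pair."""
    # Collapse each sequence and its reverse complement to one canonical copy.
    canon = list(dict.fromkeys(min(s, revcomp(s)) for s in seqs))
    # Drop sequences contained, on either strand, in another sequence.
    contigs = [s for s in canon
               if not any(s != t and (s in t or revcomp(s) in t) for t in canon)]
    while True:
        best_len, best_i, best_j, best_merged = 0, None, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                # Try a and b on either strand so every relative orientation is covered.
                for first in (a, revcomp(a)):
                    for second in (b, revcomp(b)):
                        olen = overlap(first, second, min_overlap)
                        if olen > best_len:
                            best_len, best_i, best_j = olen, i, j
                            best_merged = first + second[olen:]
        if best_len == 0:
            return contigs
        # Replace the pair and drop anything the merged contig contains on either strand.
        contigs = [s for k, s in enumerate(contigs)
                   if k not in (best_i, best_j)
                   and s not in best_merged and revcomp(s) not in best_merged]
        contigs.append(best_merged)


if __name__ == "__main__":
    # 5-mers of AACACCAACCCA, some of them given as reverse complements.
    kmers = ["CAACC", revcomp("ACACC"), "AACAC", revcomp("AACCC"),
             "CCAAC", "ACCCA", revcomp("ACCAA"), "CACCA"]
    print(greedy_assemble_both_strands(kmers, min_overlap=4))
```

Because of the canonicalisation and the orientation search, each returned contig may come out on either strand.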


CAP3 would do a good job, but it will need a few parameters tweaked for your situation:

http://seq.cs.iastate.edu/
