Question: Assemble short reads based on k-mers
0
5.4 years ago by venu (6.7k), Germany
venu wrote:

Hello all,

I am completely new to this kind of task. I have data like this:

>in0
GATCCTCGAAGTTACACGGG
>in1
TACGTCGACGTCAATCCGGG
>in2
TACACGGGCCGCTCCTGGGC
>in3
ACGGGGTACTACGAGACGCG
>in4
AGGGGGAATGTGGTCCACAT
>in5
TCCACATGGCTTGCTCCTGA
>in6
CTTGACGTTATGAATTTCGC

and so on. I need to assemble these short reads, and I want to use Perl for this. I just need pseudocode for how to do this, or a pointer to a good resource. At the end I need a single string containing the consensus sequence.

k-mer assembly perl • 1.5k views
modified 5.4 years ago by thackl (2.8k) • written 5.4 years ago by venu (6.7k)
2

Is there a reason you want to reinvent the assembly wheel? There are a good number of assemblers already written, so why write yet another one without a good reason?

written 5.4 years ago by Devon Ryan (97k)

Can you give me some examples, so that I can look them up directly on the internet?

written 5.4 years ago by venu (6.7k)
2

You could try Google.

modified 5.4 years ago by Devon Ryan (97k) • written 5.4 years ago by dylan.storey (60)

As orange said, SOAPdenovo is one option. Others include Trinity and Minia. You will find quite a few more if you search PubMed for "DNA assembler" or "DNA assemble".

written 5.4 years ago by Devon Ryan (97k)
0
5.4 years ago by orange (30), Korea, Republic Of
orange wrote:

Why not try SOAPdenovo instead of Perl? Have you not assembled a genome sequence before?

written 5.4 years ago by orange (30)
0
5.4 years ago by thackl (2.8k), MIT
thackl wrote:

Assuming you want to stick to Perl for educational purposes:

Here is some code that creates k-mers quite efficiently in Perl: https://github.com/thackl/perl5lib-Kmer/blob/master/lib/Kmer.pm.

Perl is not really made for handling graph structures, but there is one module that you could use to set up a de Bruijn graph: http://search.cpan.org/~jhi/Graph-0.96/lib/Graph.pod. I played around with it some time ago but did not follow through.
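
To make the idea concrete, here is a minimal sketch of k-mer extraction and a de Bruijn graph, written in Python rather than Perl to keep it short. The function names `kmers` and `de_bruijn` and the choice of k = 7 are made up for this example; they are not taken from Kmer.pm or the Graph module.

```python
from collections import defaultdict

def kmers(seq, k):
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph

# in0 and in2 from the question
reads = ["GATCCTCGAAGTTACACGGG", "TACACGGGCCGCTCCTGGGC"]
graph = de_bruijn(reads, k=7)

print(kmers("GATCC", 3))        # ['GAT', 'ATC', 'TCC']
print(sorted(graph["ACACGG"]))  # ['CACGGG']
print(sorted(graph["CACGGG"]))  # ['ACGGGC']
```

The last line is the interesting one: in0 ends at CACGGG, and the only outgoing edge from that node comes from in2, so walking the graph joins the two reads.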

written 5.4 years ago by thackl (2.8k)

I just need a single consensus sequence string from the file shown above.

written 5.4 years ago by venu (6.7k)

But definitely in Perl?

written 5.4 years ago by thackl (2.8k)

Not necessarily. Any simple program that takes the above file as input and outputs the consensus string will do. I can read Perl code easily, hence the tag.

written 5.4 years ago by venu (6.7k)

But your data sets are small. Do you want to do some form of microassembly?

written 5.4 years ago by thackl (2.8k)

Yes. I have 200 such reads in a file and I want output like this:

 GGCATTTAACCGAAGCCGGTGGGTTAGACTATGATCCTCGAAGTTACACGGGCCGCTCCTGGGCGTGGCTGCTCCCAGCCCTAGCCCCAATGTAATATAAAGGTCGTGCCCAGTTAGCGTTAAGCAAGAGGTGTTACAAATATCTTGGAGAGTCATGTCGCAATTCTTGACGTTATGAATTTCGCGGTGAACAATGTCGCCCAGAATGGCAGGTCATGAAAAGCTTCAGCGGGAACCAGCAC....
modified 5.4 years ago • written 5.4 years ago by venu (6.7k)

What is the coverage of the reads and the length?
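
To make the question concrete: coverage is the total number of sequenced bases divided by the length of the target sequence. A rough Python sketch of how you could estimate it from a FASTA file like the one in the question; `read_fasta` and `coverage` are illustrative names, and the 500 bp target length is a made-up placeholder, since the real length is not given in this thread.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def coverage(read_lengths, target_length):
    """Average coverage = total sequenced bases / length of the target."""
    return sum(read_lengths) / target_length

fasta = """>in0
GATCCTCGAAGTTACACGGG
>in1
TACGTCGACGTCAATCCGGG
""".splitlines()

lengths = [len(seq) for _, seq in read_fasta(fasta)]
print(lengths)                    # [20, 20]

# 200 reads of 20 bp over a hypothetical 500 bp target
print(coverage([20] * 200, 500))  # 8.0
```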

written 5.4 years ago by thackl (2.8k)

I am doing this kind of work for the first time. All I know is that each read has different lengths of k-mers.

written 5.4 years ago by venu (6.7k)
2

Take a look at this:

http://www.homolog.us/Tutorials/index.php?p=1.1&s=2

This should give you a basic overview of assembly.

written 5.4 years ago by nterhoeven (120)
0
5.4 years ago by thackl (2.8k), MIT
thackl wrote:

You can use SPAdes:

spades.py --only-assembler --sc -k 33 -s in.fq -o asm
  # -k must be odd and smaller than your read length, so adjust 33 for shorter reads
  # results are in asm/contigs.fasta

This should also work for lowish coverage (5-10X) but assumes little to no errors in your data. Also, there can be multiple contigs, and regions with very low coverage (just 1-3 reads at a given position, e.g. the ends of your target sequence) will be missing.
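
If the data set really is this small and error-free, a greedy overlap merge is also enough to get a consensus string. Below is a rough Python sketch of that idea. It is not what SPAdes does internally; `overlap`, `greedy_assemble` and the `min_overlap` cutoff of 5 are choices made for this example, and it ignores reverse complements and sequencing errors.

```python
def overlap(a, b, min_overlap):
    """Length of the longest suffix of a that is a prefix of b (0 if below min_overlap)."""
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_overlap=5):
    """Repeatedly merge the two sequences with the longest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs.append(merged)

reads = [
    "GATCCTCGAAGTTACACGGG",  # in0
    "TACACGGGCCGCTCCTGGGC",  # in2
    "AGGGGGAATGTGGTCCACAT",  # in4
    "TCCACATGGCTTGCTCCTGA",  # in5
]
contigs = greedy_assemble(reads)
for contig in contigs:
    print(contig)
# GATCCTCGAAGTTACACGGGCCGCTCCTGGGC
# AGGGGGAATGTGGTCCACATGGCTTGCTCCTGA
```

With only these four reads you get two contigs, because nothing links in0+in2 to in4+in5. You only end up with a single string if the reads overlap all the way along the target.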

modified 12 months ago by _r_am (31k) • written 5.4 years ago by thackl (2.8k)