Abyss-pe: contigs and scaffolds are identical
4.1 years ago
joy72511 • 0

Hi,

I have run abyss-pe (v2.2.4) with different k-mers (31-69) using Illumina reads (2 x 150 bp), but all of them have identical contigs.fa and scaffolds.fa. Is this normal?

Thanks for your help.

Joy

abyss-pe command:

nohup abyss-pe k=51 v=-v name=x32 in='../trimmed/Xylem32_R1_trimmed.fq ../trimmed/Xylem32_R2_trimmed.fq' &> a32.51.oe &
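
For reference, a sweep over several k values is usually run as one assembly per k, each in its own directory; a minimal sketch using the make-style -C working-directory option (the k range/step, directory names, and relative read paths are illustrative and will need adjusting):

# one abyss-pe run per k, each in its own subdirectory k31, k39, ...
for k in $(seq 31 8 69); do
    mkdir -p k$k
    # note: with -C, the in= paths are resolved relative to the k$k directory
    abyss-pe -C k$k k=$k name=x32 \
        in='../../trimmed/Xylem32_R1_trimmed.fq ../../trimmed/Xylem32_R2_trimmed.fq'
done

# compare the resulting assemblies side by side
abyss-fac k*/x32-contigs.fa k*/x32-scaffolds.fa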

abyss-fac:

abyss-fac   x32-unitigs.fa x32-contigs.fa x32-scaffolds.fa |tee x32-stats.tab

n        n:500  L50   min  N75  N50  N25  E-size  max   sum      name
2404872  9439   3962  500  536  585  693  683     6526  5884968  x32-unitigs.fa
2404832  9431   3957  500  536  585  694  687     6526  5888336  x32-contigs.fa
2404832  9431   3957  500  536  585  694  687     6526  5888336  x32-scaffolds.fa
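
On reading this table: N50 is the length of the contig at which the length-sorted cumulative sum reaches half the total assembly size, and E-size is roughly the expected size of the contig containing a randomly chosen base (sum of squared lengths divided by total length). abyss-fac already reports both; the sketch below only illustrates the definitions on contigs >= 500 bp, matching the n:500 filter used above (the cutoff and filename are taken from that run):

# lengths of contigs >= 500 bp, then N50 and E-size from those lengths
awk '/^>/{if(l>=500)print l; l=0; next}{l+=length($0)}END{if(l>=500)print l}' x32-contigs.fa \
    | sort -rn \
    | awk '{len[NR]=$1; sum+=$1; sq+=$1*$1}
           END{half=sum/2; c=0
               for(i=1;i<=NR;i++){c+=len[i]; if(c>=half){print "N50=" len[i]; break}}
               printf "E-size=%.1f\n", sq/sum}'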

abyss-map:

abyss-map -v  -j2 -l40    ../trimmed/Xylem32_R1_trimmed.fq ../trimmed/Xylem32_R2_trimmed.fq x32-6.fa \
    |abyss-fixmate -v  -l40  -h x32-6.hist \
    |sort -snk3 -k4 \
    |DistanceEst -v  --dot --median -j2 -k51  -l40 -s1000 -n10  -o x32-6.dist.dot x32-6.hist
Reading from standard input...
Reading `x32-6.fa'...
Using 202 MB of memory and 83.9 B/sequence.
Reading `x32-6.fa'...
Building the suffix array...
Building the Burrows-Wheeler transform...
Building the character occurrence table...
Read 286 MB in 2404832 contigs.
Using 2.71 GB of memory and 9.49 B/bp.
Read 1000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 2000000 alignments. Hash load: 2 / 4 = 0.5 using 369 kB.
Read 3000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 4000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 5000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 6000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 7000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 8000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 9000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 10000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 11000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 12000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 13000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 14000000 alignments. Hash load: 2 / 4 = 0.5 using 369 kB.
Read 15000000 alignments. Hash load: 0 / 4 = 0 using 369 kB.
Read 16000000 alignments. Hash load: 2 / 4 = 0.5 using 369 kB.
Read 17000000 alignments. Hash load: 2 / 4 = 0.5 using 369 kB.

Mapped 15254206 of 17394198 reads (87.7%)
Mapped 13611151 of 17394198 reads uniquely (78.3%)

Read 17394198 alignments
Mateless          0
Unaligned    683759  7.86%
Singleton    772474  8.88%
FR          2758829  31.7%
RF               61  0.000701%
FF               21  0.000241%
Different   4481955  51.5%
Total       8697099
abyss assembly abyss-pe

Usually this is not normal behaviour (though it can happen).

Before I can give a conclusive answer, could you post the complete run log of the ABySS pipeline?

Also, do you mean that each k-mer gave the same result, or that for each k-mer the contigs and scaffolds were the same?

And can we correctly assume you're working with genome data (and thus not transcriptome data)?


Thanks for your reply. The complete run log is too long to post here and I don't know how to attach it, but I also posted the question on the ABySS Google Group, which can carry files (https://groups.google.com/forum/#!topic/abyss-users/SyTgYAj_iDU). Different k-mers gave different results, but for every k-mer the contigs and scaffolds were identical. The data are genomic reads captured with probes designed from a transcriptome.


I see, and I had a look at the Google Group post as well.

What Lauren mentioned there is exactly what I was referring to, and it would also have been my suggestion.

Concerning your data: so this is not a full WGS dataset but captured data? If so, it's not surprising to see such low stats. For an average conifer genome (and I do have quite some experience with those), this assembly is very small, roughly 1,000-5,000 times smaller than expected.

Can you confirm again that you are doing genome assembly and not transcriptome assembly?


I'm not sure what you mean by genome assembly. Do you mean assembling a whole genome from these sequences? I didn't have the budget for whole-genome sequencing. The data were sequenced from reduced-representation libraries: gDNA captured by probes. I am assembling the sequences in order to call variants. Thank you for your advice and help.


So you have a reference genome?

4.1 years ago
Mensur Dlakic ★ 27k

It is normal to have the same number of contigs and scaffolds at the end of assembly.


Thanks for your reply.

4.1 years ago
h.mon 35k

The fact that the contigs and scaffolds are identical suggests the average insert size of the library is too short: most paired reads probably overlap, so there is no "jumping" information available. I don't have experience with ABySS, but SPAdes is able to scaffold a few contigs if the insert size is not too small.

On a side note: what are you assembling? The sum of the contig lengths suggests it is a bacterial genome; if this is the case, you have a very, very poor assembly.
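
On the insert-size point above: one quick check is the fragment-size histogram that the pipeline writes (x32-6.hist in the log above). A minimal sketch, assuming the usual two-column "size<TAB>count" format of that file:

# mean fragment size from the histogram written by abyss-fixmate -h
# (non-positive sizes, if present, are skipped)
awk '$1 > 0 {n+=$2; s+=$1*$2} END{if(n>0) printf "pairs=%d  mean fragment size=%.1f bp\n", n, s/n}' x32-6.hist

If the mean is close to or below the read length (150 bp here), most pairs overlap and there is little to no scaffolding information, which matches the identical contigs and scaffolds.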


Thanks for your reply. The insert size is indeed smaller than I expected: I assumed it would be about 200 bp, but it is less than 100 bp. The data are conifer genomic data captured with probes designed from a transcriptome.
