ABySS output scaffolds.fa file contains k-mer-sized scaffolds
7 months ago
analyst ▴ 70

Dear all,

I have run ABySS on my short-read plant data with a k-mer size of 37 (chosen with KmerGenie). The scaffolds.fa file contains k-mer-sized scaffolds. Shouldn't the k-mers be assembled into contigs, and the contigs into larger scaffolds?

Here are the first few lines of the scaffolds.fa file:

>0 37 255 read:bn,LH00330:195:227VG7LT4:6:1103:0:451604/1
CCAGAGCATCTACTAGCAACGGAGAGCATGCAAGATC
>2 37 255 read:bn,LH00330:195:227VG7LT4:6:1103:0:451604/1
AGAGCATCTACTAGCAACGGAGAGCATGCAAGATCAC
>4 37 255 read:bn,LH00330:195:227VG7LT4:6:1103:0:451604/1
AGCATCTACTAGCAACGGAGAGCATGCAAGATCACAA
>8 37 255 read:bn,LH00330:195:227VG7LT4:6:1103:0:451604/1
TCTACTAGCAACGGAGAGCATGCAAGATCACAAATAA
>9 37 255 read:bn,LH00330:195:227VG7LT4:6:1103:0:451604/1
CTACTAGCAACGGAGAGCATGCAAGATCACAAATAAC
>11 37 255 read:bn,LH00330:195:227VG7LT4:6:1103:0:451604/1
ACTAGCAACGGAGAGCATGCAAGATCACAAATAACAT
>17 37 255 read:bn,LH00330:195:227VG7LT4:6:1103:0:451604/1
AACGGAGAGCATGCAAGATCACAAATAACATATGATA
>21 37 255 read:bn,LH00330:195:227VG7LT4:6:1103:0:451604/1
GAGAGCATGCAAGATCACAAATAACATATGATAAATA
>27 37 255 read:bn,LH00330:195:227VG7LT4:6:1103:0:451604/1
ATGCAAGATCACAAATAACATATGATAAATAAATAAT
>31 37 255 read:bn,LH00330:195:227VG7LT4:6:1103:0:451604/1
AAGATCACAAATAACATATGATAAATAAATAATTGAT
>32 37 255 read:bn,LH00330:195:227VG7LT4:6:1103:0:451604/1
AGATCACAAATAACATATGATAAATAAATAATTGATC
>36 37 255 read:bn,LH00330:195:227VG7LT4:6:1117:0:5202810/1
CCATCGAGGTATCCCCTACGACCAACTCCAAATATAG
>38 37 255 read:bn,LH00330:195:227VG7LT4:6:1117:0:5202810/1
CATCGAGGTATCCCCTACGACCAACTCCAAATATAGC
7 months ago

Hi,

Having quite some experience with ABySS, I can tell you this is expected behavior. I do understand the worry, though; I had the same one in the beginning, especially since it means an assembled sequence can be shorter than the input read length.

Since ABySS is a k-mer-based assembler, it will output every k-mer that is unique, so yes, the shortest sequence it returns has the length of your chosen k-mer size (37 here).

Indeed, k-mers should be, and are, assembled into contigs, and those contigs are then assembled into scaffolds, provided that is possible. If a contig cannot be joined into a scaffold, it is output as a contig; the same goes for k-mers: if they cannot be extended into a contig, they are reported as they are.
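To see how much of the file really is k-mer-sized and how much was extended further, one can count sequence lengths. This is only a minimal sketch in plain Python; the function names `read_fasta_lengths` and `summarize_lengths` are made up for this example and are not part of ABySS. It measures the sequences themselves rather than trusting the header fields.

```python
def read_fasta_lengths(lines):
    """Yield (header, sequence_length) for each FASTA record in `lines`."""
    header, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, length
            header, length = line[1:], 0
        elif header is not None:
            # Sequence may be wrapped over several lines; add them up.
            length += len(line)
    if header is not None:
        yield header, length


def summarize_lengths(lines, k):
    """Count records that are exactly k long versus longer than k."""
    lengths = [n for _, n in read_fasta_lengths(lines)]
    return {
        "total": len(lengths),
        "k_sized": sum(1 for n in lengths if n == k),
        "longer": sum(1 for n in lengths if n > k),
        "longest": max(lengths, default=0),
    }
```

It could be called as `summarize_lengths(open("scaffolds.fa"), 37)`. If `longest` is well above 37, the k-mer-sized entries are just the leftovers that could not be extended.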

I fully understand it looks a bit funky, but it is explainable :). What I usually did was apply a length filter to the resulting assembly. As a threshold I usually took the input read length, and often roughly twice the read length, since I consider that the minimum length you can really assemble (merging 2 reads).
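The length filtering described above could look roughly like this. It is a sketch in plain Python, not my exact script; the helper names (`parse_fasta`, `filter_by_length`, `write_fasta`) and the output file name are invented for illustration, and the threshold of 300 assumes 150 bp reads, which is not stated in the question.

```python
def parse_fasta(lines):
    """Yield (header, sequence) for each FASTA record in `lines`."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(lines, min_length):
    """Keep only records whose sequence is at least `min_length` bases long."""
    for header, seq in parse_fasta(lines):
        if len(seq) >= min_length:
            yield header, seq


def write_fasta(records, out, width=60):
    """Write records to the file-like object `out`, wrapping sequence lines."""
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


# Example use (assumed threshold: 2 x 150 bp read length = 300):
# with open("scaffolds.fa") as src, open("scaffolds.min300.fa", "w") as dst:
#     write_fasta(filter_by_length(src, 300), dst)
```

Set `min_length` to your own read length (or about twice that) depending on how strict you want to be.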
