Dear ABySS developers,
Background: I recently used ABySS 2.0.2 successfully to assemble my SE (25x), PE (25x) and MP (50x) reads into an assembly with a scaffold N50 of 2 Mb, which is relatively good. However, the unitig N50 is only 4 kb, so the scaffolds contain many Ns. It seems that ABySS simply aligns the PE and MP reads to unitigs assembled from the SE reads only, so a large proportion of the PE and MP reads are wasted (they end up in the 'Different' category). For example, fewer than 40% of the reads from the 10 kb MP libraries can be aligned. Given my very low SE coverage, I feel these unitigs are not an ideal starting point.
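Both N50 figures above follow the usual definition: the length L such that sequences of length L or longer make up at least half of the total assembly length. As a minimal illustration, written in plain Python and assuming the contig or scaffold lengths have already been extracted from the FASTA output (ABySS itself reports these statistics through abyss-fac), N50 and the fraction of N gap characters in a scaffold can be computed as follows. `n50` and `n_fraction` are hypothetical helper names, not part of ABySS.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Lengths are summed from longest to shortest; the length at which the
    running total first reaches half of the overall total is the N50.
    """
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # 2 * running >= total avoids floating-point division.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches total")


def n_fraction(seq: str) -> float:
    """Return the fraction of gap characters ('N' or 'n') in a sequence."""
    if not seq:
        return 0.0
    return sum(base in "Nn" for base in seq) / len(seq)
```

For example, `n50([2, 3, 4, 5, 6])` returns 5, because 6 + 5 = 11 already covers half of the 20 bp total.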
Approach: I am trying to concatenate all the SE, PE and MP reads into "super-SE" reads, to increase the unitig N50 and thereby improve how efficiently the PE and MP reads align in the subsequent steps. I have done very strict quality control on my MP reads to remove Nextera adaptor and transposase sequences, so I don't think there are chimeras (here, reads that join two fragments lying far apart in the genome). After building the super-SE file by concatenating all the fastq.gz files, I reran the assembly with the following command:
abyss-pe np=16 name=SWS k=66 pe='pe1' mp='mp1 mp2 mp3 mp4' \
    se='SWS_super_SE.trimmomatic.fq.gz' \
    pe1='SWS_PE_1.trimmomatic.fq.gz SWS_PE_2.trimmomatic.fq.gz' \
    mp1='SWS_MP_1-4Kb_1.trimmomatic.fq.gz SWS_MP_1-4Kb_2.trimmomatic.fq.gz' \
    mp2='SWS_MP_4-7Kb_1.trimmomatic.fq.gz SWS_MP_4-7Kb_2.trimmomatic.fq.gz' \
    mp3='SWS_MP_7-10Kb_1.trimmomatic.fq.gz SWS_MP_7-10Kb_2.trimmomatic.fq.gz' \
    mp4='SWS_MP_10-15Kb_1.trimmomatic.fq.gz SWS_MP_10-15Kb_2.trimmomatic.fq.gz'
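Building the super-SE file by concatenating .fq.gz files needs no recompression: the gzip format (RFC 1952) allows a file to consist of several compressed members back to back, and standard readers decompress them as one continuous stream. A minimal Python sketch of this byte-level concatenation, equivalent to `cat a.fq.gz b.fq.gz > super.fq.gz`, is shown below. `concat_gz` and the demo file names are hypothetical, and the sketch illustrates only the concatenation step, not how ABySS reads the result.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def concat_gz(inputs: list[Path], output: Path) -> None:
    """Concatenate gzip files byte for byte into one multi-member gzip file."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Copy in chunks so large FASTQ files never sit in memory.
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        # Two tiny hypothetical FASTQ files standing in for real libraries.
        records = {
            "se.fq.gz": b"@r1\nACGT\n+\nIIII\n",
            "pe_1.fq.gz": b"@r2/1\nGGCC\n+\nIIII\n",
        }
        inputs = []
        for name, data in records.items():
            path = tmp_dir / name
            with gzip.open(path, "wb") as fh:
                fh.write(data)
            inputs.append(path)

        merged = tmp_dir / "super_se.fq.gz"
        concat_gz(inputs, merged)
        with gzip.open(merged, "rb") as fh:
            # Both members are decompressed, in order, as one stream.
            print(fh.read().decode(), end="")
```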
Problem: ABySS has now spent several days reading the "super-SE" fastq.gz file, and the following log is all the output I have:
mpirun --mca btl_sm_use_knem 0 -np 16 ABYSS-P -k66 -q3 --coverage-hist=coverage.hist -s SWS-bubbles.fa -o SWS-1.fa SWS_super_SE.trimmomatic.fq.gz
ABYSS-P -k66 -q3 --coverage-hist=coverage.hist -s SWS-bubbles.fa -o SWS-1.fa SWS_super_SE.trimmomatic.fq.gz
Running on 16 processors
1: Running on host iw-k32-34
0: Running on host iw-k32-34
0: Reading `SWS_super_SE.trimmomatic.fq.gz'...
Troubleshooting: Based on my past experience with ABySS, it seems strange that reading 80 GB of fastq.gz files takes several days. I can think of two possible reasons:
1. The /1 and /2 reads of the PE and MP libraries have the same read names (differing only in the trailing 1 or 2), so ABySS may run into a hashing problem with the super-SE file. I therefore rebuilt the super-SE file using only the /1 reads from the PE and MP libraries, but the same issue persists.
2. A problem with Open MPI, which I know little about.
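The first hypothesis can at least be checked on the input side. A minimal Python sketch, assuming standard four-line FASTQ records: it strips a trailing /1 or /2 from each read ID and reports names that occur more than once. All function names are hypothetical, and the check only shows whether names repeat; it says nothing about how ABySS itself treats read names.

```python
import gzip
import io
import itertools
import re
from collections import Counter
from typing import Iterable, Iterator

# A trailing mate suffix such as "/1" or "/2" on a read ID.
MATE_SUFFIX = re.compile(r"/[12]$")


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield read IDs from FASTQ lines (four lines per record).

    The ID is the first whitespace-delimited token of the header line,
    without the leading '@'.
    """
    for line_no, line in enumerate(lines):
        if line_no % 4 != 0:
            continue  # sequence, '+' separator or quality line
        if not line.startswith("@"):
            raise ValueError(f"line {line_no + 1} is not a FASTQ header")
        fields = line[1:].split()
        yield fields[0] if fields else ""


def duplicate_names(lines: Iterable[str]) -> dict[str, int]:
    """Count read names that occur more than once after removing /1 or /2."""
    counts = Counter(MATE_SUFFIX.sub("", rid) for rid in read_ids(lines))
    return {name: n for name, n in counts.items() if n > 1}


def duplicate_names_in_gz(paths: list[str], max_records: int = 100_000) -> dict[str, int]:
    """Check the first max_records records of each gzipped FASTQ file.

    Sampling the head of every input (rather than of one concatenated
    file) lets the first reads of a _1 file meet their mates in the _2 file.
    """
    def heads() -> Iterator[str]:
        for path in paths:
            with gzip.open(path, "rt") as fh:
                yield from itertools.islice(fh, 4 * max_records)

    return duplicate_names(heads())


if __name__ == "__main__":
    # Both mates of one hypothetical pair, plus an unrelated read.
    sample = io.StringIO(
        "@pairA/1\nACGT\n+\nIIII\n"
        "@pairA/2\nTTGA\n+\nIIII\n"
        "@single\nGGCC\n+\nIIII\n"
    )
    print(duplicate_names(sample))  # {'pairA': 2}
```

For the PE library in the command above, `duplicate_names_in_gz(["SWS_PE_1.trimmomatic.fq.gz", "SWS_PE_2.trimmomatic.fq.gz"])` is expected to report each sampled pair name twice if the mates share names. Sampling keeps memory bounded; counting every name in an 80 GB file with a `Counter` would need a great deal of RAM.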
Any ideas what could have gone wrong? Thank you very much in advance!