Question

BBMerge: lots of ambiguous reads

4.3 years ago

ja569116 • 0

Hi, I am doing a genome assembly and I read that merging reads can help produce longer contigs. I tried PEAR, but it seems to have a limit on the number of reads, so I used BBMerge. I have PE 100 bp reads. Before merging, I error-corrected a batch from one library with Musket and then merged it with BBMerge. Most pairs came out ambiguous (93.544%). The table of results is:

Pairs:                  91117090
Joined:                 5877355         6.450%
Ambiguous:              85234633        93.544%
No Solution:            5102            0.006%
Too Short:              0               0.000%
Fully Extended:         123997364       68.043%
Partly Extended:        33236534        18.238%
Not Extended:           25000282        13.719%

I was shocked, because I expected error correction with Musket to help, but it seems I was wrong. So I tried the uncorrected library, and the results were very different. The number of pairs is of course the same, but far more pairs joined (54.136% vs. 6.450%), more had no solution, fewer reads were fully extended, and more were not extended.

Pairs:                  91117090
Joined:                 49327140        54.136%
Ambiguous:              41748942        45.819%
No Solution:            41008           0.045%
Too Short:              0               0.000%
Fully Extended:         110767592       60.783%
Partly Extended:        34111990        18.719%
Not Extended:           37354598        20.498%
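
As a sanity check on how to read these tables, the percentages can be recomputed from the counts. This is a minimal sketch assuming a POSIX awk; the helper name pct is hypothetical. The observation that the Extended rows are counted per read (two per pair) rather than per pair is inferred from the arithmetic, not taken from the BBMerge documentation.

```shell
#!/usr/bin/env bash
# pct NUM DEN: print NUM as a percentage of DEN with three decimals.
# (Hypothetical helper, not part of BBTools.)
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.3f\n", 100 * n / d }'
}

pairs=91117090
reads=$((2 * pairs))    # assumption: extension rows count individual reads

# Merge outcomes as a fraction of pairs (uncorrected run).
pct 49327140 "$pairs"   # Joined
pct 41748942 "$pairs"   # Ambiguous

# Extension outcomes as a fraction of reads (uncorrected run).
pct 110767592 "$reads"  # Fully Extended
```

The three Extended/Not Extended counts in each table sum to twice the number of pairs, which is why the per-read denominator is used for them here.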

The BBMerge command is the following:

./bbmerge-auto.sh t=48 in1=/users/PHS0338/jpac1984/data/genome-myse/V300068047_L2_B5RDBATtnuRAAAAA-407_1.fq.gz \
        in2=/users/PHS0338/jpac1984/data/genome-myse/V300068047_L2_B5RDBATtnuRAAAAA-407_2.fq.gz \
        out=ori-merged.fq outu1=ori-unmerged1.fq outu2=ori-unmerged2.fq interleaved=false rem extend2=20 k=40 ihist=hist-ori.txt

Thanks for the hints and suggestions.

merging paired-end reads • 1.6k views


It would be unusual to have reads that overlap, especially if you are trying to do a de novo assembly. You want longer fragments for that purpose. Did you specifically make the libraries with short inserts, where you expect the two reads to overlap?

If you have a reference available, you can align the reads to it using bbmap.sh. One of the metrics you will get out of that alignment is the average insert size of your libraries. You will likely find that it is longer than the combined length of the two reads (2 x 100 bp here), in which case most pairs cannot overlap, which would explain the result above.
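
To make that check concrete, here is a minimal sketch. It assumes BBMap is on the PATH and that the file written by ihist= is a two-column histogram (insert size, count) with header lines starting with '#'. The function names insert_hist and overlap_fraction and all file names are hypothetical; the bbmap.sh options are used as I understand them from the BBMap help, so check them against your installed version.

```shell
#!/usr/bin/env bash
# insert_hist REF R1 R2 OUT: map a subsample of pairs and write an
# insert-size histogram to OUT. Requires BBMap; defined here, not run.
insert_hist() {
    # nodisk=t keeps the index in memory; reads= limits the pairs processed.
    bbmap.sh ref="$1" in1="$2" in2="$3" ihist="$4" reads=2000000 nodisk=t
}

# overlap_fraction IHIST READLEN: fraction of pairs whose insert size is
# shorter than 2 * READLEN, i.e. pairs whose two reads can overlap at all.
overlap_fraction() {
    awk -v max="$((2 * $2))" '
        /^#/ || NF < 2 { next }                 # skip header lines
        { total += $2; if ($1 < max) short += $2 }
        END {
            if (total > 0) printf "%.3f\n", short / total
            else print "NA"
        }
    ' "$1"
}
```

A possible use would be `insert_hist ref.fa reads_1.fq.gz reads_2.fq.gz ihist.txt` followed by `overlap_fraction ihist.txt 100`. A value close to 0 would mean that very few pairs in the library are short enough to merge by overlap.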

I suggest you try to assemble the data as PE without trying to merge the reads. Let the assembler take care of the data.

4.3 years ago by GenoMax 152k