Question

Paired end for minia

5.5 years ago

alois.regl • 0

Hi,

I have two Illumina files (paired end). To be able to work with minia, I concatenated those two together. But the result is worse in comparison to using just one of the two. I expected it to be a lot better.

Maybe I should have reverse-complemented the second one? Or the first one? Or only complemented it (not reversed)?

Any idea?

Thanks in advance, Alois

minia paired end • 1.9k views

updated 5.5 years ago by h.mon 35k • written 5.5 years ago by alois.regl • 0

Does minia help suggest that you concatenate the paired-end files like you did? That sounds like an odd way of treating PE data. Do you know if minia is able to accept paired-end reads as input?

Since the minia documentation just links to the original paper, there is no easy way to look that up.

5.5 years ago by GenoMax 152k

Don't concatenate the reads.

If you believe that the reads overlap, use a tool that merges each read pair into a single read.

5.5 years ago by Istvan Albert 102k

Ahhhh, my text could be misunderstood.

I did not concatenate the reads "side by side"; I just appended the second file to the first one, so that I have twice as many single-end reads. I don't know the insert size, so I have no clue whether the reads overlap.

No, minia is not able to handle PE reads. This is why I tried to convert my PEs to SEs.

lg Alois

5.5 years ago by alois.regl • 0

This sounds like there may be contamination and other errors in the data. As you add more data, the number of systematic errors increases. It also often happens that the second read in a pair is worse in quality (though nowadays that is rarer).

Evaluate your data with FastQC and see whether you need to clean it up or trim adapters.

Technically speaking, treating paired-end data as single-end should not make the assemblies worse, though with assemblies everything is always possible.

Try velvet as well if practical.

5.5 years ago by Istvan Albert 102k

score 0 · Answer 1 · 2020-01-08

First of all, according to the Minia manual (and my experience), you don't need to reverse-complement any reads, and you can feed paired reads to Minia as is. You are correct, though: Minia won't use paired-end information at all.

What do you mean by:

But the result is worse in comparison to using just one of the two.

Is it more contigs? Worse N50? Worse BUSCO score? Worse alignment against a reference genome? How much worse? What is the estimated genome size and what is the sequencing coverage? What is the sequencing technology and read length? Did you try to assemble with R1 and with R2 reads separately, and were both assemblies better than with R1+R2?

I think your question lacks important information for a proper answer (such as the answers to the questions above, the Minia version, or the command used), so I will provide some tentative explanations.
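To make two of these questions concrete, here is a minimal Python sketch of how N50 and sequencing coverage are usually computed. The function names and the numbers are made up for illustration; plug in your own contig lengths, read counts, and genome size estimate.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def coverage(num_reads, read_length, genome_size):
    """Estimated sequencing depth: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


# Hypothetical values, for illustration only.
contigs = [100, 200, 300, 400, 500]
print("N50:", n50(contigs))
print("coverage:", coverage(2_000_000, 150, 5_000_000))
```

Reporting these numbers for the R1, R2, and R1+R2 assemblies would make "worse" much easier to interpret.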

One possible explanation is that you have too much sequencing coverage: several assemblers perform worse with too much data, because too many sequencing errors are introduced (see, e.g., When less is more: 'slicing' sequencing data improves read decoding accuracy and de novo assembly quality).

Another explanation is that your R2 reads have lower quality, so when you assemble with just the higher-quality R1 reads, the resulting assembly is better.
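If excess coverage is the suspect, one way to test it is to assemble a random subset of the reads and compare. The following is only a sketch in plain Python, assuming simple 4-line FASTQ records (no wrapped sequences); `read_fastq` and `subsample` are hypothetical helper names, and the input is a tiny made-up example. Dedicated tools exist for this and are likely faster on real data.

```python
import io
import random


def read_fastq(handle):
    """Yield FASTQ records as (header, sequence, plus, quality) tuples.
    Assumes exactly four lines per record."""
    while True:
        record = tuple(handle.readline().rstrip("\n") for _ in range(4))
        if not record[0]:
            return
        yield record


def subsample(records, fraction, seed=42):
    """Keep each record independently with probability `fraction`.
    A fixed seed makes the selection reproducible."""
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < fraction]


# Tiny made-up FASTQ input, for illustration only.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nTTGA\n+\nIIII\n"
)
records = list(read_fastq(example))
kept = subsample(records, fraction=0.5)
print(f"kept {len(kept)} of {len(records)} reads")
```

Since Minia ignores pairing, R1 and R2 can be subsampled independently here; for a paired-end-aware assembler the same reads would have to be kept in both files.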
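A quick way to check this, besides looking at FastQC reports, is to compare the mean base quality of the two files. The sketch below assumes Phred+33 encoded quality strings (the usual encoding for current Illumina data); `mean_phred` is a hypothetical helper and the quality strings are invented for illustration.

```python
def mean_phred(quality_strings, offset=33):
    """Mean Phred score over all bases in the given quality strings.
    Assumes Phred+33 encoding unless another offset is given."""
    total = 0
    count = 0
    for qual in quality_strings:
        total += sum(ord(ch) - offset for ch in qual)
        count += len(qual)
    return total / count if count else 0.0


# Made-up quality lines: 'I' encodes Phred 40, '5' encodes Phred 20.
r1_quals = ["IIII", "IIII"]
r2_quals = ["IIII", "5555"]
print("R1 mean quality:", mean_phred(r1_quals))
print("R2 mean quality:", mean_phred(r2_quals))
```

A clearly lower mean for R2 would support this explanation, although a per-position view such as the one FastQC gives is more informative than a single average.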

Some other comments:

Minia can handle multiple files, so you don't need to concatenate them (see Multiple Files under 5 Input in the manual for instructions).
The first sentence of the Minia GitHub repository is:

If you are looking to do high-quality genome or metagenome assemblies, please go here: https://github.com/GATB/gatb-minia-pipeline
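Regarding the multiple-files comment above: as far as I recall from the manual, Minia accepts a plain text file listing the read files, one path per line, in place of a single read file. A minimal sketch for producing such a list follows; the file names are hypothetical, and the command line in the comment is an assumption that should be checked against the manual of your Minia version.

```python
def format_read_list(read_files):
    """Return the contents of a list file: one read-file path per line."""
    return "".join(f"{name}\n" for name in read_files)


def write_read_list(read_files, list_path):
    """Write the list file to disk and return its path."""
    with open(list_path, "w") as handle:
        handle.write(format_read_list(read_files))
    return list_path


# Hypothetical file names, for illustration only.
listing = format_read_list(["reads_R1.fastq", "reads_R2.fastq"])
print(listing, end="")

# Assumed invocation (verify the options in the manual before use):
#   minia -in reads_list.txt -out assembly
```

Calling `write_read_list(["reads_R1.fastq", "reads_R2.fastq"], "reads_list.txt")` would create the list file to pass to Minia, which avoids building a concatenated copy of the data.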