What kind of advantage does PEAR bring?
1
0
4.8 years ago
CY ▴ 640

I have not had a chance to try PEAR.

I assume some people like to merge PE reads into SE reads when most of the insert sizes are less than 2 × read length.

2) What kind of advantage does PEAR bring? I imagine the mapping accuracy won't change much, and structural variants are identified less accurately without pairing information. I can't see anything good coming out of this approach...

pear paired end single end • 1.8k views
1
4.8 years ago
h.mon 34k

-o <str>    Specify the name to be used as base for the output files. PEAR outputs four files: a file containing the assembled reads with an assembled.fastq extension, two files containing the forward and reverse unassembled reads with extensions unassembled.forward.fastq and unassembled.reverse.fastq, respectively, and a file containing the discarded reads with a discarded.fastq extension.


2) What kind of advantage does PEAR bring?

The ends of reads generally have lower quality. By merging two lower-quality ends, you increase the overall confidence for the overlapped bases. Merging reads is useful for processing amplicons shorter than the sum of the two read lengths (think of 16S metagenomics). Some people also claim it is useful for assembly.
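To make the "two low-quality ends become one higher-confidence stretch" point concrete, here is a minimal sketch in Python. It is not how PEAR itself scores or tests overlaps; the function names, the mismatch threshold, and the rule of adding Phred scores where the bases agree (and subtracting them where they disagree) are all simplifying assumptions made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def merge_pair(fwd, fwd_qual, rev, rev_qual, min_overlap=5, max_mismatches=1):
    """Naively merge a read pair; hypothetical helper, not PEAR's algorithm.

    fwd/rev are the reads as sequenced, fwd_qual/rev_qual are lists of
    Phred scores. Returns (merged_sequence, merged_qualities) or None.
    """
    # Put the reverse read on the same strand as the forward read.
    rc = reverse_complement(rev)
    rc_qual = rev_qual[::-1]

    # Try the longest candidate overlap first, then shrink it.
    for overlap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        head = len(fwd) - overlap
        pairs = list(zip(fwd[head:], fwd_qual[head:], rc[:overlap], rc_qual[:overlap]))
        mismatches = sum(1 for b1, _, b2, _ in pairs if b1 != b2)
        if mismatches > max_mismatches:
            continue

        seq = list(fwd[:head])
        qual = list(fwd_qual[:head])
        for b1, q1, b2, q2 in pairs:
            if b1 == b2:
                # Assumption: agreeing bases reinforce each other.
                seq.append(b1)
                qual.append(q1 + q2)
            elif q1 >= q2:
                # Disagreement: keep the more confident base, lower its score.
                seq.append(b1)
                qual.append(q1 - q2)
            else:
                seq.append(b2)
                qual.append(q2 - q1)
        seq.extend(rc[overlap:])
        qual.extend(rc_qual[overlap:])
        return "".join(seq), qual

    return None  # no acceptable overlap found
```

With a 16 bp fragment sequenced as two 10 bp reads, the 4 overlapping bases have low quality in both reads, but the merged read carries a higher score there than either read did on its own.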

0
Some people also claim it is useful for assembly.


:) Depends on the assembler, but yes, it can be very useful. It's also useful for identifying longer insertions when calling variants.

0

This is useful for identifying longer insertions only because the insert size is shorter than the sum of the paired read lengths, right? Otherwise, the paired-end reads plus the known insert size would provide more information for identifying long insertions. Am I right?

1

Specifically, a normal aligner can only report insertions in CIGAR strings if the insertion is shorter than the read length, and a variant caller based on CIGAR strings will only call variants recorded there by the mapper. So the longer the reads, the longer the insertions that can be called. When paired reads have an insert size longer than the read length but less than double the read length, they can be merged at the overlap to produce a single longer read, which allows longer insertions to be called.
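The arithmetic behind that can be sketched in a few lines of Python. This assumes "insert size" means the full fragment length (both reads included) and that both reads have the same length; `overlap_geometry` is a hypothetical helper name, not part of PEAR or any aligner.

```python
def overlap_geometry(read_len, insert_size):
    """Return (overlap, merged_length) for a mergeable pair, else None.

    Assumes equal-length reads and that insert_size is the full fragment
    length. Only handles the case discussed here:
    read_len < insert_size < 2 * read_len.
    """
    if not (read_len < insert_size < 2 * read_len):
        return None
    overlap = 2 * read_len - insert_size  # bases covered by both reads
    merged_length = insert_size           # merged read spans the whole fragment
    return overlap, merged_length


# Example: 2 x 150 bp reads from a 250 bp fragment.
print(overlap_geometry(150, 250))
```

In that example the reads overlap by 50 bp and the merged read is 250 bp long, so an aligner gets a 250 bp read to place an insertion in rather than two 150 bp reads.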

0

I guess that in the best case the insert size should be longer than the sum of the paired-end read lengths, so that the insert size can serve as additional information when calling structural variants (longer insertions).

When the region of interest (amplicon) is short and the insert size can't provide additional information, merging the pairs into SE reads would be an alternative choice, right?

1

Yes, it makes analysis much simpler too.