Usefulness of paired-end sequencing (noob question)
13 months ago
Sam ▴ 170

The usual explanation is that paired-end sequencing helps determine with more certainty where the reads come from (which is useful if there are many repeat regions).

However, the description of TopHat2 says:

For paired-end reads, TopHat2 processes the two reads separately through the same mapping stages described above. In the final stage, the independently aligned reads are analyzed together to produce paired alignments, taking into consideration additional factors including fragment length and orientation.

1) If the reads are aligned separately, doesn't that contradict the basic idea of paired-end sequencing (to get a more precise mapping)?

2) Perhaps what I am missing is an understanding of the final stage. It would be great if someone could point me to an explanation. I didn't find one in the forums.

RNA-Seq paired-end tophat • 517 views

Paired-end sequencing samples the two ends of a fragment. If you have a reference sequence available, then mapping those reads onto that reference gives you a precise idea of the length of that fragment. If one (or both) ends are multi-mapping, then there is no way to precisely figure out where that fragment came from. If your fragment happens to capture a splice site or a breakpoint, then this can also give you useful information about those events.

There is an implied assumption that fragments going into the libraries are of a certain length. Some aligners use this assumption to decide whether the two reads have the expected orientation and are a "properly" mated distance apart.
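To make the "properly mated" idea concrete, here is a minimal sketch of such a check. The `Alignment` class, the function name, and the 100-600 bp window are all made up for illustration and are not taken from any particular aligner. It also ignores introns, so for spliced RNA-seq alignments the genomic span would overestimate the real fragment length.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    chrom: str
    start: int      # 0-based leftmost reference position
    end: int        # exclusive end on the reference
    reverse: bool   # True if the read aligned to the reverse strand


def is_proper_pair(r1: Alignment, r2: Alignment,
                   min_frag: int = 100, max_frag: int = 600) -> bool:
    """Hypothetical check: same chromosome, inward-facing, plausible length."""
    if r1.chrom != r2.chrom:
        return False
    # A standard paired-end library gives one forward and one reverse mate.
    if r1.reverse == r2.reverse:
        return False
    fwd, rev = (r1, r2) if not r1.reverse else (r2, r1)
    # Inward-facing: the forward mate must sit to the left of the reverse mate.
    if fwd.start > rev.start or fwd.end > rev.end:
        return False
    frag_len = rev.end - fwd.start
    return min_frag <= frag_len <= max_frag


ok = is_proper_pair(Alignment("chr1", 1000, 1100, False),
                    Alignment("chr1", 1250, 1350, True))
too_far = is_proper_pair(Alignment("chr1", 1000, 1100, False),
                         Alignment("chr1", 90000, 90100, True))
same_strand = is_proper_pair(Alignment("chr1", 1000, 1100, False),
                             Alignment("chr1", 1250, 1350, False))
print(ok, too_far, same_strand)
```

In practice the window would come from the library prep or be estimated from confidently mapped pairs rather than hard-coded.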

If your paired-end reads overlap then you could simply merge them and use that single representation. Some aligners may have trouble if your reads completely overlap and extend into adapter sequences (short inserts).
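As a rough illustration of merging, the sketch below reverse-complements read 2 and looks for the longest exact overlap between the end of read 1 and the start of that reverse complement. The function names and the `min_overlap` default are hypothetical. Dedicated merging tools tolerate mismatches and use base qualities, which this toy version does not, and it does not handle the short-insert case where reads run into adapter.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge two mates on an exact overlap; return None if none is found."""
    mate = reverse_complement(read2)  # bring read 2 onto read 1's strand
    # Try the longest possible overlap first, then shrink.
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-olen:] == mate[:olen]:
            return read1 + mate[olen:]
    return None


# Toy 30 bp fragment sequenced as 2x20, so the mates overlap by 10 bp.
fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])
merged = merge_pair(read1, read2)
print(merged == fragment)
```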

13 months ago
ATpoint 54k
• PE is useful because it can capture exon-exon boundaries even if no single read spans them.
• Quality seems to be better with 2x150 than with 1x300 on most machines. In fact, I think only the MiSeq does 300 bp reads, and anecdotally quality is not great towards the end of the reads, so PE increases coverage.
• It determines the exact fragment length and therefore allows modeling of the GC bias in RNA-seq, see here. Insert size may also serve as a quality control metric, e.g. in ATAC-seq, where we expect a certain fragment size distribution.
• It may improve alignment performance in difficult genomic regions.

Having said that, it can be useful but is not essential. Most analyses can be run with either SE or PE data. Please also use the search function, there are a lot of threads on this.


How can PE improve alignment performance, if the two reads are aligned separately (in the case of TopHat2)?


From your own question:

In the final stage, the independently aligned reads are analyzed together to produce paired alignments, taking into consideration additional factors including fragment length and orientation.

So if a combination in which each single read aligns validly turns out to be nonsensical as a pair, e.g. it implies a very long insert size, it probably gets discarded. bwa mem does something similar afaik.
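To illustrate how a pairing stage can still add information after separate alignment, here is a toy sketch. It is not TopHat2's or bwa mem's actual code, and the `Hit` type, the scoring, and the 300 ± 50 bp fragment model are all assumptions. Each mate comes with a list of candidate hits, and the combination that is inward-facing and has the most plausible fragment length wins.

```python
import math
from itertools import product
from typing import List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    chrom: str
    start: int      # 0-based leftmost position
    end: int        # exclusive end
    reverse: bool   # True if on the reverse strand
    score: float    # per-read alignment score (higher is better)


def log_frag_likelihood(frag_len: int, mean: float, sd: float) -> float:
    """Log density of a normal fragment-length model (an assumption)."""
    z = (frag_len - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))


def best_pair(hits1: List[Hit], hits2: List[Hit], mean: float = 300.0,
              sd: float = 50.0, max_z: float = 4.0) -> Optional[Tuple[Hit, Hit]]:
    best, best_total = None, -math.inf
    for a, b in product(hits1, hits2):
        # Require same chromosome and opposite strands.
        if a.chrom != b.chrom or a.reverse == b.reverse:
            continue
        fwd, rev = (a, b) if not a.reverse else (b, a)
        frag_len = rev.end - fwd.start
        # Discard pairs that face outward or imply an implausible fragment.
        if frag_len <= 0 or abs(frag_len - mean) > max_z * sd:
            continue
        total = a.score + b.score + log_frag_likelihood(frag_len, mean, sd)
        if total > best_total:
            best, best_total = (a, b), total
    return best


# Read 1 falls in a repeat and aligns equally well in two places;
# read 2 aligns uniquely nearby, which resolves the ambiguity.
read1_hits = [Hit("chr1", 1000, 1100, False, 100.0),
              Hit("chr1", 50000, 50100, False, 100.0)]
read2_hits = [Hit("chr1", 1200, 1300, True, 100.0)]
chosen = best_pair(read1_hits, read2_hits)
print(chosen[0].start)
```

So even though each read is aligned on its own, the second mate still disambiguates the first: only the candidate at position 1000 gives a sensible fragment, and the one at 50000 is dropped.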