Question

Some explanation about what paired-end sequencing really means

2

5.9 years ago

Sus ▴ 40

Hi! I'm currently a student and I have a hard time understanding some basics of bioinformatics. I'm learning about alignment, filtering, variant calling and such, so my question might look silly, but here it is anyway.

I have trouble understanding how you work with paired-end sequencing files and what it means for data to be paired-end.

After taking a look on the Internet I found an explanation of what paired-end sequencing is (tell me if I got it right):

From what I understood, paired-end sequencing is done by sequencing from A to Z and then from Z to A, which provides two distinct datasets, one for each direction.

My question is: when you do an alignment with tools like BWA, TopHat or whatever, do you have to reverse one of the two datasets or not? For instance, if I wanted to find a consensus sequence (or the position-specific scoring matrices), wouldn't the result be completely wrong if half of the data were in the wrong direction?

Completely unrelated: I've also heard that TopHat should be used over BWA for aligning RNA. Do you know why?

alignment paired-end • 27k views

updated 5.9 years ago by andrew.j.skelton73 6.5k • written 5.9 years ago by Sus ▴ 40

0

5.9 years ago

h.mon 35k

Most programs already take paired-end read orientation into account, so you do not reverse anything yourself, but you do have to read the documentation carefully, program by program.
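As a rough illustration of how an aligner reports the orientation it worked out, here is a small Python sketch that decodes the pairing and strand bits of the FLAG field in SAM output. The bit values come from the SAM specification; `FLAG_BITS` and `decode_flag` are names made up for this example, and only the bits relevant to pairing are listed.

```python
# Pairing/orientation bits of the SAM FLAG field (values from the SAM spec).
# Only a subset of all defined bits is listed here.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x10: "read_reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}

def decode_flag(flag):
    """Return the names of the listed bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]

# 99 and 147 are the flags commonly seen on a properly paired
# forward/reverse read pair.
print(99, decode_flag(99))
print(147, decode_flag(147))
```

In this pair the first read maps to the forward strand and its mate to the reverse strand; the aligner records that in the flags rather than asking you to flip one of the input files.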

Completely unrelated: I've also heard that TopHat should be used over BWA for aligning RNA, do you know why?

Don't use TopHat: there are several better programs, and it has been superseded by HISAT2 (from the same group of developers). BWA is not splice-aware and TopHat is, which is why TopHat is better than BWA for aligning RNA-seq reads to a reference genome. But again, don't use TopHat.
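To see why splice-awareness matters, here is a toy Python sketch. All sequences are invented, and `spliced_hit` is a hypothetical, deliberately naive helper; it is not how HISAT2 or STAR work internally. It only shows that a read spanning an exon-exon junction has no contiguous match in the genome, while a search that allows a gap the size of an intron can place it.

```python
# Toy genome: two exons separated by an intron (made-up sequences).
exon1 = "ATGGCCAAGT"
intron = "GTAAGTTTTTCAG"
exon2 = "CCGATTGACA"
genome = exon1 + intron + exon2

# The mature mRNA has the intron spliced out.
mrna = exon1 + exon2

# An RNA-seq read spanning the junction:
# last 5 bases of exon1 followed by the first 5 bases of exon2.
read = mrna[5:15]

# A contiguous (non-splice-aware) exact search finds nothing.
contiguous_hit = genome.find(read)

def spliced_hit(genome, read, min_anchor=3):
    """Naive split search: try each split point, place the left part,
    then look for the right part further downstream.
    Returns (left_start, right_start, gap_length) or None."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        if left_pos == -1:
            continue
        left_end = left_pos + split
        right_pos = genome.find(right, left_end)
        if right_pos > left_end:
            return left_pos, right_pos, right_pos - left_end
    return None

print("contiguous:", contiguous_hit)
print("spliced:", spliced_hit(genome, read))
```

The gap reported by the split search equals the length of the toy intron, which is the kind of alignment a splice-aware aligner is designed to report for RNA-seq reads mapped to a genome.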

score 4 · Accepted Answer · 2018-05-10

It's always easier to illustrate with an image, from here. The grey represents the fragment, and each end of the fragment is sequenced. This allows more accurate mapping, particularly of repetitive regions. There's also a great animation here that illustrates the concept of Illumina paired-end sequencing. As @h.mon stated, most programs have parameters to deal with paired-end data, and seriously, stay away from TopHat. STAR and HISAT2 are current alternatives.

[Image: diagram of a fragment (grey) with one read sequenced from each end]
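The same idea as a minimal Python sketch, assuming the usual Illumina-style layout in which the two reads come from opposite ends of the fragment and from opposite strands. The reference sequence, the read length, and the helper names (`reverse_complement`, `simulate_pair`) are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_pair(fragment, read_len):
    """Read 1 starts at one end of the fragment; read 2 starts at the
    other end and is read off the opposite strand."""
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2

# Made-up reference and a fragment taken from it.
reference = "TTGACCATGGCTAGCTAGGATCCGATCGTACGTTAGC"
fragment = reference[4:30]
read_len = 8
read1, read2 = simulate_pair(fragment, read_len)

pos1 = reference.find(read1)                         # read 1 matches as-is
pos2_raw = reference.find(read2)                     # read 2 as sequenced: no match
pos2_rc = reference.find(reverse_complement(read2))  # matches once flipped

print("read1:", read1, "at", pos1)
print("read2:", read2, "raw match at", pos2_raw)
print("read2 reverse-complemented at", pos2_rc)
print("inferred fragment length:", pos2_rc + read_len - pos1)
```

So read 2 is not the whole fragment read backwards: it is a short read from the far end of the fragment, on the other strand. An aligner tries both strands for every read, which is why you pass both files unchanged and let the program work out the orientation and the distance between the mates.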