Question

Forum: Attempting to understand paired-end sequencing, mainly Illumina

0

9.9 years ago

th.tomhitch • 0

Hi, I am currently attempting to understand how paired-end sequencing works. I understand the basics: a fragment is sequenced from both ends, and if the two reads overlap they can be joined to create a longer single read.

Is the example below correct according to my understanding? Please tell me whether it is or isn't.

Example:

So essentially the total seq is the whole of the fragment. This is paired-end sequenced, providing a forward read (seq1) and a reverse read (seq2). However, seq2 will be the complement to seq1 due to the way the sequencing occurs.

50 A, 50 C, 50 G
TotalSeq = ' AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG'

So seq1 will be sequenced:

----------------------------------------------------------------------------------------------------->

and seq2 will be sequenced:

<-----------------------------------------------------------------------------------------------------

This provides an area of overlap which will be the site of matching and merging.

50 A, 50 C, 1 G
seq1 = ' AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCG'

50 C, 50 G, 1 T
seq2 = 'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGT'

Any help would be greatly appreciated, as I am currently attempting to write a joiner. It works fine on the above seq1 and seq2 but fails completely on real data, so I can only assume my understanding of paired-end sequencing is flawed.

Thanks,
Tom

NGS illumina paired-end • 2.9k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 9.9 years ago by th.tomhitch • 0

0

Sorry, yes, I meant that seq2 was just complemented, but is the sequencing otherwise correct?

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by th.tomhitch • 0

0

Sequences are always written 5' to 3', so seq2 needs to be reversed as well as complemented to match what the sequencer would report.

ADD REPLY • link 9.9 years ago by Devon Ryan 104k

0

OK, thanks. I have edited the post so it is correct now.

ADD REPLY • link 9.9 years ago by th.tomhitch • 0

score 1 · Answer 1 · 2014-06-05

1

9.9 years ago

Devon Ryan 104k

You did the complement for read #2, not the reverse complement. Read #2 would be 50 cytosines, followed by 50 guanines and then a single thymine.
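
To make that concrete, here is a minimal sketch (not from the original thread) of how an idealised, error-free read pair relates to the fragment. The helper names `reverse_complement` and `simulate_pair` are made up for illustration, and the sketch assumes plain uppercase A/C/G/T with no adapters or sequencing errors.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def simulate_pair(fragment, read_len):
    """Return (read1, read2) for an idealised, error-free paired-end run.

    read1 is the first read_len bases of the fragment, read 5' to 3'.
    read2 is the reverse complement of the last read_len bases, because it
    is read 5' to 3' from the opposite strand.
    """
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2


fragment = "A" * 50 + "C" * 50 + "G" * 50
read1, read2 = simulate_pair(fragment, 101)
print(read1 == "A" * 50 + "C" * 50 + "G")  # True
print(read2 == "C" * 50 + "G" * 50 + "T")  # True
```

The trailing T in read #2 comes from the last A of the fragment: the final 101 bases are one A, 50 C and 50 G, and reverse-complementing them gives 50 C, 50 G and one T.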

BTW, why write your own read merger when they already exist? Have a look at FLASH.

ADD COMMENT • link 9.9 years ago by Devon Ryan 104k

0

I am making my own merger as an attempt to learn Python, as I am new to computational biology.
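
For a learning exercise like this, a naive merger could be sketched roughly as below. This is only an illustration under stated assumptions, not how FLASH or any other existing tool works: `merge_pair`, `min_overlap` and `max_mismatch_frac` are hypothetical names, base qualities are ignored, read 1's bases are kept wherever the two reads disagree, and fragments shorter than the read length (where the reads run into adapter) are not handled.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair into one sequence, or return None if no overlap fits.

    read2 is reverse-complemented first so both reads are on the same strand.
    Every overlap length is scored by its mismatch fraction; the lowest wins,
    and ties go to the longer overlap.
    """
    read2_rc = reverse_complement(read2)
    best = None  # (mismatch fraction, overlap length)
    for overlap in range(min_overlap, min(len(read1), len(read2_rc)) + 1):
        tail = read1[-overlap:]      # end of read 1
        head = read2_rc[:overlap]    # start of reverse-complemented read 2
        mismatches = sum(a != b for a, b in zip(tail, head))
        frac = mismatches / overlap
        # "<=" keeps the longer overlap on ties because lengths are ascending.
        if best is None or frac <= best[0]:
            best = (frac, overlap)
    if best is None or best[0] > max_mismatch_frac:
        return None
    # Keep all of read 1, then append the part of read 2 beyond the overlap.
    return read1 + read2_rc[best[1]:]


seq1 = "A" * 50 + "C" * 50 + "G"
seq2 = "C" * 50 + "G" * 50 + "T"
merged = merge_pair(seq1, seq2)
print(merged == "A" * 50 + "C" * 50 + "G" * 50)  # True
print(merge_pair("A" * 30, "A" * 30))            # None
```

Scoring every overlap and taking the best one matters for low-complexity sequence like the example: accepting the first long overlap that falls within the mismatch tolerance would merge these reads at 53 bases instead of the true 52.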

ADD REPLY • link 9.9 years ago by th.tomhitch • 0

0

Ah, that makes more sense then. If you're learning Python anyway, you may want to look into Biopython, which provides a lot of nice bio-related functionality.

ADD REPLY • link 9.9 years ago by Devon Ryan 104k

0

Yeah, I had a look at Biopython and it seemed really useful. I am currently getting to grips with HTSeq as well, which has been good.

ADD REPLY • link 9.9 years ago by th.tomhitch • 0

score 1 · Answer 2 · 2014-06-05

I would caution that you should understand the estimated fragment size for your real data. In most cases, when a paired-end sequencing library is made, fragments are size-selected within a range, perhaps centered around 400 bp. If you have 100- or even 150-cycle paired-end sequencing, then most of the read pairs in your real data will not overlap, because 2 × 100 or 2 × 150 sequenced bases cannot span a 400 bp fragment.
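
The arithmetic behind this can be sketched as follows. The function name `expected_overlap` is hypothetical, and the calculation assumes both reads have the same length and contain no adapter sequence.

```python
def expected_overlap(fragment_len, read_len):
    """Number of bases covered by both reads of a pair.

    A negative value is the size of the unsequenced gap between the reads.
    Reads cannot cover more than the fragment itself, hence the min().
    """
    covered = min(read_len, fragment_len)
    return 2 * covered - fragment_len


for read_len in (100, 150):
    print(read_len, expected_overlap(400, read_len))
# 100 -200
# 150 -100

print(expected_overlap(150, 101))  # 52
```

So a 400 bp fragment leaves a gap of 200 or 100 bases between the reads, while the 150-base toy fragment from the question gives a 52-base overlap with 101-base reads.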

Ram · Answer 3 · 2014-06-05

One of the most neglected aspects of NGS-related educational materials is explaining what exactly comes out of an instrument for a given DNA sample.

What kinds of transformations take place? What gets ligated to which end? How many times does a fragment get copied? Which strands make it through the process, and how many reads does each fragment produce? It is an unexpectedly complicated process.

This is even more so because there is a broad proliferation of library preparation standards that may introduce other elements.

The best way to understand it is to watch a video like this (find one that targets the library prep that you are interested in):