Question

Why do repeat sequences interfere with the assembly of contigs?

7.5 years ago

silvia.caprari84 ▴ 60

Hi, can you explain in very simple terms why repeat sequences interfere with the assembly of contigs? I can't figure it out. And why does paired-end sequencing help to overcome this problem? I read that it involves sequencing both ends of a fragment, but I can't figure out how this helps to detect genomic rearrangements, repeats, etc. Thanks

sequence Assembly sequencing next-gen • 3.1k views

As simple as it looks: a repeat is a repeated stretch of letters such as "AAAAAAAAAAAAAAAAAAAAAAAAAA" (or any other repeat type). If your genome has many repeats and a repeat is longer than your reads, for example 100 bp reads and a 1000 bp repeat, where does each 100 bp read fit: at the beginning, at the end, or somewhere in between? That is the problem.
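A minimal Python sketch of that ambiguity, using a made-up toy genome (a unique flank, a 1000 bp "AT" repeat, another unique flank) rather than real data. It lists every position where a read matches exactly: a 100 bp read from inside the repeat fits hundreds of places equally well, while a read touching unique flanking sequence fits only one.

```python
def exact_match_positions(genome, read):
    """Return every start index where `read` matches `genome` exactly."""
    return [i for i in range(len(genome) - len(read) + 1)
            if genome.startswith(read, i)]

# Hypothetical toy genome: unique flank + 1000 bp "AT" repeat + unique flank.
genome = "GATTACA" + "AT" * 500 + "CCGGTTAA"

repeat_read = "AT" * 50           # 100 bp read taken from inside the repeat
flank_read = "GATTACA" + "ATAT"   # read spanning the unique left flank

print(len(exact_match_positions(genome, repeat_read)))  # 451 equally good placements
print(exact_match_positions(genome, flank_read))        # [0]: a single placement
```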

How does paired-end sequencing resolve it? Assume you have:
ATATATAT ATATAATTGAAAGGAA

and you have a read pair whose first mate is ATAT and whose second mate is AAGG, with 14 bp between them. That makes it easier: because you know the distance between the mates, you can place the pair even though ATAT occurs several times in the repeat:

ATATATATATATAATTGAAAGGAA
ATAT--------------AAGG

So with paired-end reads you have more information: the distance between the mates and the orientation of the reads.
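The same idea as a small Python sketch, using the example sequence from this reply. It assumes exact matches and an exactly known 14 bp gap (real libraries only know the insert size approximately): on its own ATAT matches five positions, but only one of them has AAGG exactly 14 bp downstream.

```python
def occurrences(seq, pattern):
    """All start indices where `pattern` occurs in `seq` (overlaps allowed)."""
    return [i for i in range(len(seq) - len(pattern) + 1)
            if seq.startswith(pattern, i)]

def place_pair(seq, left_mate, right_mate, gap):
    """Placements where `right_mate` starts exactly `gap` bases after the
    end of `left_mate`."""
    placements = []
    for i in occurrences(seq, left_mate):
        j = i + len(left_mate) + gap
        if seq.startswith(right_mate, j):
            placements.append((i, j))
    return placements

seq = "ATATATATATATAATTGAAAGGAA"               # example sequence from the reply
print(occurrences(seq, "ATAT"))              # [0, 2, 4, 6, 8]: ambiguous alone
print(place_pair(seq, "ATAT", "AAGG", 14))   # [(0, 18)]: unique as a pair
```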

More can be found here: http://seqanswers.com/forums/showpost.php?p=1350&postcount=5

To quote:

Structural rearrangements can be deduced when your read pairs map to a reference at a distance that is substantially different from how that library was constructed (~500bp in the above example). Let's say you had two reads that mapped to your reference 1000bp apart...this suggests there has been a deletion between those two sequence reads within your genome. Same thing with an insertion, if your reads mapped 100bp apart on the reference, this suggests that your genome has an insertion.
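A minimal sketch of the rule described in the quote. The 500 bp expected insert comes from the quoted example; the 100 bp tolerance is a hypothetical value chosen for illustration. Real structural-variant callers also check mate orientation and require many supporting pairs, which this ignores.

```python
EXPECTED_INSERT = 500   # library insert size from the quoted example
TOLERANCE = 100         # hypothetical allowed spread around the expected size

def classify_pair(mapped_distance, expected=EXPECTED_INSERT, tolerance=TOLERANCE):
    """Interpret the distance at which two mates map on the reference."""
    if mapped_distance > expected + tolerance:
        # Mates land farther apart on the reference than the fragment was long:
        # the reference contains sequence that the sample genome lacks.
        return "deletion"
    if mapped_distance < expected - tolerance:
        # Mates land closer together than expected: the sample genome
        # contains extra sequence that is absent from the reference.
        return "insertion"
    return "concordant"

for d in (1000, 100, 520):
    print(d, classify_pair(d))
```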

7.5 years ago by Medhat 9.7k

score 2 · Answer 1 · 2016-10-19

Assemblers build graphs by overlapping the ends of reads to determine which sequence comes after which. See the example below:

ATGGTCGATC                  --------------->  ATGGTCGATCGTGTAGCT
       ATCGTGTAGCT
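A minimal sketch of that merge step in Python: a greedy, exact suffix-prefix overlap between two reads. The 3 bp minimum overlap is an arbitrary illustrative choice; real assemblers build whole overlap or de Bruijn graphs and tolerate sequencing errors.

```python
def merge_reads(a, b, min_overlap=3):
    """Join `b` onto `a` using the longest suffix of `a` that equals a prefix
    of `b`; return None if they overlap by fewer than `min_overlap` bases."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return None

print(merge_reads("ATGGTCGATC", "ATCGTGTAGCT"))  # ATGGTCGATCGTGTAGCT (3 bp overlap "ATC")
```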

Reads from repeat regions have similar or identical ends, so assemblers often get confused: several reads can each overlap equally well with several others.

A repeat region:    ATATATATATATATAT------ATATATATATATAT-----ATATATTATAT

Reads from repeat region: [1] ATATATATAT   [2] TATATATAT   [3] ATATATTATAT

Now it is hard to decide whether to merge [1] and [2], [2] and [3], [3] and [1], or all of them. Any incorrect merge leads to a false assembly.
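To see the ambiguity, this sketch computes every possible suffix-prefix overlap (at least 3 bp, an arbitrary threshold) among the three example reads [1], [2] and [3]. Every read can be joined to both of the others, so the overlap graph alone does not say which path is the true one.

```python
from itertools import permutations

def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

reads = {1: "ATATATATAT", 2: "TATATATAT", 3: "ATATATTATAT"}

# Collect every ordered pair of reads that could be joined, with its overlap length.
edges = {}
for x, y in permutations(reads, 2):
    k = suffix_prefix_overlap(reads[x], reads[y])
    if k:
        edges[(x, y)] = k

for (x, y), k in sorted(edges.items()):
    print(f"[{x}] -> [{y}] overlap {k} bp")
```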

We lose two important pieces of information:

How many copies of the repeat are actually there? Assemblers often collapse repeats into a single copy (a false interpretation).
What was the location or order of the repeats?
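A simplified illustration of the first point, assuming an assembler that works from the set of distinct k-mers (roughly what de Bruijn graph assemblers do; this is a simplification). The sequences are made up. With k shorter than the repeat, a genome with two copies of the repeat and one with three copies produce exactly the same k-mer set, so the copy number cannot be recovered from that set alone.

```python
def kmers(seq, k):
    """Set of distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Hypothetical flanks (x, z), repeat (r) and spacer (y).
x, r, y, z = "GATTACA", "ATATATATATAT", "CCGG", "TTGCA"
two_copies = x + r + y + r + z
three_copies = x + r + y + r + y + r + z

k = 6   # shorter than the 12 bp repeat
print(len(two_copies), len(three_copies))            # different genome lengths...
print(kmers(two_copies, k) == kmers(three_copies, k))  # ...same k-mer set: True
```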
So how does PE sequencing help?

Paired-end sequencing reads both ends of a DNA fragment and keeps track of which two reads (mates) came from the same fragment, so you know what is on both ends of each fragment even if the two reads do not overlap. We also know the approximate distance between the mates.
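A small simulation of where the two reads of a pair come from, assuming the forward-reverse orientation typical of Illumina paired-end libraries: read 1 is the start of the fragment, read 2 is the reverse complement of its end, and the bases in between are not read. The fragment is a made-up 12 bp example.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def paired_reads(fragment, read_len):
    """Simulate one read pair in forward-reverse orientation."""
    read1 = fragment[:read_len]                       # first bases, forward strand
    read2 = reverse_complement(fragment[-read_len:])  # last bases, opposite strand
    return read1, read2

fragment = "ACGTTCAGGCAA"    # hypothetical fragment; middle TCAG is never read
r1, r2 = paired_reads(fragment, 4)
print(r1, r2)                # ACGT TTGC
```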

Now, when you sequence a repeat and one mate of a pair aligns to a unique region flanking the repeat, you can identify which copy of the repeat the other mate belongs to. See below:

[Figure not reproduced here. Image courtesy: http://www.anthonybaldor.com]
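A minimal sketch of that idea, using a hypothetical genome with two identical repeat copies, each followed by a different unique flank. One mate lies inside the repeat (two exact matches); the other mate maps uniquely to a flank. The copy whose implied insert size is closest to the expected one is chosen. For simplicity the anchor is assumed to lie downstream of the repeat mate, and strand/orientation are ignored.

```python
def positions(seq, pattern):
    """All exact-match start indices of `pattern` in `seq`."""
    return [i for i in range(len(seq) - len(pattern) + 1)
            if seq.startswith(pattern, i)]

def resolve_repeat_copy(genome, repeat_mate, anchor_mate, expected_insert):
    """Pick the repeat copy that best fits the expected insert size."""
    anchors = positions(genome, anchor_mate)
    if len(anchors) != 1:
        raise ValueError("anchor mate must map to exactly one position")
    anchor_end = anchors[0] + len(anchor_mate)
    candidates = positions(genome, repeat_mate)
    # Implied insert = span from the repeat mate's start to the anchor's end.
    return min(candidates, key=lambda p: abs((anchor_end - p) - expected_insert))

repeat = "ATATATATAT"
# Hypothetical genome: repeat + flank "GCCGA", spacer, repeat + flank "TGGCT".
genome = "CC" + repeat + "GCCGA" + "AAAA" + repeat + "TGGCT" + "CC"

print(positions(genome, repeat))                               # [2, 21]
print(resolve_repeat_copy(genome, repeat, "TGGCT", 15))        # 21 (second copy)
print(resolve_repeat_copy(genome, repeat, "GCCGA", 15))        # 2 (first copy)
```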

score 0 · Answer 2 · 2016-10-19

7.5 years ago

silvia.caprari84 ▴ 60

Thank you both for your clear answers. One thing is still unclear: with paired-end sequencing, is the central part sequenced too? If I understood correctly, the reads cover the ends, but what about the central part?

You should post this as a comment, not an answer.
As for your question: a single pair does not cover the middle part, but other read pairs will fall in that range, so overall you will have it covered. For example:

ATATATATATATAATTGAAAGGAA
ATAT--------------AAGG
    ATAT--------------AA
        ATAT--------------

and so on.
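A toy coverage calculation of the same point, with hypothetical numbers (30 bp region, 20 bp inserts, 5 bp reads). A single pair leaves the middle of its fragment at depth 0, but pairs starting at staggered positions fill that gap with their own reads.

```python
def coverage(genome_len, fragment_starts, insert_size, read_len):
    """Per-base read depth when each fragment contributes one read at each end."""
    depth = [0] * genome_len
    for start in fragment_starts:
        for read_start in (start, start + insert_size - read_len):
            for i in range(read_start, min(read_start + read_len, genome_len)):
                depth[i] += 1
    return depth

one_pair = coverage(30, [0], insert_size=20, read_len=5)
three_pairs = coverage(30, [0, 5, 10], insert_size=20, read_len=5)
print(one_pair)     # middle of the fragment (positions 5-14) has depth 0
print(three_pairs)  # every position covered at least once
```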

ADD REPLY • link 7.5 years ago by Medhat 9.7k

For paired-end sequencing, the central part is typically not sequenced. The average insert size is known (+/- some range), so the distance between the unique end and repeat end sequences is defined.

Note that the central part can be sequenced if the insert size is shorter than the combined length of the two end reads. Then the reads overlap and can be merged into a single longer read.
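A minimal sketch of merging overlapping mates, assuming forward-reverse orientation (read 2 is reverse-complemented back before merging) and an exact overlap of at least 3 bp (an arbitrary threshold). Real read-merging tools also tolerate sequencing errors and use base qualities. The fragment below is made up: 12 bp with 8 bp reads, so the mates overlap by 4 bp.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def merge_mates(read1, read2, min_overlap=3):
    """Rebuild the fragment from two overlapping mates, or return None."""
    r2 = reverse_complement(read2)   # put read 2 back on read 1's strand
    for k in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        if read1.endswith(r2[:k]):
            return read1 + r2[k:]
    return None

fragment = "ACGTTAGGCATC"                        # hypothetical 12 bp insert
read1 = fragment[:8]                             # ACGTTAGG
read2 = reverse_complement(fragment[-8:])        # GATGCCTA
print(merge_mates(read1, read2))                 # ACGTTAGGCATC
```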

7.5 years ago by harold.smith.tarheel ★ 4.9k