Question

A walk from mutation to structural variation and assembly.

7.6 years ago

krati.sharma • 0

I have questions about gene expression, the best sequencing technique for finding structural variation, shotgun sequencing, and de Bruijn graphs. Please help me understand these topics better and point me to supporting material or references.

Gene expression -

1) How might a mutation that affects DNA curvature in a promoter region affect downstream gene expression? I am looking for an explanation or supporting references on the following points.

DNA polymerase or slippage (can cause both)
Thymine dimer formation (can cause both, stacking energy)
Change in hydrophobic regions or pi clouds of the nucleotide bases (can cause both, stacking energy)
Wobble base pairs and mismatches (can cause both, stacking energy)

2) Sequencing for finding structural variation: PacBio is considered best for detecting structural variation, but its error rate is high. If I have an exome, which is about 1% of the whole genome, and I am looking for rare structural variants (a standard size for a structural variant is 1 kb), I would rather not use PacBio because of its roughly 10% error rate. Can I use Illumina in this case? Short reads are a disadvantage, but the error rate is around 0.1% (I am not sure of the exact figure, but it is comparatively low). Could you please provide a reference in support of Illumina?

3) If shotgun sequencing is repeated on the same genome, will the hit rate for finding a given sequence be the same? What role do repeated sequences play in this?

4) Why are de Bruijn graphs used in genome assembly after shotgun sequencing, instead of looking for regions of overlap between reads? My understanding is that the graph has high coverage and can provide new information, but I would like to know more about this.

genome sequencing Assembly • 1.4k views

updated 7.6 years ago by Steven Lakin ★ 1.8k • written 7.6 years ago by krati.sharma • 0

Please limit your post to one question (or a few questions related to the same subject).

7.6 years ago by WouterDeCoster 47k

score 1 · Answer 1 · 2016-09-21

To answer #4: the layout step of overlap-layout-consensus (OLC) assembly amounts to finding a Hamiltonian path through the overlap graph (a path that visits every read exactly once), and the Hamiltonian path problem is NP-complete. OLC can still be used for long-fragment assembly, but most of the data being generated now come from short-read sequencing technologies. De Bruijn graph (DBG) assembly is instead an Eulerian path problem (a path that uses every edge, i.e. every k-mer, exactly once), for which linear-time algorithms exist, so it is much more computationally tractable for large numbers of vertices in the assembly graph. There's more to it than this, but that is the gist of why we use DBG over OLC. For further reading, the original paper by Pevzner, Tang, and Waterman is nice, as is a more recent review of the algorithmic differences.
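To make the Eulerian-path idea concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how any production assembler works: reads are error-free, come from a single strand, and the genome has no repeated (k-1)-mer, so the graph is a simple path. The function names (`build_de_bruijn`, `eulerian_path`, `assemble`) and the example reads are made up for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each distinct k-mer is one edge."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the run deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm, linear in the number of edges.

    Assumes an Eulerian path exists (toy assumption, not checked here).
    """
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (u for u in graph if len(graph[u]) - in_deg[u] == 1),
        next(iter(graph)),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path."""
    path = eulerian_path(build_de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    # Hypothetical overlapping reads from the toy sequence ATGGCGTGCA.
    print(assemble(["ATGGCG", "GGCGTG", "CGTGCA"], k=4))
```

Note that the reads are never compared with each other: they are only chopped into k-mers, which is why the approach scales to very large numbers of short reads. Real genomes contain repeats longer than k, which create branches and multiple valid Eulerian paths, so practical assemblers add error correction and graph simplification on top of this idea.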