Question: Determining Paired-end or Mate-pair insert length in De Novo Sequencing
sameedmsiddiqui wrote (2.9 years ago):

Hello,

I have a few questions regarding mate-pair and paired-end sequences:

  1. Does one have to know the exact insert length for paired-end or mate-pair sequences to be useful? (I'm not sure "insert length" is the proper term for mate pairs, but it seems to be the same concept as for paired ends, unless I am mistaken.)
  2. From what I have read, people usually obtain the insert length by aligning the paired ends to a reference genome. Doesn't that defeat the purpose of paired-end sequencing, since to align to a reference genome you generally have to treat each pair as two single reads? Or do you also have some prior knowledge of the *approximate* insert length (in which case I can see the usefulness)?
  3. How is this done in de novo sequencing, when you don't even have a reference sequence?

Thanks a bunch!

Tags: mate pair, paired end, de novo
Biogeek wrote (2.9 years ago):

Hey,

You can use a nice package by someone who is active on this site, Brian Bushnell (if I remember correctly). It's called BBMerge; if you google it, you can find the syntax for calculating the insert size of paired-end reads, and as far as I can tell it doesn't need a reference genome. I had to do this a few weeks ago when I was trying out SOAPtrans, which asked for an insert size.

Hope that helps you.
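A minimal sketch of why an overlap-based tool can estimate insert size without a reference, in Python. This is not BBMerge's actual algorithm or interface; the function names and the `min_overlap` / `max_mismatch_frac` thresholds are hypothetical. When the two reads of a standard inward-facing pair overlap, the insert length is the sum of the read lengths minus the overlap. Pairs whose inserts are longer than both reads combined do not overlap and return `None`, and inserts shorter than a single read (adapter read-through) are not handled here.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def overlap_insert_size(r1: str, r2: str, min_overlap: int = 10,
                        max_mismatch_frac: float = 0.1):
    """Estimate a pair's insert length from the overlap of its two reads.

    Assumes inward-facing paired-end reads: read 2 comes from the opposite
    strand, so its reverse complement should begin inside read 1.
    Returns None when no acceptable overlap is found.
    """
    r2rc = revcomp(r2)
    longest = min(len(r1), len(r2rc))
    # Try the longest overlap first and accept the first one that passes.
    for olen in range(longest, min_overlap - 1, -1):
        suffix_r1 = r1[len(r1) - olen:]
        prefix_r2rc = r2rc[:olen]
        mismatches = sum(a != b for a, b in zip(suffix_r1, prefix_r2rc))
        if mismatches <= max_mismatch_frac * olen:
            # Fragment = all of read 1 plus the non-overlapping tail of read 2.
            return len(r1) + len(r2rc) - olen
    return None


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    fragment = "".join(rng.choice("ACGT") for _ in range(150))
    read1 = fragment[:100]
    read2 = revcomp(fragment[-100:])
    print(overlap_insert_size(read1, read2))
```

Collecting the non-`None` values over many pairs gives an insert-size histogram, but only for the subset of fragments short enough for their reads to overlap.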

sameedmsiddiqui replied:

Hi,

Thanks! I'll make sure to check it out.

Regards.
Asaf (Israel) wrote (2.9 years ago):
  1. There is no exact size; it's a distribution. Knowing this distribution is useful, for example, to decide how many N's to insert between contigs when scaffolding.
  2. De novo assembly usually starts without using the pairing information: each read is treated as if it were single-end to generate contigs. Once the contigs are long enough, you can map both ends of a fragment back to them to estimate the insert size (this works for mate-pair libraries too).
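Asaf's second point can be sketched as follows, assuming pairs have already been mapped back to the assembled contigs with an aligner of your choice and their coordinates extracted into the hypothetical `PairHit` record below. The sketch keeps only pairs whose mates land on the same contig facing each other (FR orientation, typical of standard paired-end libraries; mate-pair libraries often come out outward-facing, so the orientation test would need flipping), then summarizes the distribution after a median/MAD outlier filter. The filter and its cutoff are an illustrative choice, not what any particular assembler does.

```python
import statistics
from dataclasses import dataclass


@dataclass
class PairHit:
    contig1: str
    start1: int      # 0-based leftmost position of the mate 1 alignment
    end1: int        # exclusive end of the mate 1 alignment
    reverse1: bool   # True if mate 1 aligned to the reverse strand
    contig2: str
    start2: int
    end2: int
    reverse2: bool


def observed_insert(hit: PairHit):
    """Outer distance for an inward-facing (FR) pair on one contig, else None."""
    if hit.contig1 != hit.contig2 or hit.reverse1 == hit.reverse2:
        return None  # mates on different contigs, or same strand
    if not hit.reverse1:
        fwd_start, rev_end = hit.start1, hit.end2
    else:
        fwd_start, rev_end = hit.start2, hit.end1
    if rev_end <= fwd_start:
        return None  # outward-facing pair; skipped in this FR-only sketch
    return rev_end - fwd_start


def insert_distribution(hits, mad_cutoff: float = 5.0):
    """Return (mean, population sd, n) of insert sizes after outlier removal."""
    sizes = [s for s in (observed_insert(h) for h in hits) if s is not None]
    if not sizes:
        raise ValueError("no usable pairs")
    med = statistics.median(sizes)
    mad = statistics.median(abs(s - med) for s in sizes)
    # max(mad, 1) keeps the window from collapsing when most sizes are identical.
    kept = [s for s in sizes if abs(s - med) <= mad_cutoff * max(mad, 1)]
    return statistics.mean(kept), statistics.pstdev(kept), len(kept)
```

The median/MAD filter is used instead of a plain mean because a few chimeric or misassembled pairs far from the bulk would otherwise inflate both the mean and the spread.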
sameedmsiddiqui replied:

Hi. Thanks a bunch. Regarding (1), I wonder how paired ends help when aligning to repetitive regions of the reference genome, for example when one mate falls in a non-repetitive region and the other in a repetitive region. On one hand, precise alignment within repetitive regions might not seem important; but if a SNP recurs at a specific position within a repetitive region, precise alignment would be needed to determine exactly where the mutation is. Thanks once again!
Charles Plessy (Japan) wrote (2.9 years ago):

1) Does one have to know the exact length of the insert for the paired end or mate-pair sequences to be useful?

It is important to have an estimate, so that the aligner can distinguish _"proper"_ pairs, which likely represent the molecule they truly originate from, from artefacts in which one mate is misaligned, usually very far from the other mate. What counts as "far" depends on the method. For instance, in transcriptome sequencing some proper pairs are expected to align hundreds of kilobases apart on the genome (when the mates span an intron), and short-read aligners such as BWA need to know that.
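A minimal sketch of the kind of check described above, assuming a library whose insert sizes are summarized by a mean and standard deviation and whose mates face inward. The function name, the `k` multiplier and the returned labels are hypothetical; real aligners such as BWA estimate the distribution and apply their own pairing rules, which this does not reproduce.

```python
def classify_pair(same_reference: bool, inward_facing: bool,
                  outer_distance: int, mean: float, sd: float,
                  k: float = 4.0) -> str:
    """Label a mapped pair 'proper', or say why it is discordant.

    The acceptance window is mean +/- k * sd of the library's insert size.
    """
    if not same_reference:
        return "discordant: mates on different reference sequences"
    if not inward_facing:
        return "discordant: unexpected mate orientation"
    low, high = mean - k * sd, mean + k * sd
    if low <= outer_distance <= high:
        return "proper"
    return "discordant: distance outside expected range"


if __name__ == "__main__":
    # Genomic library with ~300 bp inserts: mates 5 kb apart look suspicious.
    print(classify_pair(True, True, 320, mean=300, sd=30))
    print(classify_pair(True, True, 5000, mean=300, sd=30))
```

For RNA-seq reads aligned to a genome, a fixed window like this would wrongly flag mates that span an intron, which is why the expected range has to reflect the sequencing method.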

2) ... do you also have some knowledge of the approximate length of the insert?

First, as explained above, the distribution of observed distances after alignment differs according to the sequencing method (transcriptome, genome, ...). In addition, for genome sequencing, the library can be prepared (for example by size selection) so that the distance after alignment falls within a given range.

3) How is this done in de novo sequencing...

De novo assembly typically uses prior information on the expected distance between mates to order the contigs, predict gap sizes, etc.
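A simplified illustration of how an expected mate distance can predict a gap size; it is not the method of any specific assembler. It assumes contig A is placed before contig B, that mate 1 of each linking pair maps forward on A and mate 2 maps reverse on B, and that the library's mean insert size is known. Each link gives one gap estimate (mean insert minus the parts of the fragment already covered by the two contigs), the median is taken, and the result sets the number of N's used for padding. The function names and the `min_ns` floor are hypothetical.

```python
import statistics


def estimate_gap(links, contig_a_len: int, insert_mean: float) -> int:
    """Estimate the gap between contig A (left) and contig B (right).

    links: (start_in_a, end_in_b) per linking pair, where start_in_a is the
    0-based start of mate 1 on A and end_in_b is the exclusive end of mate 2
    on B (measured from B's first base).
    """
    if not links:
        raise ValueError("need at least one linking pair")
    gaps = []
    for start_in_a, end_in_b in links:
        covered_in_a = contig_a_len - start_in_a  # fragment bases inside A
        covered_in_b = end_in_b                   # fragment bases inside B
        gaps.append(insert_mean - covered_in_a - covered_in_b)
    return round(statistics.median(gaps))


def n_padding(gap: int, min_ns: int = 1) -> str:
    """Run of N's for the scaffold; a negative gap suggests A and B overlap."""
    return "N" * max(gap, min_ns)
```

The median is used so that a single misplaced mate does not shift the estimate; a negative estimate is a hint to check whether the two contigs should be merged rather than padded.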

Powered by Biostar version 2.3.0