Question

Best experimental/bioinfo protocol to genotype large insertions?

8.6 years ago

Aurelie MLB ▴ 360

Hello,

I am trying to find what would be the best way to identify a large insertion (~1kb) in a sequence.

Basically, the scientists I am working with want to make a large insertion in a genome, then check that it is indeed there and determine the exact resulting sequences. Because the repair mechanism used is somewhat random, the boundaries of the insert can vary from one chromosome to another. The insertion might not have taken place at all, and I could have a deletion instead on one of the chromosomes. Moreover, the experiment will be done on a pool of cells, so we can end up with a pool of insertions/deletions with nested edges (which does not seem to be well supported by variant callers, even for smaller insertions/deletions).

The scientists I am working with would like the sequences and the frequencies of the alleles. Outputting the sequences also means being able to match the boundaries of the insertions/deletions (the repair will be random at both ends). So I somehow need a physical link between the two edges of my large insertion.
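As a rough illustration of what "sequences and frequencies of the alleles" could mean computationally, here is a minimal Python sketch. It assumes each read already spans the whole amplicon (both edges of the insertion on one read), and it treats every distinct read sequence as an allele with no error correction, which real data would need. The function name `allele_frequencies` and the toy reads are hypothetical.

```python
from collections import Counter


def allele_frequencies(reads):
    """Return (sequence, count, fraction) per distinct read, most common first.

    Assumption: each read covers the full amplicon, so identical reads
    represent the same allele. Sequencing errors are NOT handled here.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return [(seq, n, n / total) for seq, n in counts.most_common()]


# Toy example: three reads carry one allele, one read carries a shorter one.
reads = ["ACGTTTGCA", "ACGTTTGCA", "ACGGCA", "ACGTTTGCA"]
for seq, n, frac in allele_frequencies(reads):
    print(seq, n, f"{frac:.2f}")
```

With short paired-end reads that do not span both edges, this kind of direct tally is not possible, which is why the physical link between the two edges matters.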

I am not quite sure what to advise in terms of experimental design: maybe amplifying a large amplicon and circularising it, so that the edges of the insertion can be physically linked by paired-end reads? Then randomly shearing and sequencing it...

But then, how would I align this? Against what?

I had a look at structural variant callers, but I am afraid they will not handle populations of sequences and nested insertions/deletions well. Any experience on this front, please? I have seen http://www.broadinstitute.org/software/genomestrip/download-genome-strip but it does not seem to handle insertions.

I was advised to assemble the reads (rather than aligning them directly) using SPAdes. I know nothing about genome assembly. Could it be the way forward?

Any insight from the community would really help!

Thanks a lot!

next-gen Assembly sequence alignment • 2.6k views

updated 19 months ago by Ram 43k • written 8.6 years ago by Aurelie MLB ▴ 360

How many samples do they want to sequence? In general, doing a few rounds of PCR (first with primers that should only bind to the insert and then adding random primers in) to make amplicons partly covering the ends of the insert could get the job done.

8.6 years ago by Devon Ryan 104k

Hi Ryan,

Apologies, I missed your answer! It could easily become more than a hundred samples. And actually, I would have a similar question for large deletions.

PCR is good for a quick first check. But I believe it does not give precise information on the resulting sequences (given a random repair mechanism) or on the allele frequencies, unfortunately...

I kept looking at structural variant discovery and assembly algorithms. I fear that my biggest problem is nested insertions/deletions in a sample made of a pool of cells (it seems that existing algorithms filter out structural variants that are too close to one already reported).

But without data to try things on for real, and without experience in this field so far, I am struggling to get a clear picture of the best route.

Thanks!

8.6 years ago by Aurelie MLB ▴ 360

PCR generates the amplicons that can then be sequenced. That allows you to resolve the insertion site nicely. This is also how this would have been done back in the Sanger sequencing days.

8.6 years ago by Devon Ryan 104k
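To make the idea of resolving the insertion site from sequenced amplicons a bit more concrete, here is a minimal Python sketch. It is not taken from the thread: it assumes each amplicon read contains two known flanking sequences (hypothetical `left_anchor` and `right_anchor`) on either side of the target site, matches them exactly (so a sequencing error inside an anchor leaves the read unresolved), and compares what lies between them to the unedited reference (`ref_between`). All names and the toy sequences are illustrative.

```python
def classify_amplicon(read, left_anchor, right_anchor, ref_between):
    """Classify one amplicon read by the sequence between two flanking anchors.

    Returns (label, between), where label is one of "reference",
    "insertion", "deletion", "substitution" or "unresolved".
    Assumption: anchors are matched exactly, with no mismatch tolerance.
    """
    i = read.find(left_anchor)
    if i < 0:
        return ("unresolved", None)
    start = i + len(left_anchor)
    j = read.find(right_anchor, start)
    if j < 0:
        return ("unresolved", None)
    between = read[start:j]
    if between == ref_between:
        return ("reference", between)
    if len(between) > len(ref_between):
        return ("insertion", between)
    if len(between) < len(ref_between):
        return ("deletion", between)
    return ("substitution", between)


# Toy example: the unedited site is "CG" between the two anchors.
left, right, ref = "AAAC", "GTTT", "CG"
for read in ["AAACCGGTTT", "AAACCTTAAGGTTT", "AAACGTTT", "CCCC"]:
    print(read, classify_amplicon(read, left, right, ref))
```

Because the sequence between the anchors is reported as a whole, both edges of each event stay linked on the same read, provided the read is long enough to span the full amplicon.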