Question

Difference between genome assembly and genome sequence alignment to a reference to find structural variants 1kgp SGDP

3.8 years ago

m4r1n4 • 0

Hello,

I'm trying to work out the differences between genome assembly and genome sequence alignment, and the benefits of each, when trying to identify structural variants or transposons in populations.
I've been scouring the internet but have only really come across comparisons of short vs. long reads and of de novo vs. reference-based assembly.

My understanding is that there are two main comparative genomic methods for identifying structural variation within a population. The first is what the 1KGP and SGDP did: sequence the whole genome, align the reads to the reference genome, and end up with a BAM file.
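As a rough illustration of what the read-mapping route gives you to work with, here is a minimal sketch that scans SAM-formatted records (the text form of a BAM) for reads aligned in more than one piece, which is one common kind of evidence for an SV breakpoint. It relies only on the standard SAM layout (flag bit 0x800 for supplementary alignments, the `SA:Z:` tag listing the other pieces). The function name `split_read_candidates` and the toy records are made up for illustration, and real SV callers combine this with much more evidence.

```python
from typing import Iterable, List, Tuple

UNMAPPED = 0x4         # SAM flag bit: read is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def split_read_candidates(sam_lines: Iterable[str]) -> List[Tuple[str, str, int, str]]:
    """Return (read name, chromosome, position, SA tag value) for primary
    alignments that carry an SA tag, i.e. reads aligned in more than one piece.

    Hypothetical helper for illustration; not taken from any SV caller.
    """
    hits = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, chrom, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # report each read once, from its primary record
        for tag in fields[11:]:  # optional tags follow the 11 mandatory columns
            if tag.startswith("SA:Z:"):
                hits.append((name, chrom, pos, tag[5:]))
                break
    return hits


# Toy records (invented): read1 is split between chr1:1000 and chr1:5000.
TOY_SAM = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t1000\t60\t50M50S\t*\t0\t0\t*\t*\tSA:Z:chr1,5000,+,50S50M,60,0;",
    "read2\t0\tchr1\t2000\t60\t100M\t*\t0\t0\t*\t*",
    "read1\t2048\tchr1\t5000\t60\t50H50M\t*\t0\t0\t*\t*\tSA:Z:chr1,1000,+,50M50S,60,0;",
]

candidates = split_read_candidates(TOY_SAM)
for name, chrom, pos, other in candidates:
    print(f"{name}: primary at {chrom}:{pos}, other piece(s): {other}")
```

On a real BAM you would stream the records through a SAM/BAM reader rather than a Python list, but the filtering logic would look much the same.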

The second is to assemble personal genomes and then compare or align the assemblies to each other and to the reference genome, for example using LASTZ, liftOver, or chains/nets. Example: doi:10.1016/j.gene.2005.09.031

Thanks in advance.

Assembly alignment transposon genome • 1.3k views

updated 3.8 years ago by shimbalama ▴ 10 • written 3.8 years ago by m4r1n4 • 0

score 1 · Answer 1 · 2020-07-26

Hi,

Your question is a little unclear, but I think I understand. You're talking about assembly vs. mapping for SV detection, right? To be clear, you can't do genome sequence alignments, except maybe for something as small as a virus such as SARS-CoV-2 (COVID-19). I was further confused because a third option exists: running a multiple sequence alignment (MSA) of reads that have been identified as chimeric (i.e., that map to an SV breakpoint). People use mapping, MSA, and assembly (whole-genome or local) to understand SVs, and all are valid. The differences, and especially the benefits, come down to the specifics of the various algorithms. Generally, though, the main benefit of assembly is that it is reference-free, i.e., there is no a priori bias, which allows for the identification of novel SVs. However, I often map assemblies to a reference anyway (unless it is completely novel sequence), as it's useful to describe it in terms of known genomic loci.
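To show what mapping an assembly back to a reference can look like in practice, here is a minimal sketch that reads alignment blocks in PAF format (the 12-column tab-separated layout written by minimap2) and flags places where a contig and the reference disagree in length between two neighbouring blocks. The function name `sv_candidates_from_paf`, the 50 bp cutoff, and the toy alignments are assumptions for illustration. It only handles forward-strand, collinear blocks, so treat it as a sketch of the idea and not as a real SV caller.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

MIN_SV_LEN = 50  # assumption: a commonly used minimum size for calling an SV


def sv_candidates_from_paf(
    paf_lines: Iterable[str], min_len: int = MIN_SV_LEN
) -> List[Tuple[str, int, str, int, str]]:
    """Return (reference name, reference position, type, size, contig name)
    for length disagreements between neighbouring alignment blocks.

    Hypothetical helper for illustration only.
    """
    blocks = defaultdict(list)
    for line in paf_lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        # PAF columns: qname qlen qstart qend strand tname tlen tstart tend ...
        qname, qstart, qend, strand = f[0], int(f[2]), int(f[3]), f[4]
        tname, tstart, tend = f[5], int(f[7]), int(f[8])
        if strand == "+":  # simplification: ignore reverse-strand blocks
            blocks[(qname, tname)].append((qstart, qend, tstart, tend))

    calls = []
    for (qname, tname), segs in blocks.items():
        segs.sort()  # order blocks along the contig
        for (_, q1e, _, t1e), (q2s, _, t2s, _) in zip(segs, segs[1:]):
            query_gap = q2s - q1e    # unaligned contig bases between blocks
            target_gap = t2s - t1e   # unaligned reference bases between blocks
            if query_gap < 0 or target_gap < 0:
                continue  # overlapping or non-collinear blocks: skip in this sketch
            diff = query_gap - target_gap
            if diff >= min_len:
                calls.append((tname, t1e, "INS", diff, qname))   # extra sequence in contig
            elif -diff >= min_len:
                calls.append((tname, t1e, "DEL", -diff, qname))  # sequence missing from contig
    return calls


# Toy alignments (invented): contig1 lacks 300 bp of reference, contig2 has 500 extra bp.
TOY_PAF = [
    "contig1\t10000\t0\t4000\t+\tchr1\t1000000\t100000\t104000\t4000\t4000\t60",
    "contig1\t10000\t4000\t10000\t+\tchr1\t1000000\t104300\t110300\t6000\t6000\t60",
    "contig2\t5000\t0\t2000\t+\tchr1\t1000000\t200000\t202000\t2000\t2000\t60",
    "contig2\t5000\t2500\t5000\t+\tchr1\t1000000\t202000\t204500\t2500\t2500\t60",
]

sv_calls = sv_candidates_from_paf(TOY_PAF)
for ref, pos, kind, size, contig in sv_calls:
    print(f"{ref}:{pos}\t{kind}\t{size} bp\t(from {contig})")
```

Reporting each candidate by its reference coordinate is the point made above: even when the variant was found reference-free, describing it against known loci makes it comparable across samples.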

I hope that helps.