Question

Effect of assembly quality on CNV detection

4.2 years ago

dthorbur ★ 1.9k

I am working on a project comparing two reference genomes. One of the most obvious differences between them is the assembly level: one is chromosome-level, the other is contig-level. This doesn't create too much of an issue, as I plan to find syntenic blocks, lift over annotations, or something to that effect. Mapping efficiency and genome-wide coverage are similar for both genomes.

More to the point, I am identifying copy number variant (CNV) regions with Lumpy, which works directly on BAM files. My unfiltered results indicate a problem: I generate 53 Gb of data for the chromosome-level assembly, but only 0.5 Gb of data for the contig-level assembly. No errors are thrown.

Lumpy is fairly comprehensive in that it combines several methods, but it seems to work primarily through breakpoint detection. Is this method affected by contig size?

If anyone has any insights into why I might be seeing this pattern, any help would be greatly appreciated.

CNV lumpy • 831 views

score 0 · Answer 1 · 2020-02-10

Well, in case anyone else runs into this issue (fairly niche, considering what I'm doing): the smaller contigs likely affect the discovery of discordant and split reads. My solution for the moment is to guide the assembly of one reference genome using the other through a synteny analysis. I'm using Satsuma (http://satsuma.sourceforge.net/) for this, specifically its Chromosemble function, and then also running an exhaustive SatsumaSynteny analysis.
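One rough way to check the contig-size hypothesis is to count how many read pairs have their two mates on different contigs in each alignment. The sketch below is only a diagnostic and does not reproduce anything Lumpy does internally. It reads SAM text lines and uses the standard SAM columns (FLAG in column 2, RNAME in column 3, RNEXT in column 7); the function name `count_cross_contig_pairs` is made up for illustration.

```python
from collections import Counter

# SAM flag bits, as defined in the SAM specification
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_cross_contig_pairs(sam_lines):
    """Tally primary, paired alignments (both mates mapped) by whether
    the mate sits on the same contig or on a different one.

    Hypothetical helper for illustration; not part of Lumpy or samtools.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@')
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname = fields[2]
        rnext = fields[6]
        if not flag & PAIRED:
            continue
        # Ignore records that cannot tell us where both mates landed
        if flag & (UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if rnext == "*":
            continue
        # RNEXT is '=' when the mate is on the same reference sequence
        if rnext == "=" or rnext == rname:
            counts["same_contig"] += 1
        else:
            counts["different_contig"] += 1
    return counts
```

To use it, call `count_cross_contig_pairs(sys.stdin)` in a small script and pipe in the output of `samtools view` for each BAM file. If the contig-level alignment shows a much larger share of `different_contig` records, that would be consistent with pairs being split across contig ends rather than supporting a breakpoint within one sequence, though it would not prove that this explains the difference in output size.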