I am working on a project comparing two reference genomes. One of the most obvious differences is the assembly level: one is chromosome level, the other is contig level. This doesn't create too much of an issue on its own, as I plan to find syntenic blocks, lift over annotations, or something to that effect. Mapping efficiency and genome-wide coverage are similar for both genomes.
More to the point, I am identifying copy number variable (CNV) regions with Lumpy, which works directly on BAM files. My unfiltered results indicate a problem: I get 53 GB of output for the chromosome-level assembly, but only 0.5 GB for the contig-level assembly. No errors are thrown.
Lumpy is fairly comprehensive in that it combines several methods, but it seems to work primarily through breakpoint detection. Is this method affected by contig size?
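To put numbers on the contig-size difference, here is a minimal sketch for summarising sequence lengths in each reference. It assumes each FASTA has a `samtools faidx` index (a tab-separated `.fai` file with the sequence name in column 1 and its length in column 2). The function names and the 10 kb cutoff are my own illustrative choices, not anything Lumpy uses.

```python
def parse_fai_lengths(lines):
    """Return sequence lengths from the lines of a .fai index."""
    lengths = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        lengths.append(int(fields[1]))  # column 2 holds the sequence length
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def summarise(lengths, short_cutoff=10_000):
    """Basic contiguity statistics; short_cutoff is an arbitrary illustrative threshold."""
    return {
        "n_sequences": len(lengths),
        "total_bp": sum(lengths),
        "n50": n50(lengths),
        "shortest": min(lengths) if lengths else 0,
        "n_below_cutoff": sum(1 for x in lengths if x < short_cutoff),
    }


# Example usage (the path is hypothetical):
# with open("contig_assembly.fa.fai") as handle:
#     print(summarise(parse_fai_lengths(handle)))
```

Running this on both indexes gives the number of sequences, the N50, and how many contigs fall below the chosen cutoff, which should make it easier to judge whether contig size is a plausible factor.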
If anyone has any insights into why I might be seeing this pattern, any help would be greatly appreciated.