Following on from some advice I received in a previous post, I've attempted to use vg surject to map long reads to the T2T reference genome, and it seems to be producing nonsense output: a much smaller BAM than a comparable linearly aligned BAM from minimap2, but with roughly 16x as many alignments (631.8 million versus 39.6 million, from a source FASTA containing 29.8 million reads). Yet when I attempt to call SVs on it using cuteSV, I get zero results. This makes no sense to me, so what exactly is going on here?
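For scale, those totals work out to about 21 alignments per read for the surjected BAM (631.8 / 29.8) versus about 1.3 for the minimap2 BAM (39.6 / 29.8). One way to see where the extra records come from is to break them down by SAM flag. The following is only a minimal sketch, not something from my pipeline: it assumes SAM text is piped in on stdin, and the script and file names in the usage line are hypothetical. It relies only on the standard SAM flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary).

```python
import sys
from collections import Counter


def classify(flag: int) -> str:
    """Map a SAM FLAG value to a coarse record category."""
    if flag & 0x4:
        return "unmapped"
    if flag & 0x100:
        return "secondary"
    if flag & 0x800:
        return "supplementary"
    return "primary"


def tally(sam_lines):
    """Count records per category and the number of distinct read names.

    Header lines (starting with '@') and blank lines are skipped.
    Note: the set of read names is held in memory, which may be
    substantial for tens of millions of reads.
    """
    counts = Counter()
    read_names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[classify(int(fields[1]))] += 1
        read_names.add(fields[0])
    return counts, len(read_names)


if __name__ == "__main__":
    counts, n_reads = tally(sys.stdin)
    total = sum(counts.values())
    for category in ("primary", "secondary", "supplementary", "unmapped"):
        print(f"{category}\t{counts[category]}")
    print(f"total_records\t{total}")
    print(f"distinct_reads\t{n_reads}")
    if n_reads:
        print(f"records_per_read\t{total / n_reads:.2f}")
```

Assuming the script is saved as tally_sam.py (a hypothetical name), it could be run as `samtools view surjected.bam | python tally_sam.py`, and again on the minimap2 BAM for comparison.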
I'm fairly certain the problem isn't cuteSV, since it did successfully call variants on the minimap2 output from the same input data. I'm not sure what I can adjust on the vg surject front; I'm using default settings (and in the case of the former, its presets for vg-style graph inputs). The one thing I do know is that vg surject is hitting a hard limit on alignment size on some subgraphs and stopping early for some reads. How many, I don't know, because it refuses to give further warnings past the first.
My pipeline (with the data sources and software versions listed) is here, and a sample error log from the vg surject section of the process is here. I'd appreciate advice from anyone who has more experience with these tools. I have a sneaking suspicion that I'm running into the same issue as before: "these tools can't quite do what you're asking of them".