False positives in manta structural variant calling
4 months ago
Angel ▴ 10

Hi guys,

I'm analyzing structural variants in paired tumor-healthy human samples with manta, and I was surprised by the huge number of events the software identified, particularly translocations (MantaBND). Just a couple of examples:

gzcat somaticSV.vcf.gz | grep -v "^#" | grep -iw "PASS" | cut -f 3 | tr ":" "\t" | cut -f 1 | sort | uniq -c

Number of structural variants (PASS) for sample X:
8402 MantaBND
Number of structural variants (PASS) for sample Y:
21526 MantaBND
1 MantaINS
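For reference, the same count can be sketched in Python. This version tests the FILTER column (column 7) for an exact `PASS` instead of grepping the whole line, so a record is not counted just because "PASS" appears somewhere else in it. The function names are made up for illustration, and it assumes Manta-style IDs where the text before the first colon is the variant type.

```python
import gzip
from collections import Counter


def count_pass_sv_types(lines):
    """Count SV types among records whose FILTER column is exactly PASS."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or fields[6] != "PASS":
            continue
        # Assumption: IDs look like "MantaBND:0:1:2:0:0:0:0",
        # so the prefix before the first colon is the type.
        counts[fields[2].split(":", 1)[0]] += 1
    return counts


def count_from_vcf(path):
    """Open a plain or gzip-compressed VCF and count PASS records by type."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_pass_sv_types(handle)
```

Calling `count_from_vcf("somaticSV.vcf.gz")` returns a `Counter` keyed by type. As far as I know, Manta writes each breakend junction as two mate records linked by `MATEID`, so the number of distinct junctions should be roughly half the MantaBND count.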


So my first question is: are these numbers normal? To me it looks like something went wrong and I got lots of false positives, but I have limited experience doing this kind of analysis in human samples (I did it in the past in mouse tumor samples and the numbers were much lower).

My second question is: how can I filter the data to get rid of the noise? I've read this, this and this, but I'm not sure whether I should combine the output of different callers, filter by number of supporting reads (based on what threshold?), or whether there is another, more "standard" way to clean the data. At this point visual inspection is obviously not an option.
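To make the "filter by number of supporting reads" option concrete, here is a minimal sketch. It assumes the Manta somatic VCF carries the per-sample FORMAT fields `PR` (spanning read pairs) and `SR` (split reads), each written as `ref,alt`. The thresholds (`min_tumor_alt=5`, `max_normal_alt=0`) are placeholders for illustration, not recommended values, and the function names are hypothetical. Sample columns are looked up by name from the `#CHROM` header, so the column order does not matter.

```python
def alt_support(format_keys, sample_value):
    """Sum ALT-supporting reads over PR and SR (each stored as 'ref,alt')."""
    values = dict(zip(format_keys.split(":"), sample_value.split(":")))
    total = 0
    for key in ("PR", "SR"):
        if key in values and values[key] != ".":
            total += int(values[key].split(",")[1])
    return total


def filter_somatic(lines, tumor_name, normal_name,
                   min_tumor_alt=5, max_normal_alt=0):
    """Keep PASS records with enough tumor support and little normal support.

    The default thresholds are illustrative assumptions only.
    """
    kept = []
    tumor_idx = normal_idx = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            continue
        if line.startswith("#CHROM"):
            header = line.split("\t")
            tumor_idx = header.index(tumor_name)
            normal_idx = header.index(normal_name)
            continue
        if tumor_idx is None:
            raise ValueError("no #CHROM header line seen before the records")
        fields = line.split("\t")
        if fields[6] != "PASS":
            continue
        tumor_alt = alt_support(fields[8], fields[tumor_idx])
        normal_alt = alt_support(fields[8], fields[normal_idx])
        if tumor_alt >= min_tumor_alt and normal_alt <= max_normal_alt:
            kept.append(line)
    return kept
```

The sample names passed in must match the names in the VCF header, which normally come from the read-group `SM` tags of the BAM files. Any threshold would need to be tuned against the depth of the data.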

Any input would be much appreciated, thank you!

Experimental details:

• WES on paired healthy (PBMC) vs tumor samples.
• 2x150 PE sequencing (average depth around 100x).

Analysis details:

• Reads aligned with BWA-mem to GRCh38.95 (default parameters, just used -M for Picard compatibility and -R to pass header info).
• Sorted bamfiles generated with samtools.
• SV calling by manta 1.6.0 (precompiled version: manta-1.6.0.centos6_x86_64) on sorted bamfiles (I did not remove duplicates, could this be the source of the problem?).
• Manta configuration: python2 manta-1.6.0.centos6_x86_64/bin/configManta.py --exome --normalBam $name-PBMC.sorted.bam --tumorBam $name-Tumor.sorted.bam --referenceFasta GRCh38.95/GRCh38.95.fa --runDir SV_20201028/$name.manta
• Manta run: python2 $HOME/scratch/results/SV_20201028/$name.manta/runWorkflow.py -j 8 -g 64

P.S. Quality-wise the samples look fine (fastqc).

P.S.2 I checked manta logfiles and I didn't see anything strange, but if you have an indication of what to look at I can recheck them.

manta sv structural variants WES • 302 views

I think BND records can represent more than just translocations; see Fig. 4 of https://www.biorxiv.org/content/10.1101/840751v1.full.pdf.

Thank you. Even if filtering by "MantaBND" also includes domestic insertions, don't you think the numbers are very high?

I think I get similar numbers when I use it with exomes. With genomes the output looks much more interpretable. Manta was not designed for exomes. It reports everything it cannot classify correctly as BND, and there are many SVs that, for example, lack coverage of the second breakpoint.
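Since BND is a catch-all category, one quick sanity check is to split the BND records by whether the mate breakend sits on the same chromosome. A rough sketch follows, assuming the standard VCF breakend ALT notation (for example `N[chr2:500[` or `]chr1:9000]N`). The function name is hypothetical, and this only separates records by mate chromosome; it says nothing about whether a call is real.

```python
import re

# Breakend ALT alleles embed the mate position between two brackets,
# e.g. "N[chr2:500[" or "]chr1:9000]N". The greedy group lets contig
# names that themselves contain colons still parse.
_BND_ALT = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def classify_bnd(chrom, alt):
    """Label a record by where its mate breakend lies."""
    match = _BND_ALT.search(alt)
    if match is None:
        return "not_bnd"  # symbolic or sequence ALT, not a breakend
    mate_chrom = match.group(1)
    if mate_chrom == chrom:
        return "intrachromosomal"
    return "interchromosomal"
```

Counting the two labels over the PASS records would show how many of the MantaBND calls are even candidates for translocations, as opposed to same-chromosome events that could not be typed.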

I see, I will try my best to filter out all irrelevant info. Thanks a lot for your input!

You may want to look at an ensemble approach, using manta alongside other SV calling tools. That would allow you to be more confident in any call identified by multiple tools.
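As a rough illustration of the ensemble idea, the sketch below keeps only those manta junctions that a second caller also reports within some breakpoint tolerance. Everything here is hypothetical: the `Call` structure, the function names, and the 200 bp default tolerance are assumptions for illustration, and each caller's output would first have to be parsed into breakpoint pairs.

```python
from typing import NamedTuple


class Breakpoint(NamedTuple):
    chrom: str
    pos: int


class Call(NamedTuple):
    left: Breakpoint
    right: Breakpoint


def _close(p, q, tolerance):
    """True when two breakpoints share a chromosome and lie within tolerance."""
    return p.chrom == q.chrom and abs(p.pos - q.pos) <= tolerance


def same_junction(a, b, tolerance=200):
    """Compare two calls, allowing the breakpoints to be listed in either order."""
    direct = _close(a.left, b.left, tolerance) and _close(a.right, b.right, tolerance)
    swapped = _close(a.left, b.right, tolerance) and _close(a.right, b.left, tolerance)
    return direct or swapped


def supported_by_both(manta_calls, other_calls, tolerance=200):
    """Return the manta calls that have a matching call from the other tool."""
    return [
        call for call in manta_calls
        if any(same_junction(call, other, tolerance) for other in other_calls)
    ]
```

This is a quadratic comparison, which is fine for a few thousand calls; dedicated SV merging tools handle larger sets and SV-type matching more carefully.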