I'm analyzing structural variants in paired tumor-healthy human samples with Manta, and I was surprised by the huge number of events the software identified, particularly translocations (MantaBND). Here are a couple of examples:
gzcat somaticSV.vcf.gz | grep -v "^#" | grep -iw "PASS" | cut -f 3 | tr ":" "\t" | cut -f 1 | sort | uniq -c
Number of structural variants (PASS) for sample X:
8402 MantaBND
50 MantaDEL
63 MantaDUP

Number of structural variants (PASS) for sample Y:
21526 MantaBND
128 MantaDEL
162 MantaDUP
1 MantaINS
So my first question is: are these numbers normal? To me it looks like something went wrong and I got lots of false positives, but I have limited experience with this kind of analysis in human samples (I have done it in the past on mouse tumor samples, and the numbers were much lower).
My second question is: how can I filter the data to get rid of the noise? I've read this, this and this, but I'm not sure whether I should combine the output of different callers, filter by number of supporting reads (based on what threshold?), or whether there is another, more "standard" way to clean the data. At this point visual inspection is obviously not an option.
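To make the "filter by supporting reads" option concrete, this is roughly what I have in mind. It is only a sketch under assumptions I have not verified for every record: that the tumor is the last sample column in somaticSV.vcf.gz, and that FORMAT carries PR and/or SR as "ref,alt" counts. The function name filter_by_tumor_support and the threshold of 10 are my own arbitrary choices, not a recommended cutoff.

```shell
# Sketch: keep PASS records whose tumor sample has at least MIN_ALT
# alt-supporting reads (paired-read + split-read support combined).
# Assumptions: tumor is the LAST sample column; FORMAT holds PR and/or SR
# as "ref,alt". The default threshold (10) is arbitrary.
filter_by_tumor_support() {
    # $1 = minimum number of alt-supporting reads in the tumor sample
    awk -F'\t' -v min_alt="${1:-10}" '
        /^#/ { print; next }              # pass header lines through
        $7 != "PASS" { next }             # exact match on the FILTER column
        {
            n = split($9, keys, ":")      # FORMAT keys, e.g. PR:SR
            split($NF, vals, ":")         # tumor values, e.g. 20,5:28,7
            alt = 0
            for (i = 1; i <= n; i++) {
                if (keys[i] == "PR" || keys[i] == "SR") {
                    split(vals[i], ra, ",")   # ra[1] = ref, ra[2] = alt
                    alt += ra[2]
                }
            }
            if (alt >= min_alt) print
        }'
}
```

It would slot into the counting command above like this: gzcat somaticSV.vcf.gz | filter_by_tumor_support 10 | grep -v "^#" | cut -f 3 | cut -d: -f1 | sort | uniq -c. Matching on column 7 also avoids counting a record just because the word PASS appears somewhere else on the line.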
Any input would be much appreciated, thank you!
- WES on paired healthy (PBMC) vs tumor samples.
- 2x150 PE sequencing (average depth around 100x).
- Reads aligned with BWA-MEM to GRCh38.95 (default parameters; I only added -M for Picard compatibility and -R to pass header info).
- Sorted BAM files generated with samtools.
- SV calling with Manta 1.6.0 (precompiled version: manta-1.6.0.centos6_x86_64) on the sorted BAM files (I did not remove duplicates; could this be the source of the problem?).
python2 manta-1.6.0.centos6_x86_64/bin/configManta.py --exome --normalBam $name-PBMC.sorted.bam --tumorBam $name-Tumor.sorted.bam --referenceFasta GRCh38.95/GRCh38.95.fa --runDir SV_20201028/$name.manta
python2 $HOME/scratch/results/SV_20201028/$name.manta/runWorkflow.py -j 8 -g 64
P.S. Quality-wise the samples look fine (FastQC).
P.S.2 I checked the Manta log files and didn't see anything strange, but if you can tell me what to look for I can recheck them.