Question: False positives in Manta structural variant calling
Angel wrote (7 weeks ago):

Hi guys,

I'm analyzing structural variants in paired tumor/healthy human samples with Manta, and I was surprised by the huge number of events the software identifies, particularly translocations (MantaBND). Here are a couple of examples:

gzcat somaticSV.vcf.gz | grep -v "^#" | grep -iw "PASS" | cut -f 3 | tr ":" "\t" | cut -f 1 | sort | uniq -c

Number of structural variants (PASS) for sample X:
8402 MantaBND
  50 MantaDEL
  63 MantaDUP
Number of structural variants (PASS) for sample Y:
21526 MantaBND
 128 MantaDEL
 162 MantaDUP
   1 MantaINS
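A minimal Python sketch of the count above, assuming a standard Manta VCF (IDs prefixed MantaBND, MantaDEL, ..., with BND mates linked through the MATEID INFO key). Unlike the grep pipeline, it checks only the FILTER column, and it also reports the number of distinct BND mate pairs: each breakend junction is normally written as two records, so the per-record MantaBND count is roughly twice the number of junctions. The function names are illustrative, not part of Manta.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def count_pass_by_type(lines):
    """Count PASS records per Manta ID prefix and distinct BND mate pairs.

    Only the FILTER column (7th field) is checked, so "PASS" appearing
    elsewhere on a line cannot cause a false match.
    """
    per_type = Counter()
    bnd_pairs = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[6] != "PASS":
            continue
        sv_id = fields[2]
        sv_type = sv_id.split(":")[0]  # e.g. "MantaBND"
        per_type[sv_type] += 1
        if sv_type == "MantaBND":
            info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
            # A record and its mate collapse into one unordered pair.
            mate = info.get("MATEID", sv_id)
            bnd_pairs.add(frozenset((sv_id, mate)))
    return per_type, len(bnd_pairs)


# Usage (hypothetical file name):
# with open_vcf("somaticSV.vcf.gz") as vcf:
#     per_type, n_bnd_junctions = count_pass_by_type(vcf)
```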

So my first question is: are these numbers normal? To me it looks like something went wrong and I got lots of false positives, but I have limited experience with this kind of analysis in human samples (I have done it before on mouse tumor samples, and the numbers were much lower).

My second question is: how can I filter the data to get rid of the noise? I've read this, this and this, but I'm not sure whether I should combine the output of different callers, filter by the number of supporting reads (and if so, with what threshold?), or use some other, more "standard" way to clean the data. With this many calls, visual inspection is obviously not an option.
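For the read-support option, one approach is a threshold on Manta's FORMAT fields PR (spanning read pairs) and SR (split reads), each written as "ref,alt" counts. The sketch below assumes the somatic VCF has the normal sample in the 10th column and the tumor in the 11th (check the #CHROM header line), and the thresholds (at least 5 alt-supporting reads in the tumor, none in the normal) are hypothetical placeholders, not recommended values.

```python
def alt_support(format_keys, sample_field):
    """Sum alt-allele read support (PR alt + SR alt) for one sample column.

    PR and SR are "ref,alt" pairs; SR may be missing for some records.
    """
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    total = 0
    for key in ("PR", "SR"):
        value = values.get(key, ".")
        if value != ".":
            _, alt = value.split(",")
            total += int(alt)
    return total


def passes_support_filter(line, min_tumor_alt=5, max_normal_alt=0,
                          normal_idx=9, tumor_idx=10):
    """Hypothetical somatic filter: PASS, enough tumor support, little or none in normal.

    normal_idx/tumor_idx are 0-based column indices (assumed order).
    """
    fields = line.rstrip("\n").split("\t")
    if fields[6] != "PASS":
        return False
    fmt = fields[8]
    tumor_alt = alt_support(fmt, fields[tumor_idx])
    normal_alt = alt_support(fmt, fields[normal_idx])
    return tumor_alt >= min_tumor_alt and normal_alt <= max_normal_alt
```

Any threshold chosen this way should be sanity-checked against a handful of calls in IGV before applying it genome-wide.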

Any input would be much appreciated, thank you!

Experimental details:

  • WES on paired healthy (PBMC) vs tumor samples.
  • 2x150 PE sequencing (average depth around 100x).

Analysis details:

  • Reads aligned with BWA-MEM to GRCh38.95 (default parameters; I only used -M for Picard compatibility and -R to pass read-group header info).
  • Sorted BAM files generated with samtools.
  • SV calling with Manta 1.6.0 (precompiled version: manta-1.6.0.centos6_x86_64) on the sorted BAM files (I did not remove duplicates; could this be the source of the problem?).
  • Manta configuration: python2 manta-1.6.0.centos6_x86_64/bin/configManta.py --exome --normalBam $name-PBMC.sorted.bam --tumorBam $name-Tumor.sorted.bam --referenceFasta GRCh38.95/GRCh38.95.fa --runDir SV_20201028/$name.manta

  • Manta run: python2 $HOME/scratch/results/SV_20201028/$name.manta/runWorkflow.py -j 8 -g 64;

P.S. Quality-wise, the samples look fine (FastQC).

P.S.2 I checked the Manta log files and didn't see anything strange, but if you can tell me what to look for, I can check them again.

written 7 weeks ago by Angel

I think not only translocations are reported as BND: https://www.biorxiv.org/content/10.1101/840751v1.full.pdf, Fig. 4.

written 7 weeks ago by German.M.Demidov
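To check how many MantaBND calls are actually interchromosomal (true translocation candidates), the mate position can be parsed from the ALT field, which for breakends uses the VCF bracket notation t[p[, t]p], ]p]t or [p[t, where p is "chrom:pos". A minimal sketch, assuming standard VCF 4.x breakend ALT strings; the function names are illustrative.

```python
import re

# VCF breakend ALT forms: t[p[, t]p], ]p]t, [p[t, where p = "chrom:pos".
# The greedy (.+) backtracks to the last ":digits" followed by a bracket.
MATE_RE = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def mate_position(alt):
    """Return (mate_chrom, mate_pos) from a breakend ALT, or None."""
    m = MATE_RE.search(alt)
    return (m.group(1), int(m.group(2))) if m else None


def classify_bnd(chrom, alt):
    """Label a BND record as inter- or intrachromosomal."""
    mate = mate_position(alt)
    if mate is None:
        return "unparsed"
    return "interchromosomal" if mate[0] != chrom else "intrachromosomal"
```

Tallying `classify_bnd(fields[0], fields[4])` over the PASS MantaBND records would show how much of the BND count is intrachromosomal.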

Thank you. Even if the "MantaBND" calls also include domestic insertions, don't you think the numbers are very high?

written 7 weeks ago by Angel

I think I get similar numbers when I run it on exomes; on genomes the output is much more interpretable. Manta was not designed for exomes. Everything that Manta cannot classify correctly is reported as BND, and many SVs cannot be classified, e.g. because there is no coverage at the second breakpoint.

written 7 weeks ago by German.M.Demidov
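Following the point about second-breakpoint coverage, one crude heuristic for exome data is to keep only breakend calls whose two ends both fall inside (optionally padded) capture targets. This sketch assumes a BED file of exome targets (0-based, half-open) is available; the padding parameter and function names are hypothetical. Note the trade-off: many real rearrangement breakpoints are intronic, so this filter gains specificity at the cost of sensitivity.

```python
import bisect
import re
from collections import defaultdict

MATE_RE = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def load_targets(bed_lines, padding=0):
    """Per-chromosome merged target intervals from BED (0-based, half-open)."""
    raw = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((max(0, int(start) - padding), int(end) + padding))
    targets = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:  # overlapping or touching: extend
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        targets[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return targets


def in_targets(targets, chrom, pos):
    """True if 1-based VCF position pos lies inside a target interval."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    zero_based = pos - 1
    # Index of the last interval starting at or before this position.
    i = bisect.bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


def both_breakends_targeted(targets, chrom, pos, alt):
    """Keep a BND only if its own position and its mate both hit targets."""
    m = MATE_RE.search(alt)
    if m is None:
        return False
    return in_targets(targets, chrom, pos) and in_targets(targets, m.group(1), int(m.group(2)))
```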

I see, I will try my best to filter out all irrelevant info. Thanks a lot for your input!

written 7 weeks ago by Angel

You may want to look at an ensemble approach, running Manta alongside other SV calling tools. This would allow you to be more confident in any call identified by multiple tools.

written 7 weeks ago by samuel.a.odonnell
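A bare-bones illustration of the ensemble idea: treat calls from different tools as the same junction when both breakends agree within a window, and keep those supported by at least N callers. The Junction tuple, the 500 bp window and the consensus rule are assumptions for illustration only; dedicated merging tools (e.g. SURVIVOR) handle SV type, orientation and length more carefully.

```python
from typing import List, NamedTuple


class Junction(NamedTuple):
    chrom: str
    pos: int
    mate_chrom: str
    mate_pos: int


def _close(c1, p1, c2, p2, window):
    return c1 == c2 and abs(p1 - p2) <= window


def same_junction(a, b, window=500):
    """Both breakends agree within `window` bp, in either order."""
    forward = (_close(a.chrom, a.pos, b.chrom, b.pos, window)
               and _close(a.mate_chrom, a.mate_pos, b.mate_chrom, b.mate_pos, window))
    swapped = (_close(a.chrom, a.pos, b.mate_chrom, b.mate_pos, window)
               and _close(a.mate_chrom, a.mate_pos, b.chrom, b.pos, window))
    return forward or swapped


def consensus(calls_by_caller: List[List[Junction]], min_callers=2, window=500):
    """Keep calls from the first caller that at least min_callers callers share.

    Quadratic in the number of calls; fine for a sketch, not for large sets.
    """
    first, *others = calls_by_caller
    kept = []
    for call in first:
        support = 1 + sum(
            any(same_junction(call, other, window) for other in calls)
            for calls in others
        )
        if support >= min_callers:
            kept.append(call)
    return kept
```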