Question

GATK GermlineCNVCaller & PostprocessGermlineCNVCalls

5.0 years ago

rajitz ▴ 20

Hi, I was wondering if anyone here has experience running GATK GermlineCNVCaller and PostprocessGermlineCNVCalls to call CNVs in germline samples?

The VCF files that I'm getting always have ALT set to "<DEL>,<DUP>". Shouldn't ALT be just one of them, or neither? Both the interval and segment VCF files I'm looking at have every position marked as "<DEL>,<DUP>".

If anyone here has experience with this, I would really appreciate some feedback. Thanks!

software error gatk cnv • 2.7k views

updated 4.2 years ago by Z-F ▴ 20 • written 5.0 years ago by rajitz ▴ 20

score 1 · Answer 1 · 2019-09-11

Yes, this appears to be normal for the moment; I imagine it will change as the tool is developed further.

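The information you're looking for is in the last column. The first element in that column, GT, holds the call: expected ploidy (0), deletion (1) or duplication (2). These numbers index the alleles in the usual VCF way: 0 is REF, 1 is the first ALT entry (<DEL>) and 2 is the second (<DUP>), which is why every record lists both in ALT.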
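As a rough illustration of reading that field, here is a minimal Python sketch that pulls GT out of one data line of a single-sample VCF and turns it into a label. The function name, the label strings and the example record are made up for illustration; the 0/1/2 mapping is the one described above, and the sketch assumes a tab-separated line with exactly one sample column.

```python
# Assumed mapping, taken from the explanation above:
# 0 = expected ploidy, 1 = deletion, 2 = duplication.
GT_LABELS = {"0": "REF", "1": "DEL", "2": "DUP"}


def classify_gcnv_record(vcf_line):
    """Return (contig, position, label) for one single-sample VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    contig, position = fields[0], int(fields[1])
    # Column 9 (FORMAT) names the colon-separated keys of the sample column.
    keys = fields[8].split(":")
    values = fields[9].split(":")
    sample = dict(zip(keys, values))
    return contig, position, GT_LABELS.get(sample["GT"], "UNKNOWN")


# Hypothetical record, made up for illustration (not real tool output).
example = "\t".join(
    ["1", "1000", "CNV_1_1000_2000", "N", "<DEL>,<DUP>", ".", ".",
     "END=2000", "GT:CN", "1:1"]
)
print(classify_gcnv_record(example))
```

Header lines (those starting with "#") would need to be skipped before calling the function on a real file.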

The following tutorial ends with a screen grab of what a typical gCNV VCF should look like: https://software.broadinstitute.org/gatk/documentation/article?id=11684

Undoubtedly, this is what your VCF looks like too. Hope this helps!

score 0 · Answer 2 · 2020-02-15

Hi everyone,

I am trying to use the CNV caller. GATK version used: gatk-4.1.4.0

I used the following command in this step.

../gatk-4.1.4.0/gatk DetermineGermlineContigPloidy \
    -L Filtered_annotated_preprocessed_intervals_Twist.interval_list \
    --interval-merging-rule OVERLAPPING_ONLY \
    -I S1071Nr10.counts.hdf5 \
    -I S1071Nr11.counts.hdf5 \
    --contig-ploidy-priors ../contig_ploidy_priors.tsv \
    --output . \
    --output-prefix ploidy \
    --verbosity DEBUG \
    --mapping-error-rate 0.01 \
    --global-psi-scale 0.001 \
    --sample-psi-scale 1.0E-4 \
    --mean-bias-standard-deviation 0.01

(200 samples were given as -I inputs; only the first two are shown here to save space. The tool name is taken from the log below.)

I installed the conda environment following https://gatk.broadinstitute.org/hc/en-us/articles/360035889851?flash_digest=f2aaedc26749c67b8005def080fde44460155fb6#

Everything was working until I got the following error, which I do not understand and do not know how to solve.

16:54:47.473 DEBUG ScriptExecutor - --output_model_path=/data/NGS/Reanalysis-Package/CNV/ploidy-model
/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/tmp/cohort_determine_ploidy_and_depth.1941148667013278511.py", line 79, in <module>
    args.contig_ploidy_prior_table)
  File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_ploidy.py", line 182, in get_contig_ploidy_prior_map_from_tsv_file
    delimiter=delimiter)
  File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_commons.py", line 50, in read_csv
    input_pd = pd.read_csv(fh, delimiter=delimiter, dtype=dtypes_dict)  # dtypes_dict keys may not be present
  File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/pandas/io/parsers.py", line 705, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/pandas/io/parsers.py", line 451, in _read
    data = parser.read(nrows)
  File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/pandas/io/parsers.py", line 1065, in read
    ret = self._engine.read(nrows)
  File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/pandas/io/parsers.py", line 1828, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 894, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 970, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 957, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2200, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 58, saw 7

16:54:55.812 DEBUG ScriptExecutor - Result: 1
16:54:55.813 INFO DetermineGermlineContigPloidy - Shutting down engine
[February 3, 2020 4:54:55 PM IRST] org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy done. Elapsed time: 0.78 minutes.
Runtime.totalMemory()=3370123264
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException: python exited with 1
Command Line: python

So it seems that the error is:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 58, saw 7
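The traceback shows the failure inside get_contig_ploidy_prior_map_from_tsv_file, so the file being parsed is the contig ploidy priors table, and the message says one of its lines has more fields than the lines before it. A minimal sketch for locating such lines is below. It is plain Python with a made-up function name (find_ragged_rows); it assumes the table is tab-delimited and treats the first non-blank line as the header. It is not part of GATK.

```python
import io


def find_ragged_rows(handle, delimiter="\t"):
    """Return (expected_field_count, [(line_number, field_count), ...]).

    The first non-blank line sets the expected number of fields; every later
    non-blank line with a different count is reported with its 1-based
    line number in the file.
    """
    expected = None
    ragged = []
    for line_number, line in enumerate(handle, start=1):
        text = line.rstrip("\r\n")
        if not text:
            continue  # ignore blank lines
        n_fields = len(text.split(delimiter))
        if expected is None:
            expected = n_fields
        elif n_fields != expected:
            ragged.append((line_number, n_fields))
    return expected, ragged


# Tiny made-up table: the third line has two extra trailing fields.
demo = io.StringIO("A\tB\tC\n1\t2\t3\n1\t2\t3\t\t\n")
print(find_ragged_rows(demo))
```

On the real file, the same function could be called as `find_ragged_rows(open("../contig_ploidy_priors.tsv"))`. Trailing tabs or extra columns on the reported lines would be the first thing to check.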

I googled a lot but could not figure out what the problem is. I have no experience working with Python; I am just following the steps here: https://gatkforums.broadinstitute.org/gatk/discussion/11684

Can anyone help me solve this issue?

Thanks in advance,

Zohreh