Hi everyone,
I am trying to use the CNV caller.
a) GATK version used: gatk-4.1.4.0
I used the following command in this step.
../gatk-4.1.4.0/gatk DetermineGermlineContigPloidy \
  -L Filtered_annotated_preprocessed_intervals_Twist.interval_list \
  --interval-merging-rule OVERLAPPING_ONLY \
  -I S1071Nr10.counts.hdf5 \
  -I S1071Nr11.counts.hdf5 \
  [... added 200 samples here as input; those lines are skipped to save space ...] \
  --contig-ploidy-priors ../contig_ploidy_priors.tsv \
  --output . \
  --output-prefix ploidy \
  --verbosity DEBUG \
  --mapping-error-rate 0.01 \
  --global-psi-scale 0.001 \
  --sample-psi-scale 1.0E-4 \
  --mean-bias-standard-deviation 0.01
I installed the conda environment following https://gatk.broadinstitute.org/hc/en-us/articles/360035889851?flash_digest=f2aaedc26749c67b8005def080fde44460155fb6#
Everything was working until I got the following error. I do not understand what it means or how to fix it.
16:54:47.473 DEBUG ScriptExecutor -   --output_model_path=/data/NGS/Reanalysis-Package/CNV/ploidy-model
/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/tmp/cohort_determine_ploidy_and_depth.1941148667013278511.py", line 79, in <module>
    args.contig_ploidy_prior_table)
  File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_ploidy.py", line 182, in get_contig_ploidy_prior_map_from_tsv_file
    delimiter=delimiter)
  File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_commons.py", line 50, in read_csv
    input_pd = pd.read_csv(fh, delimiter=delimiter, dtype=dtypes_dict)  # dtypes_dict keys may not be present
  File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/pandas/io/parsers.py", line 705, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/pandas/io/parsers.py", line 451, in _read
    data = parser.read(nrows)
  File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/pandas/io/parsers.py", line 1065, in read
    ret = self._engine.read(nrows)
  File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/pandas/io/parsers.py", line 1828, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 894, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 970, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 957, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2200, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 58, saw 7
16:54:55.812 DEBUG ScriptExecutor - Result: 1
16:54:55.813 INFO  DetermineGermlineContigPloidy - Shutting down engine
[February 3, 2020 4:54:55 PM IRST] org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy done. Elapsed time: 0.78 minutes.
Runtime.totalMemory()=3370123264
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException:
python exited with 1
Command Line: python
So it seems that the actual error is:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 58, saw 7
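Read literally, the message says that pandas expected 5 tab-separated fields per row, based on the earlier rows, and then found 7 fields on line 58. Because the traceback goes through get_contig_ploidy_prior_map_from_tsv_file, the table being read is presumably the file passed to --contig-ploidy-priors. Below is a minimal sketch for listing the rows whose field count differs from the first row. The function name find_ragged_rows is hypothetical (it is not part of GATK or gcnvkernel), and the reported line numbers could be offset from the one pandas reports if any header or comment lines are handled before the table reaches pandas.

```python
import csv
import io
import sys


def find_ragged_rows(handle, delimiter="\t"):
    """Return (line_number, field_count) for every row whose number of
    fields differs from that of the first non-empty row.

    Hypothetical helper for illustration; not part of GATK or gcnvkernel.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    expected = None
    ragged = []
    for line_number, row in enumerate(reader, start=1):
        if not row:
            # Skip completely empty lines.
            continue
        if expected is None:
            # The first non-empty row defines the expected width.
            expected = len(row)
            continue
        if len(row) != expected:
            ragged.append((line_number, len(row)))
    return ragged


# Small in-memory demonstration: the third line has 5 fields instead of 3.
demo = io.StringIO("A\tB\tC\n1\t2\t3\n1\t2\t3\t4\t5\n")
demo_result = find_ragged_rows(demo)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed file name): python check_table.py ../contig_ploidy_priors.tsv
    with open(sys.argv[1], newline="") as fh:
        for line_number, count in find_ragged_rows(fh):
            print(f"line {line_number}: {count} fields")
```

Typical causes of a row with extra fields are stray trailing tabs or a row that mixes tabs and spaces, so it is worth opening the reported line in a text editor that shows whitespace characters.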
I googled a lot but could not figure out what the problem is. (I have no experience working with Python; I am just following the steps here: https://gatkforums.broadinstitute.org/gatk/discussion/11684)
Can anyone help me solve this issue?
Thanks in advance,
Zohreh