Question: How do I run ContEst to test for sample contamination?
MAPK (United States) wrote, 2.6 years ago:

I am trying to run ContEst ( https://www.broadinstitute.org/cancer/cga/contest_run ) to identify sample contamination in a cohort of 200 samples. I have multigenome VCF files split per chromosome, so there are 24 files. I was not able to run the example data, and I am not sure how to supply all of those VCF files and the BAM files for my 200 samples. Also, how do I create the /hg19_population_stratified_af_hapmap_3.3.vcf file provided in the test data? Can someone please advise?

This is the command I used for test data:

 java -Xmx2400m -jar ../contest-1.0.24530-bin/ContEst.jar -T Contamination -I ../ContEst_example_data/chr20_sites.bam -R ../hg19.fa -B:pop,vcf ../hg19_population_stratified_af_hapmap_3.3.vcf -B:genotypes, ../ContEst_example_data/hg00142.vcf -BTI genotypes -o Sample1.out.txt

It generates this error:

INFO  16:34:06,880 HelpFormatter - ----------------------------------------------------------------------------------- 
INFO  16:34:06,882 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.0-6228-gdf95121, Compiled 2011/07/14 11:09:43 
INFO  16:34:06,882 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  16:34:06,882 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki 
INFO  16:34:06,882 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa 
INFO  16:34:06,883 HelpFormatter - Program Args: -T Contamination -I ../ContEst_example_data/chr20_sites.bam -R ../hg19.fa -B:pop,vcf ../hg19_population_stratified_af_hapmap_3.3.vcf -B:genotypes, ../ContEst_example_data/hg00142.vcf -BTI genotypes -o Sample1.out.txt  
INFO  16:34:06,883 HelpFormatter - Date/Time: 2016/04/28 16:34:06 
INFO  16:34:06,883 HelpFormatter - ----------------------------------------------------------------------------------- 
INFO  16:34:06,883 HelpFormatter - ----------------------------------------------------------------------------------- 
INFO  16:34:06,889 GenomeAnalysisEngine - Strictness is SILENT 
WARN  16:34:08,503 RestStorageService - Error Response: PUT '/GATK_Run_Reports/17720jPUXZENGmL5xxhVm8ktG7Wauzrh.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 1546, Content-MD5: fwxUz6XYZcATFzjt7dnhiw==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: 7f0c54cfa5d865c0131738ededd9e18b, Date: Thu, 28 Apr 2016 06:34:07 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:+lhsRlwcDuhvJHiD8t0aUXAJ1GY=, User-Agent: JetS3t/0.8.0 (Linux/3.13.0-71-generic; amd64; en; JVM 1.7.0_76), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 92AAE1F75FB98501, x-amz-id-2: obrad2EGQHlvJ6kwawpw0l9hJWToGIPxeUuWA3PXAi3T+i3+mdcFiMae6z0F0oqC, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Thu, 28 Apr 2016 06:34:07 GMT, Connection: close, Server: AmazonS3] 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.0-6228-gdf95121): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Invalid command line: Failed to load reference dictionary
##### ERROR ------------------------------------------------------------------------------------------
Tags: contest, ngs

The MESSAGE line shows that no sequence dictionary is available for your reference genome. Create one with Picard's CreateSequenceDictionary tool.
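
As a rough sketch of that suggestion, the snippet below checks whether the companion files exist next to a reference FASTA and prints the commands that would create them. It assumes the usual GATK naming convention (the dictionary replaces the FASTA extension, the index appends .fai). The function name check_reference_companions and the picard.jar path are placeholders for illustration, not part of ContEst or Picard. The samtools faidx step goes beyond what the error message mentions, but GATK normally expects that index as well.

```shell
#!/usr/bin/env bash
# Sketch: report which companion files GATK would look for next to a
# reference FASTA, and print the commands that would create them.
# Assumption: the dictionary is named by replacing the FASTA extension
# (hg19.fa -> hg19.dict) and the index by appending .fai (hg19.fa.fai).
# The reference file name is assumed to have an extension.

check_reference_companions() {
    local ref="$1"
    local dict="${ref%.*}.dict"
    local fai="${ref}.fai"
    local status=0

    if [ ! -s "$dict" ]; then
        echo "missing: $dict"
        # picard.jar is a placeholder path to your Picard installation.
        echo "  java -jar picard.jar CreateSequenceDictionary REFERENCE=$ref OUTPUT=$dict"
        status=1
    fi
    if [ ! -s "$fai" ]; then
        echo "missing: $fai"
        echo "  samtools faidx $ref"
        status=1
    fi
    # Returns 0 only when both files exist and are non-empty.
    return "$status"
}
```

Calling check_reference_companions ../hg19.fa before launching ContEst should show whether ../hg19.dict and ../hg19.fa.fai are in place. It only prints the suggested commands and does not run them.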

written 2.6 years ago by venu

Thanks, but that is where I get an error, while creating the .dict file:

 [Fri Apr 29 ] picard.sam.CreateSequenceDictionary REFERENCE=hg19.fa OUTPUT=hg19.dict TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
 OpenJDK 64-Bit Server VM warning: You have loaded library disabled stack guard. The VM will try to fix the stack guard now.

written 2.6 years ago by MAPK