How do I run ContEst for a sample contamination test?
8.0 years ago
MAPK ★ 2.1k

I am trying to run a tool called ContEst (https://www.broadinstitute.org/cancer/cga/contest_run) to identify sample contamination in a cohort of 200 samples. I have multigenome VCF files split per chromosome, so there are 24 files. I was not able to run the example data, and I am not sure how to supply all those VCF files and the BAM files for my 200 samples. Also, how do I create the /hg19_population_stratified_af_hapmap_3.3.vcf file that they provide in their test data? Can someone please advise on this?

This is the command I used for test data:

 java -Xmx2400m -jar ../contest-1.0.24530-bin/ContEst.jar -T Contamination -I ../ContEst_example_data/chr20_sites.bam -R ../hg19.fa -B:pop,vcf ../hg19_population_stratified_af_hapmap_3.3.vcf -B:genotypes, ../ContEst_example_data/hg00142.vcf -BTI genotypes -o Sample1.out.txt

It generates this error:

INFO  16:34:06,880 HelpFormatter - ----------------------------------------------------------------------------------- 
INFO  16:34:06,882 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.0-6228-gdf95121, Compiled 2011/07/14 11:09:43 
INFO  16:34:06,882 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  16:34:06,882 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki 
INFO  16:34:06,882 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa 
INFO  16:34:06,883 HelpFormatter - Program Args: -T Contamination -I ../ContEst_example_data/chr20_sites.bam -R ../hg19.fa -B:pop,vcf ../hg19_population_stratified_af_hapmap_3.3.vcf -B:genotypes, ../ContEst_example_data/hg00142.vcf -BTI genotypes -o Sample1.out.txt  
INFO  16:34:06,883 HelpFormatter - Date/Time: 2016/04/28 16:34:06 
INFO  16:34:06,883 HelpFormatter - ----------------------------------------------------------------------------------- 
INFO  16:34:06,883 HelpFormatter - ----------------------------------------------------------------------------------- 
INFO  16:34:06,889 GenomeAnalysisEngine - Strictness is SILENT 
WARN  16:34:08,503 RestStorageService - Error Response: PUT '/GATK_Run_Reports/17720jPUXZENGmL5xxhVm8ktG7Wauzrh.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 1546, Content-MD5: fwxUz6XYZcATFzjt7dnhiw==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: 7f0c54cfa5d865c0131738ededd9e18b, Date: Thu, 28 Apr 2016 06:34:07 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:+lhsRlwcDuhvJHiD8t0aUXAJ1GY=, User-Agent: JetS3t/0.8.0 (Linux/3.13.0-71-generic; amd64; en; JVM 1.7.0_76), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 92AAE1F75FB98501, x-amz-id-2: obrad2EGQHlvJ6kwawpw0l9hJWToGIPxeUuWA3PXAi3T+i3+mdcFiMae6z0F0oqC, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Thu, 28 Apr 2016 06:34:07 GMT, Connection: close, Server: AmazonS3] 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.0-6228-gdf95121): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Invalid command line: Failed to load reference dictionary
##### ERROR ------------------------------------------------------------------------------------------

The MESSAGE line shows that no sequence dictionary is available for your reference genome. Create one with Picard's CreateSequenceDictionary tool.
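For what it's worth, here is a rough sketch of how one might check for the files GATK needs before rerunning. It assumes GATK looks for a `.dict` file (same basename as the FASTA) and a `.fai` index next to the reference. The function name `missing_ref_files` and the `picard.jar` location are hypothetical, so adjust them to your setup.

```shell
#!/usr/bin/env bash
# Print the companion files that are missing for a reference FASTA.
# Assumption: GATK expects <basename>.dict and <fasta>.fai beside the FASTA,
# and the FASTA file name has an extension (e.g. hg19.fa).
missing_ref_files() {
    local fasta="$1"
    local dict="${fasta%.*}.dict"   # ../hg19.fa -> ../hg19.dict
    local fai="${fasta}.fai"        # ../hg19.fa -> ../hg19.fa.fai
    [ -f "$dict" ] || echo "$dict"
    [ -f "$fai" ]  || echo "$fai"
}

# Example (paths are placeholders):
#   missing_ref_files ../hg19.fa
#
# If the .dict file is listed, a command along these lines should create it
# (assumes a single picard.jar in the current directory):
#   java -jar picard.jar CreateSequenceDictionary REFERENCE=../hg19.fa OUTPUT=../hg19.dict
#
# If the .fai file is listed:
#   samtools faidx ../hg19.fa
```

If the function prints nothing, both files are in place and the "Failed to load reference dictionary" message likely has another cause, such as file permissions.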


Thanks, but that is where I get an error, while creating the .dict file:

 [Fri Apr 29 ] picard.sam.CreateSequenceDictionary REFERENCE=hg19.fa OUTPUT=hg19.dict TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
 OpenJDK 64-Bit Server VM warning: You have loaded library disabled stack guard. The VM will try to fix the stack guard now.
