Hi Biostar,
I am executing "Local Re-align" step of CALL and got the below exception:
ERROR MESSAGE on executing "local realign" step of CALL for the input BAM file: Error caching SAM record HSQ1009_86:8:2203:8087:125077, which is usually caused by malformed SAM/BAM files in which multiple identical copies of a read are present. Please help me to figure out reason for the duplicate values in the bam file even after doing "MarkDuplicates" on the input file bam file.
Execution Steps:
=============
Input file for CALL stage: ars200000001.sorted.markDup.bam
Executed the first stage of CALL ie, "Create Realigner region". Output file ars200000001.forRealigner.intervals is created with the positions which needs re-alignment. Command: 'java -Xmx8g -jar /knome-pd/fbaby/isys/ksuite/software/kgap-core/thirdparty/gatk/GenomeAnalysisTK.jar -T RealignerTargetCreator -l INFO -I /knome-pd/fbaby/isys/ksuite/seqdata/ars/200000001/ars200000001.markdup.sorted.bam -R /knome-pd/fbaby/isys/ksuite/refdata/refgenome/hg19/human_hg19_full.fasta -o /kfs/scratch/fbaby-piid/realigner/200000001.chr{}.forRealigner.intervals -known /knome-pd/fbaby/isys/ksuite/refdata/refgenome/dbsnp/00-All.vcf
Executed "Local Re-align" step: java -Xmx8g -jar /knome-pd/fbaby/isys/ksuite/software/kgap-core/thirdparty/gatk/GenomeAnalysisTK.jar -T IndelRealigner -l INFO -I /knome-pd/fbaby/isys/ksuite/seqdata/ars/200000001/ars200000001.sorted.markDup.bam -R /knome-pd/fbaby/isys/ksuite/refdata/refgenome/hg19/human_hg19_full.fasta --maxReadsInMemory 1500000 -targetIntervals /kfs/scratch/fbaby-piid/realigner/ars200000001.forRealigner.intervals -o /kfs/scratch/fbaby-piid/cleaned/ars200000001.sorted.markDup.cleaned.bam -known /knome-pd/fbaby/isys/ksuite/refdata/refgenome/dbsnp/00-All.vcf Got below shown error while executing:
2014/03/17 20:43:50 [pid=2471 tid=15(CmdUtils-stderr)] [ERROR] (CmdUtils$LoggerOutputHandler.java:103) ##### ERROR MESSAGE: Error caching SAM record HSQ1009_86:8:2203:8087:125077, which is usually caused by malformed SAM/BAM files in which multiple identical copies of a read are present.
Verified the bam file and got the below duplicate values in the bam file:
samtools view ars200000001.sorted.markDup.bam | grep "HSQ1009_86:8:2203:8087:125077"
HSQ1009_86:8:2203:8087:125077 147 8 108630732 1 38M62S = 108630733 -37 ACATATATACATGTGTATATAAATATATACATATATAGATGTGTATATAAATATATACATATATACATGTGTATATAAATATATACATATATACATGTGT EIIGJIIECIHGIHFDIIIGGJJIIGHGGJIIIEHGD<igjjjjjhihiijjiiihgjjiiiighijjijjiiiijjjjiiijihjjhhhhhfffffccc sa:z:8,108630733,-,57s36m7s,1,0;8,108630733,-="" ,29s36m35s,1,1;="" pg:z:markduplicates="" rg:z:na12878_30x_pairend="" nm:i:1="" as:i:37="" xs:i:36<="" p="">
HSQ1009_86:8:2203:8087:125077 99 8 108630733 0 29S36M35S = 108630732 37 ACATGTATACATGTGTATATAAATATATACATATATACATGTGTATATAAATATATACATATATACATGTGTATATAAATATATACATATATACATGTGT @B@FFEFFDCDFHFHHIJJJJJJIJJJJJJJJJJJGDEHIGHGFIIIJJJJJJJJJJJJJJJJJDFFGHGJ@FIIIIIIJJIJJJJIJJJJIGGFGHIJC SA:Z:8,108630733,+,57S36M7S,0,0;8,108630737,+,5S32M63S,0,0; PG:Z:MarkDuplicates RG:Z:NA12878_30x_pairend NM:i:0 AS:i:36 XS:i:36
HSQ1009_86:8:2203:8087:125077 2147 8 108630733 0 57H36M7H = 108630732 37 CATATATACATGTGTATATAAATATATACATATATA JJJJJJJDFFGHGJ@FIIIIIIJJIJJJJIJJJJIG SA:Z:8,108630733,+,29S36M35S,0,0;8,108630737,+,5S32M63S,0,0; PG:Z:MarkDuplicates RG:Z:NA12878_30x_pairend NM:i:0 AS:i:36 XS:i:29
HSQ1009_86:8:2203:8087:125077 2195 8 108630733 1 57H36M7H = 108630733 -36 CATATATACATGTGTATATAAATATATACATATATA JJIIIIGHIJJIJJIIIIJJJJIIIJIHJJHHHHHF SA:Z:8,108630732,-,38M62S,1,1;8,108630733,-,29S36M35S,1,1; PG:Z:MarkDuplicates RG:Z:NA12878_30x_pairend NM:i:0 AS:i:36 XS:i:29
HSQ1009_86:8:2203:8087:125077 2195 8 108630733 1 29H36M35H = 108630733 -36 CATATATAGATGTGTATATAAATATATACATATATA JIIIEHGD<igjjjjjhihiijjiiihgjjiiiigh sa:z:8,108630732,-,38m62s,1,1;8,108630733,-,57s36m7s,1,0;="" pg:z:markduplicates="" rg:z:na12878_30x_pairend="" nm:i:1="" as:i:31="" xs:i:28<="" p="">
HSQ1009_86:8:2203:8087:125077 2147 8 108630737 0 5H32M63H = 108630732 33 TATACATGTGTATATAAATATATACATATATA EFFDCDFHFHHIJJJJJJIJJJJJJJJJJJGD SA:Z:8,108630733,+,29S36M35S,0,0;8,108630733,+,57S36M7S,0,0; PG:Z:MarkDuplicates RG:Z:NA12878_30x_pairend NM:i:0 AS:i:32 XS:i:28
Also used ValidateSamFile.jar to validate the bam file and got the below shown exception:
java -Xmx48g -jar /knome-pd/fbaby/isys/ksuite/software/kgap-core/thirdparty/picard/ValidateSamFile.jar INPUT=ars200000001.sorted.markDup.bam OUTPUT=ars200000001.sorted.validation.txt IGNORE=INVALID_FLAG_PROPER_PAIR IGNORE=INVALID_FLAG_FIRST_OF_PAIR IGNORE=INVALID_FLAG_SECOND_OF_PAIR IGNORE=INVALID_FLAG_READ_UNMAPPED IGNORE=INVALID_MAPPING_QUALITY IGNORE=INVALID_CIGAR IGNORE=INVALID_ALIGNMENT_START REFERENCE_SEQUENCE=/kfs/aligncall/refdata/human_hg19_full.fasta IGNORE_WARNINGS=True MODE=SUMMARY [Tue Mar 18 06:43:56 EDT 2014] net.sf.picard.sam.ValidateSamFile INPUT=ars200000001.sorted.markDup.bam OUTPUT=ars200000001.sorted.validation.txt MODE=SUMMARY IGNORE=[INVALID_FLAG_PROPER_PAIR, INVALID_FLAG_FIRST_OF_PAIR, INVALID_FLAG_SECOND_OF_PAIR, INVALID_FLAG_READ_UNMAPPED, INVALID_MAPPING_QUALITY, INVALID_CIGAR, INVALID_ALIGNMENT_START] REFERENCE_SEQUENCE=/kfs/aligncall/refdata/human_hg19_full.fasta IGNORE_WARNINGS=true MAX_OUTPUT=100 VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false [Tue Mar 18 06:43:56 EDT 2014] Executing as root@kdev02.knome.com on Linux 2.6.32-279.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_45-mockbuild_2013_12_10_15_14-b00 INFO 2014-03-18 06:49:54 SamFileValidator 10000000 reads validated. INFO 2014-03-18 06:54:24 SamFileValidator 20000000 reads validated. INFO 2014-03-18 06:58:43 SamFileValidator 30000000 reads validated. INFO 2014-03-18 07:04:12 SamFileValidator 40000000 reads validated. INFO 2014-03-18 07:08:55 SamFileValidator 50000000 reads validated. INFO 2014-03-18 07:13:11 SamFileValidator 60000000 reads validated. [Tue Mar 18 07:17:34 EDT 2014] net.sf.picard.sam.ValidateSamFile done. Elapsed time: 33.65 minutes. Runtime.totalMemory()=962068480 Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: HSQ1009_86:6:1204:6504:189329 at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124) at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78) at net.sf.picard.sam.SamFileValidator$CoordinateSortedPairEndInfoMap.remove(SamFileValidator.java:612) at net.sf.picard.sam.SamFileValidator.validateMateFields(SamFileValidator.java:400) at net.sf.picard.sam.SamFileValidator.validateSamRecords(SamFileValidator.java:243) at net.sf.picard.sam.SamFileValidator.validateSamFile(SamFileValidator.java:176) at net.sf.picard.sam.SamFileValidator.validateSamFileSummary(SamFileValidator.java:103) at net.sf.picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:153) at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175) at net.sf.picard.sam.ValidateSamFile.main(ValidateSamFile.java:92)
Executed "MarkDuplicates" for the input bam file ars200000001.sorted.markDup.bam and got the output file: ars200000001.sorted.aftermarkDup.bam
Command: java -Xmx4g -Djava.io.tmpdir=tmp -jar /kfs/aligncall/picard-tools-1.98/MarkDuplicates.jar INPUT=ars200000001.sorted.markDup.bam TMP_DIR=tmp OUTPUT=ars200000001.sorted.aftermarkDup.bam METRICS_FILE=ars200000001.DuplicateMetrics.txt COMPRESSION_LEVEL=1 VALIDATION_STRINGENCY=SILENT VERBOSITY=INFO CREATE_INDEX=true
INFO 2014-03-19 04:09:02 MarkDuplicates Written 730,000,000 records. Elapsed time: 05:24:27s. Time for last 10,000,000: 268s. Last read position: 17:72,782,111 INFO 2014-03-19 04:13:30 MarkDuplicates Written 740,000,000 records. Elapsed time: 05:28:55s. Time for last 10,000,000: 267s. Last read position: 18:26,345,729 INFO 2014-03-19 04:17:55 MarkDuplicates Written 750,000,000 records. Elapsed time: 05:33:20s. Time for last 10,000,000: 264s. Last read position: 18:61,126,873 INFO 2014-03-19 04:22:27 MarkDuplicates Written 760,000,000 records. Elapsed time: 05:37:52s. Time for last 10,000,000: 272s. Last read position: 19:19,279,802 INFO 2014-03-19 04:27:14 MarkDuplicates Written 770,000,000 records. Elapsed time: 05:42:39s. Time for last 10,000,000: 286s. Last read position: 19:54,065,077 INFO 2014-03-19 04:31:54 MarkDuplicates Written 780,000,000 records. Elapsed time: 05:47:19s. Time for last 10,000,000: 279s. Last read position: 20:32,354,929 INFO 2014-03-19 04:36:20 MarkDuplicates Written 790,000,000 records. Elapsed time: 05:51:46s. Time for last 10,000,000: 266s. Last read position: 21:10,975,875 INFO 2014-03-19 04:40:53 MarkDuplicates Written 800,000,000 records. Elapsed time: 05:56:18s. Time for last 10,000,000: 272s. Last read position: 22:16,091,944 INFO 2014-03-19 04:45:33 MarkDuplicates Written 810,000,000 records. Elapsed time: 06:00:58s. Time for last 10,000,000: 280s. Last read position: X:1,598,647 INFO 2014-03-19 04:50:08 MarkDuplicates Written 820,000,000 records. Elapsed time: 06:05:33s. Time for last 10,000,000: 274s. Last read position: X:37,449,263 INFO 2014-03-19 04:54:36 MarkDuplicates Written 830,000,000 records. Elapsed time: 06:10:01s. Time for last 10,000,000: 268s. Last read position: X:71,876,186 INFO 2014-03-19 04:59:00 MarkDuplicates Written 840,000,000 records. Elapsed time: 06:14:25s. Time for last 10,000,000: 263s. Last read position: X:107,521,203 INFO 2014-03-19 05:03:15 MarkDuplicates Written 850,000,000 records. Elapsed time: 06:18:40s. Time for last 10,000,000: 254s. Last read position: X:142,968,884 INFO 2014-03-19 05:07:55 MarkDuplicates Written 860,000,000 records. Elapsed time: 06:23:20s. Time for last 10,000,000: 280s. Last read position: 7_gl000195_random:34,336 INFO 2014-03-19 05:12:20 MarkDuplicates Written 870,000,000 records. Elapsed time: 06:27:45s. Time for last 10,000,000: 264s. Last read position: / INFO 2014-03-19 05:13:49 MarkDuplicates Before output close freeMemory: 3092103112; totalMemory: 3100639232; maxMemory: 3817865216 INFO 2014-03-19 05:13:50 MarkDuplicates After output close freeMemory: 3092116416; totalMemory: 3100639232; maxMemory: 3817865216 [Wed Mar 19 05:13:50 EDT 2014] net.sf.picard.sam.MarkDuplicates done. Elapsed time: 664.43 minutes. Runtime.totalMemory()=3100639232
Validated bam file ars200000001.sorted.aftermarkDup.bam created after MarkDuplicates:
Command: java -Xmx48g -jar /knome-pd/fbaby/isys/ksuite/software/kgap-core/thirdparty/picard/ValidateSamFile.jar INPUT=ars200000001.sorted.aftermarkDup.bam OUTPUT=ars200000001.sorted.validation.txt IGNORE=INVALID_FLAG_PROPER_PAIR IGNORE=INVALID_FLAG_FIRST_OF_PAIR IGNORE=INVALID_FLAG_SECOND_OF_PAIR IGNORE=INVALID_FLAG_READ_UNMAPPED IGNORE=INVALID_MAPPING_QUALITY IGNORE=INVALID_CIGAR IGNORE=INVALID_ALIGNMENT_START REFERENCE_SEQUENCE=/knome-pd/fbaby/isys/ksuite/refdata/refgenome/hg19/human_hg19_full.fasta IGNORE_WARNINGS=True MODE=SUMMARY [Wed Mar 19 05:28:39 EDT 2014] net.sf.picard.sam.ValidateSamFile INPUT=ars200000001.sorted.aftermarkDup.bam OUTPUT=ars200000001.sorted.validation.txt MODE=SUMMARY IGNORE=[INVALID_FLAG_PROPER_PAIR, INVALID_FLAG_FIRST_OF_PAIR, INVALID_FLAG_SECOND_OF_PAIR, INVALID_FLAG_READ_UNMAPPED, INVALID_MAPPING_QUALITY, INVALID_CIGAR, INVALID_ALIGNMENT_START] REFERENCE_SEQUENCE=/knome-pd/fbaby/isys/ksuite/refdata/refgenome/hg19/human_hg19_full.fasta IGNORE_WARNINGS=true MAX_OUTPUT=100 VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false [Wed Mar 19 05:28:39 EDT 2014] Executing as root@kdev02.knome.com on Linux 2.6.32-279.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_45-mockbuild_2013_12_10_15_14-b00 INFO 2014-03-19 05:31:25 SamFileValidator 10000000 reads validated. INFO 2014-03-19 05:34:15 SamFileValidator 20000000 reads validated. INFO 2014-03-19 05:36:57 SamFileValidator 30000000 reads validated. INFO 2014-03-19 05:40:14 SamFileValidator 40000000 reads validated. INFO 2014-03-19 05:42:58 SamFileValidator 50000000 reads validated. INFO 2014-03-19 05:45:28 SamFileValidator 60000000 reads validated. [Wed Mar 19 05:48:03 EDT 2014] net.sf.picard.sam.ValidateSamFile done. Elapsed time: 19.41 minutes. Runtime.totalMemory()=3423600640 Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: HSQ1009_86:6:1204:6504:189329 at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124) at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78) at net.sf.picard.sam.SamFileValidator$CoordinateSortedPairEndInfoMap.remove(SamFileValidator.java:612) at net.sf.picard.sam.SamFileValidator.validateMateFields(SamFileValidator.java:400) at net.sf.picard.sam.SamFileValidator.validateSamRecords(SamFileValidator.java:243) at net.sf.picard.sam.SamFileValidator.validateSamFile(SamFileValidator.java:176) at net.sf.picard.sam.SamFileValidator.validateSamFileSummary(SamFileValidator.java:103) at net.sf.picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:153) at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175) at net.sf.picard.sam.ValidateSamFile.main(ValidateSamFile.java:92)
Expected Result:
===============
- Error free run of "Local Re-align" step which outputs ars200000001.sorted.markDup.cleaned.bam
Thank you... Philipp.
I tried executing local realign step using
REMOVE_DUPLICATES=true
as show below:Command:
After which I validated the bam file and got the same error as before:
Have a look at these threads, there are several solutions:
MarkDuplicates: Value was put into PairInfoMap more than once
http://seqanswers.com/forums/showthread.php?t=17772