5.1 years ago
zizigolu
Hi,
Sorry, I have a list of .bam files from WGS. The maintainer says that the duplicates have been marked but not removed. I tried Picard to remove the duplicates, but I am getting an error.
The Broad Institute forum says "You have to be around for a little while longer before you can post links.", so I cannot post my question there.
[fi1d18@cyan02 fi1d18]$ picard MarkDuplicates I=/temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked.bam O=/temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked1.bam M= marked-dup-metrics.txt
[Thu Mar 07 17:33:42 GMT 2019] picard.sam.markduplicates.MarkDuplicates INPUT=[/temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked.bam] OUTPUT=/temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked1.bam METRICS_FILE=marked-dup-metrics.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Thu Mar 07 17:33:42 GMT 2019] Executing as fi1d18@cyan02 on Linux 2.6.32-754.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_51-b16; Picard version: 2.8.3-SNAPSHOT
INFO 2019-03-07 17:33:42 MarkDuplicates Start of doWork freeMemory: 2012347496; totalMemory: 2027945984; maxMemory: 3817865216
INFO 2019-03-07 17:33:42 MarkDuplicates Reading input file and constructing read end information.
INFO 2019-03-07 17:33:42 MarkDuplicates Will retain up to 14684096 data points before spilling to disk.
WARNING: BAM index file /temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked.bam.bai is older than BAM /temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked.bam
[Thu Mar 07 17:33:42 GMT 2019] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2027945984
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 3752, Read name HX3_22030:3:2114:23155:23319, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:665)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:650)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:620)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:569)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:543)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:438)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:222)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
[fi1d18@cyan02 fi1d18]$
How can I tell whether the duplicates have already been removed? I may be attempting something nonsensical, because I don't understand what this error means at all.
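One way to check whether duplicates are still present (marked but not removed) is to count records whose SAM FLAG has the duplicate bit (0x400, decimal 1024) set. The sketch below is only an illustration: the helper names is_duplicate_flag and count_marked_duplicates are hypothetical, and it assumes SAM text is piped in, for example from samtools view -h.

```shell
# Return success if a SAM FLAG value has the duplicate bit (0x400 = 1024) set.
# (Hypothetical helper name, for illustration only.)
is_duplicate_flag() {
    [ $(( $1 & 1024 )) -ne 0 ]
}

# Count duplicate-flagged records in SAM text read from stdin.
# Header lines start with '@' and are skipped; column 2 is the FLAG field.
# (Hypothetical helper name, for illustration only.)
count_marked_duplicates() {
    awk -F'\t' '!/^@/ && int($2 / 1024) % 2 == 1 { n++ } END { print n + 0 }'
}
```

Assuming samtools is installed, something like "samtools view -h in.bam | count_marked_duplicates" (in.bam is a placeholder) should give the same number as "samtools view -c -f 1024 in.bam". A count above zero would suggest the duplicates are still in the file and only flagged; a count of zero would be consistent with either removal or no marking at all, so it is not conclusive on its own.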
File /home/local/software/picard-tools/2.8.3/reference.dict not found
But reference.dict is supposed to be my output from this command :(
You are using the jar from /local/software/picard-tools/2.8.3/jarlib/picard.jar ... does /home/local/software/picard-tools/2.8.3/ exist?
Yes, it does; however, this was an intermediate step for using GATK.
Error message is very explicit about what is wrong.
Please pick a more descriptive title for your question(s)!