Question: Removing or not removing the duplicates in a .bam file
9 months ago, A (3.6k) wrote:

Hi,

Sorry, I have a list of .bam files from WGS. The maintainer says that the duplicates have been marked but not removed. I tried Picard to remove the duplicates, but I am getting an error.

The Broad Institute forum says "You have to be around for a little while longer before you can post links.", so I cannot post my question there.

[fi1d18@cyan02 fi1d18]$ picard MarkDuplicates I=/temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked.bam O=/temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked1.bam M= marked-dup-metrics.txt
[Thu Mar 07 17:33:42 GMT 2019] picard.sam.markduplicates.MarkDuplicates INPUT=[/temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked.bam] OUTPUT=/temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked1.bam METRICS_FILE=marked-dup-metrics.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Thu Mar 07 17:33:42 GMT 2019] Executing as fi1d18@cyan02 on Linux 2.6.32-754.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_51-b16; Picard version: 2.8.3-SNAPSHOT
INFO 2019-03-07 17:33:42 MarkDuplicates Start of doWork freeMemory: 2012347496; totalMemory: 2027945984; maxMemory: 3817865216
INFO 2019-03-07 17:33:42 MarkDuplicates Reading input file and constructing read end information.
INFO 2019-03-07 17:33:42 MarkDuplicates Will retain up to 14684096 data points before spilling to disk.
WARNING: BAM index file /temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked.bam.bai is older than BAM /temp/hgig/fi1d18/1631_WTSI-OESO_005_a_DNA/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-OESO_005_a_DNA.dupmarked.bam
[Thu Mar 07 17:33:42 GMT 2019] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2027945984
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 3752, Read name HX3_22030:3:2114:23155:23319, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:665)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:650)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:620)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:569)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:543)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:438)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:222)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
[fi1d18@cyan02 fi1d18]$
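
On what the exception is saying: every BAM record stores a "bin" number, and the SAM/BAM specification defines how that number is derived from the alignment start and end. The validator recomputed the bin for record 3752 and got a different value from the one stored in the file. Below is a minimal sketch of that calculation, transcribed into bash from the reference routine in the specification. It is not the htsjdk code, the function name `reg2bin` follows the specification, and the coordinates in the demo call are made up for illustration.

```shell
#!/usr/bin/env bash
# reg2bin: BAM bin for a 0-based, half-open interval [beg, end),
# transcribed from the reference routine in the SAM/BAM specification.
reg2bin() {
  local beg=$1 end=$(( $2 - 1 ))   # make end inclusive, as the spec routine does
  if   (( beg >> 14 == end >> 14 )); then echo $(( 4681 + (beg >> 14) ))
  elif (( beg >> 17 == end >> 17 )); then echo $(( 585 + (beg >> 17) ))
  elif (( beg >> 20 == end >> 20 )); then echo $(( 73 + (beg >> 20) ))
  elif (( beg >> 23 == end >> 23 )); then echo $(( 9 + (beg >> 23) ))
  elif (( beg >> 26 == end >> 26 )); then echo $(( 1 + (beg >> 26) ))
  else echo 0
  fi
}

# Hypothetical read: 1-based POS 10001 covering 150 reference bases,
# i.e. the 0-based half-open interval [10000, 10150).
reg2bin 10000 10150   # prints 4681
```

The log above shows VALIDATION_STRINGENCY=STRICT. Picard also accepts LENIENT and SILENT for that option; whether relaxing validation is appropriate for this particular file is something to check rather than assume.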

How can I tell whether the duplicates have already been removed? I may be attempting something nonsensical, because I don't understand what this error means at all.
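
One way to check whether duplicates are still present is to look at the FLAG column: a marked duplicate has bit 0x400 (decimal 1024) set. The sketch below counts such records in SAM text read from stdin. The function name `count_dup_flagged` and the demo records are hypothetical, and it assumes tab-separated SAM text as produced by `samtools view`.

```shell
#!/usr/bin/env bash
# count_dup_flagged: read SAM text on stdin, print "<duplicates> <total>".
# The duplicate flag is bit 0x400 (decimal 1024) of the FLAG column (column 2).
count_dup_flagged() {
  awk -F'\t' '
    /^@/ { next }                                   # skip header lines
    { total++; if (int($2 / 1024) % 2 == 1) dup++ } # test bit 1024 portably
    END { printf "%d %d\n", dup, total }
  '
}

# Demo on three made-up records: flags 99, 1123 (= 1024 + 99) and 1024.
printf '@HD\tVN:1.6\nr1\t99\tchr1\t100\nr2\t1123\tchr1\t100\nr3\t1024\tchr1\t200\n' \
  | count_dup_flagged   # prints "2 3"
```

On a real file this could be run as `samtools view input.bam | count_dup_flagged` (with `input.bam` standing in for your file); `samtools view -c -f 1024 input.bam` gives the duplicate count directly. A non-zero count means duplicates are marked but still in the file. A zero count means they were either removed or never marked, so it cannot distinguish those two cases on its own.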

Tags: wgs, rna-seq, picard, gatk

File /home/local/software/picard-tools/2.8.3/reference.dict not found

Reply written 9 months ago by WouterDeCoster (42k)

But reference.dict is supposed to be the output of this command :(

Reply written 9 months ago by A (3.6k)

You are using the jar from /local/software/picard-tools/2.8.3/jarlib/picard.jar... does /home/local/software/picard-tools/2.8.3/ exist?

Reply written 9 months ago by WouterDeCoster (42k)

Yes, it does; however, this was an intermediate step for using GATK.

Reply written 9 months ago by A (3.6k)

The error message is very explicit about what is wrong.

Reply written 9 months ago by WouterDeCoster (42k)

Please pick a more descriptive title for your question(s)!

Reply written 9 months ago by WouterDeCoster (42k)

Sorry, I have a list of .bam files from WGS; the maintainer says that the duplicates have been marked but not removed.

Reply written 9 months ago by A (3.6k)
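
If the duplicates are marked but not removed, one option is to drop every record whose FLAG has the duplicate bit (0x400, decimal 1024) set. The sketch below does that on SAM text from stdin, keeping header lines. The function name `drop_dup_flagged` and the demo records are hypothetical, and this is only an illustration of the filter, not the maintainer's pipeline.

```shell
#!/usr/bin/env bash
# drop_dup_flagged: read SAM text on stdin and print header lines plus
# every record whose FLAG (column 2) does NOT have bit 1024 (0x400) set.
drop_dup_flagged() {
  awk -F'\t' '/^@/ || int($2 / 1024) % 2 == 0'
}

# Demo on two made-up records: r1 (flag 99) is kept, r2 (flag 1123) is dropped.
printf '@HD\tVN:1.6\nr1\t99\tchr1\t100\nr2\t1123\tchr1\t100\n' | drop_dup_flagged
```

On a BAM file, the same filter can be applied with `samtools view -b -F 1024 input.bam > dedup.bam`, where both file names are placeholders. Within Picard, the log in the question shows REMOVE_DUPLICATES=false, so MarkDuplicates as invoked there would only mark duplicates again; setting REMOVE_DUPLICATES=true asks it to leave them out of the output.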
9 months ago, Asaf (6.5k, Israel) wrote:

See here: https://gatkforums.broadinstitute.org/gatk/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference

9 months ago, A (3.6k) wrote:

The problem was that I was using O when I should have used OUTPUT :(
