Question

Picard Mark Duplicate Reads In Galaxy

1

10.6 years ago

Tonyzeng ▴ 310

Hi, I want to use Picard MarkDuplicates just to flag potentially duplicated reads. According to the Picard manual, when running it under Linux the BAM file has to be sorted and then indexed before marking duplicates. However, I want to try Picard's MarkDuplicates in Galaxy first, and I found that Galaxy offers the mark-duplicates function but no sort/index BAM function. Does that mean the Galaxy MarkDuplicates tool from the Picard toolkit already takes care of sorting and indexing the BAM?

picard markduplicates galaxy • 6.1k views

updated 10.6 years ago by boris ▴ 10 • written 10.6 years ago by Tonyzeng ▴ 310

score 1 · Answer 1 · 2013-10-09

10.6 years ago

boris ▴ 10

As I understand it, Galaxy automatically coordinate-sorts the BAM files you upload. Likewise, any Galaxy tool that outputs a BAM file writes it sorted. So you do not need to sort before running the MarkDuplicates tool within Galaxy: the file is already sorted.
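
If you want to verify the sort order yourself rather than rely on Galaxy, the SAM/BAM header records it in the `SO` tag of the `@HD` line. Here is a minimal sketch that reads that tag from header text. It assumes you have already obtained the header as plain text (for example from `samtools view -H`); the function name and the example header are hypothetical.

```python
def sort_order_from_header(header_text: str) -> str:
    """Return the SO value from the @HD header line, or 'unknown' if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            break  # a header has at most one @HD line, so stop looking
    return "unknown"


# Hypothetical header text, as `samtools view -H file.bam` might print it.
example_header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n"
print(sort_order_from_header(example_header))  # prints: coordinate
```

A value of `coordinate` means the file claims to be coordinate-sorted; `unsorted`, `queryname`, or a missing tag means it may need sorting first.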


score 0 · Answer 2 · 2013-10-04

10.6 years ago

Pierre Lindenbaum 161k

I don't know how Picard MarkDuplicates is implemented in Galaxy, but the command-line program can index the BAM it writes at the end when CREATE_INDEX=true is set (the documented default is false).

http://picard.sourceforge.net/command-line-overview.shtml#Overview

The following options are relevant for most Picard programs:

CREATE_INDEX=Boolean
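
To show where that option goes, here is a rough sketch that assembles a MarkDuplicates command line with `CREATE_INDEX=true`. The file names and the helper function are hypothetical, and the `picard.jar MarkDuplicates` invocation style is an assumption that depends on your Picard version (older releases shipped one jar per tool, e.g. `MarkDuplicates.jar`). The sketch only builds the argument list; it does not run anything.

```python
def mark_duplicates_command(input_bam, output_bam, metrics_file,
                            picard_jar="picard.jar", create_index=True):
    """Build (but do not run) a Picard MarkDuplicates argument list."""
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",  # invocation style is an assumption
        f"INPUT={input_bam}",
        f"OUTPUT={output_bam}",
        f"METRICS_FILE={metrics_file}",
        # Ask Picard to write a BAM index next to the coordinate-sorted output.
        f"CREATE_INDEX={'true' if create_index else 'false'}",
    ]


# Hypothetical file names, for illustration only.
cmd = mark_duplicates_command("sorted.bam", "marked.bam", "dup_metrics.txt")
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where Java and Picard are installed.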


score 0 · Answer 3 · 2013-10-05

I don't think it does, and in fact Picard tools expect BAM files to be at least indexed. If your BAM files aren't sorted, you can set the "Assume reads are already ordered" option to FALSE. If they aren't indexed, though, I don't see the obvious option in Galaxy, which would be to index them with a simple "samtools index bamfile". The only available tool I see is GATK PrintReads. It lets you filter out any type of read before deduplication if desired, such as reads with low mapping quality or malformed reads (we use it as an initial step in our GATK pipeline to prepare our BAM files, with the --read_filter MappingQualityZero --filter_mismatching_base_and_quals options), and it also generates an index file for your BAM file. You should then be able to feed the PrintReads output to Picard's MarkDuplicates.
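
Outside Galaxy, the sort-then-index preparation discussed in this thread can be sketched as two samtools calls. This is only an illustration: the file names and the helper function are hypothetical, and the `samtools sort -o` form assumes a samtools 1.x release (older 0.1.x versions took an output prefix instead). The sketch builds the commands without running them.

```python
def sort_and_index_commands(input_bam, sorted_bam):
    """Build (but do not run) the samtools calls that sort and then index a BAM."""
    return [
        # Coordinate-sort the input; the -o form assumes samtools 1.x.
        ["samtools", "sort", "-o", sorted_bam, input_bam],
        # Index the sorted file; samtools writes the index next to it.
        ["samtools", "index", sorted_bam],
    ]


# Hypothetical file names, for illustration only.
for cmd in sort_and_index_commands("aligned.bam", "aligned.sorted.bam"):
    print(" ".join(cmd))
```

Each list could be handed to `subprocess.run(cmd, check=True)` in order, with the sorted output then used as the input to MarkDuplicates.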