Question: Picard Mark Duplicate Reads In Galaxy
Asked 5.5 years ago by Tonyzeng:

Hi, I want to use Picard MarkDuplicates just to mark reads that are potential duplicates. According to the Picard manual, if I run it under Linux, I need to sort and then index the BAM files before marking duplicates. However, I want to try Picard's MarkDuplicates in Galaxy first, and I found that Galaxy has no sort/index BAM function, only the MarkDuplicates tool. Does that mean Galaxy's Picard MarkDuplicates tool already takes care of sorting and indexing the BAM?
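For the command-line route mentioned in the question, the sort, index, and mark steps can be sketched as a list of commands. This is a minimal sketch, not the Picard manual's own recipe: it assumes samtools 1.x (where sort takes -o), a Picard jar named picard.jar that still accepts the legacy KEY=VALUE argument style, and hypothetical file names. REMOVE_DUPLICATES=false keeps duplicate reads in the file and only flags them, and CREATE_INDEX=true asks Picard to index its own output.

```python
from typing import List


def dedup_commands(in_bam: str, out_prefix: str,
                   picard_jar: str = "picard.jar") -> List[List[str]]:
    """Return command lines for a sort -> index -> MarkDuplicates run.

    Assumptions (not from the original thread): samtools 1.x, whose `sort`
    takes `-o OUTPUT`, and a Picard jar that accepts KEY=VALUE arguments.
    All file names derived here are hypothetical.
    """
    sorted_bam = f"{out_prefix}.sorted.bam"
    dedup_bam = f"{out_prefix}.dedup.bam"
    metrics = f"{out_prefix}.dup_metrics.txt"
    return [
        # 1. Sort by coordinate, the order MarkDuplicates expects by default.
        ["samtools", "sort", "-o", sorted_bam, in_bam],
        # 2. Index the sorted BAM (samtools writes sorted_bam + ".bai").
        ["samtools", "index", sorted_bam],
        # 3. Flag duplicates without removing them; also index the output.
        [
            "java", "-jar", picard_jar, "MarkDuplicates",
            f"INPUT={sorted_bam}",
            f"OUTPUT={dedup_bam}",
            f"METRICS_FILE={metrics}",
            "REMOVE_DUPLICATES=false",
            "CREATE_INDEX=true",
        ],
    ]


if __name__ == "__main__":
    for cmd in dedup_commands("sample.bam", "sample"):
        # Only prints; to execute, pass each list to subprocess.run(cmd, check=True).
        print(" ".join(cmd))
```

Printing the commands first makes it easy to inspect them before handing each list to subprocess.run.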

Tags: picard, galaxy, markduplicates
Answer by boris (State College, PA USA), 5.5 years ago:

As I understand it, Galaxy automatically sorts uploaded BAM files by coordinate. Also, when a Galaxy tool outputs a BAM file, the implementation outputs a sorted file. So you do not need to sort before running the MarkDuplicates tool within Galaxy; the file is already sorted.
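Whether a given BAM is declared as coordinate-sorted can be checked from the SO tag of its @HD header line; the header text can be printed with samtools view -H. The parser below is a minimal sketch over that text. Note that SO only records what the writing tool declared; it does not prove the records are actually in that order.

```python
def declared_sort_order(sam_header: str) -> str:
    """Return the SO value from the @HD line of a SAM header, or "unknown"."""
    for line in sam_header.splitlines():
        if line.startswith("@HD"):
            # Header fields after the record type are TAB-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                tag, _, value = field.partition(":")
                if tag == "SO":
                    return value
    # No @HD line, or @HD without SO: the SAM spec treats the order as unknown.
    return "unknown"
```

For a BAM that Galaxy has sorted as described above, one would expect this to return "coordinate".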

Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 5.5 years ago:

I don't know how Picard MarkDuplicates is implemented in Galaxy, but the command-line program indexes the BAM by default at the end.

The following options are relevant for most Picard programs:

Answer by Jorge Amigo (Santiago de Compostela, Spain), 5.5 years ago:

I don't think it does, and in fact Picard tools expect BAM files to be at least indexed. If your BAM files aren't sorted, you can set the "Assume reads are already ordered" option to FALSE, but if they aren't indexed, I don't see an obvious way in Galaxy to index them with a simple "samtools index bamfile".

The only tool I see available is GATK PrintReads. It lets you filter out any type of read before deduplicating, if desired, such as reads with low mapping quality or malformed reads (we use it as an initial step in our GATK pipeline to prepare our BAM files, with the --read_filter MappingQualityZero --filter_mismatching_base_and_quals options), and it also generates an index file for your BAM file. You should then be able to feed GATK PrintReads' output to Picard's MarkDuplicates.
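Picard's MarkDuplicates has an ASSUME_SORTED option, and the Galaxy "Assume reads are already ordered" setting presumably maps to it (an assumption; check the wrapper). Leaving it on is only safe when the records really are in coordinate order. The sketch below checks that order, assuming each record has already been reduced to a hypothetical (reference index, position) pair by whatever BAM reader you use; following the SAM convention, unplaced unmapped reads carry reference index -1 and must come after all placed reads.

```python
from typing import Iterable, Tuple

UNMAPPED = -1  # SAM/BAM convention: reference index -1 for unplaced reads


def is_coordinate_sorted(records: Iterable[Tuple[int, int]]) -> bool:
    """Check that (reference index, position) pairs are in coordinate order.

    Reference indices follow the order of the @SQ header lines. Unplaced
    unmapped reads (index -1) are allowed only as a trailing block.
    """
    prev = None
    seen_unmapped = False
    for ref, pos in records:
        if ref == UNMAPPED:
            seen_unmapped = True
            continue
        if seen_unmapped:
            return False  # a placed read appears after the unmapped block
        key = (ref, pos)
        if prev is not None and key < prev:
            return False  # went backwards: not coordinate-sorted
        prev = key
    return True
```

If this returns False for your data, sorting first (or telling the tool not to assume ordering) is the safer choice.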
