Hi, I want to use Picard MarkDuplicates just to mark potentially duplicated reads. According to the Picard manual, BAM files need to be sorted and then indexed before running MarkDuplicates on Linux. However, I want to try Picard's MarkDuplicates in Galaxy first, and I found that Galaxy offers no sort/index BAM function, only the mark-duplicate-reads function. Does that mean the Galaxy version of Picard MarkDuplicates already takes care of sorting and indexing the BAM?
As I understand it, Galaxy automatically coordinate-sorts the BAM files you upload. Also, if a Galaxy tool outputs a BAM file, the implementation will output a sorted file. So you do not need to sort before running the MarkDuplicates tool within Galaxy; the file is already sorted.
I don't know how Picard MarkDuplicates is implemented in Galaxy, but the command-line program can index the BAM at the end when CREATE_INDEX=true is set (as far as I know, the option defaults to false).
The following options are relevant for most Picard programs: CREATE_INDEX=Boolean
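On the command line, switching that option on might look like the sketch below. The file names and the location of picard.jar are placeholders, and the `run` helper and `RUN` variable are made up for illustration so that the command is only printed unless you ask for it to be executed. It uses the older KEY=value argument style shown above.

```shell
#!/bin/sh
# Sketch only: file names and the picard.jar location are placeholders.
PICARD_JAR="${PICARD_JAR:-picard.jar}"

# Hypothetical helper: print each command by default; set RUN=1 to execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# mark_dups INPUT.bam OUTPUT.bam METRICS.txt
# The input is assumed to be coordinate-sorted already.
mark_dups() {
  # CREATE_INDEX=true asks Picard to write a BAM index alongside the output.
  run java -jar "$PICARD_JAR" MarkDuplicates \
    I="$1" O="$2" M="$3" \
    CREATE_INDEX=true
}

mark_dups sample.sorted.bam sample.dedup.bam sample.dup_metrics.txt
```

Running it as is only prints the java command; `RUN=1 sh script.sh` would execute it, provided Java and Picard are installed.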
I don't think it does, and in fact Picard tools expect BAM files to be at least indexed. If your BAM files aren't sorted, you can set the "Assume reads are already ordered" option to FALSE. If they aren't indexed, though, I don't see the obvious option in Galaxy, which would be to index them with a simple "samtools index bamfile". The only available tool I see is GATK PrintReads. It lets you filter out reads before deduping if desired, such as reads with low mapping quality or malformed reads (we use it as an initial step in our GATK pipeline to prepare our BAM files, with the --read_filter MappingQualityZero --filter_mismatching_base_and_quals options), and it also generates an index file for your BAM file. You should then be able to feed the PrintReads output to Picard's MarkDuplicates.
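Outside Galaxy, the sort and index steps could be sketched roughly as follows, assuming a reasonably recent samtools (1.3 or later, where `sort` takes `-o`). The file names are placeholders, and the `run` helper and `RUN` variable are hypothetical conveniences that print the commands instead of running them unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: file names are placeholders.

# Hypothetical helper: print each command by default; set RUN=1 to execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# sort_and_index INPUT.bam SORTED.bam
sort_and_index() {
  # Coordinate sort is the samtools default sort order.
  run samtools sort -o "$2" "$1" || return 1
  # Writes the index next to the sorted file (SORTED.bam.bai).
  run samtools index "$2"
}

sort_and_index sample.bam sample.sorted.bam
```

The sorted, indexed file can then be given to MarkDuplicates as its input.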