Question: Issues with Marking Duplicates in Picard
11 weeks ago, sarah.kettelkamp wrote:

Hi everyone!

So I have been tasked with analyzing some sequence data even though I have no clue what I'm doing. I was given some data from 11 samples (S1, S2, etc.), each with a singles file as well as a forward read file (R1) and a reverse read file (R2). In addition, each sample was run in two different lanes (L001 and L002), so for every file there is a corresponding file from the other lane. I was given this data after it had already had some quality control done using Scythe and Sickle. They were in fastq format.

So my first step was to map these files to the reference. I did this using BWA mem. I aligned the R1 and R2 files of a given sample and lane to the reference together, then did the same for the R1 and R2 of the same sample from the other lane, then did it for the singles file from each lane. Therefore, for every sample, I got 4 sam files that were mapped to the reference (e.g. S1 L001 R1&R2, S1 L002 R1&R2, S1 L001 single, S1 L002 single), for all 11 samples.

Next I used samtools to convert the sam files to bam, as well as to restrict each bam file to only the reads that mapped to the reference genome.
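For readers following along, the conversion and filtering step described above might look roughly like the sketch below. The function name, the `DRY_RUN` switch, and the `.mapped.bam` naming scheme are assumptions made for illustration; they are not the commands the poster actually ran.

```shell
# Sketch: convert a SAM file to BAM while keeping only mapped reads.
# sam_to_mapped_bam and DRY_RUN are hypothetical helpers, not part of samtools.
sam_to_mapped_bam() {
    sam="$1"
    bam="${sam%.sam}.mapped.bam"   # assumed output naming scheme
    # -b   : write BAM output
    # -F 4 : skip reads that have the "unmapped" flag (0x4) set
    # -o   : output file name
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "samtools view -b -F 4 -o $bam $sam"
    else
        samtools view -b -F 4 -o "$bam" "$sam"
    fi
}
```

Calling `sam_to_mapped_bam S1_L001.sam` would write `S1_L001.mapped.bam`; setting `DRY_RUN=1` first only prints the command so it can be checked before anything is run.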

Now here is where the trouble begins - I next used samtools to merge all of a sample's files together, for instance anything from S1 including both lanes for both the singles and the merged R1&R2 files. I used

samtools merge S1_merged.bam singleS1_L001.bam singleS1_L002.bam S1_L001.bam S1_L002.bam

Then I tried to use MarkDuplicatesWithMateCigar in Picard to mark the duplicates in the single merged file (S1_merged.bam). But when I did it gave me the error "this program requires inputs in coordinate SortOrder." It seems as though my headings weren't sorted correctly.

So I tried to sort the merged bam file using samtools sort. I did

samtools sort S1_merged.bam -o S1_sorted.bam

which gave me a ton of files. I tried redoing it using the "-m 20G" command and it gave me 6 files instead.

So then I merged these six sorted files into "S1_sorted.bam" using samtools merge and tried doing MarkDuplicatesWithMateCigar again. I did

java -jar $PICARD MarkDuplicatesWithMateCigar I=S1_sorted.bam O=S1_marked.bam M=S1_marked_metrics.txt

And it told me "Exception in thread "main" picard.PicardException: Found a samRecordWithOrdinal with sufficiently large clipping that we may have missed including it in an early duplicate marking iteration. Please increase the minimum distance to at least 120bp." So I tried to do it again with "MINIMUM_DISTANCE=120" added to the command, and it didn't even give me an error, it just spit me back out a list of a bunch of commands. I tried using MarkDuplicates instead of MarkDuplicatesWithMateCigar and it did the same thing.

I'm really at a loss here guys. Should I have sorted before I merged all the lanes and singles? Should I have merged my sorted files after sorting? Am I missing something?
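On the ordering question, one workable order is to coordinate-sort each BAM, merge the sorted files, and then mark duplicates. The sketch below assumes samtools 1.3 or later and that `$PICARD` points at the Picard jar as in the command above; the `run` and `mark_sample` helpers, the `DRY_RUN` switch, and the `.sorted.bam` naming are hypothetical. Sorting the merged file afterwards should also be fine, as long as the file handed to Picard is coordinate-sorted.

```shell
# Sketch: sort each input BAM, merge the sorted BAMs, then mark duplicates.
# Assumes samtools >= 1.3 and $PICARD set to the Picard jar path.
# run and mark_sample are hypothetical helpers for illustration only.
run() {
    # Print the command when DRY_RUN=1, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

mark_sample() {
    sample="$1"; shift
    sorted=""
    for bam in "$@"; do
        out="${bam%.bam}.sorted.bam"          # assumed naming scheme
        run samtools sort -o "$out" "$bam" || return 1
        sorted="$sorted $out"
    done
    # $sorted is deliberately unquoted so it splits into one argument per file
    # (this breaks on file names containing spaces).
    run samtools merge "${sample}_merged.bam" $sorted || return 1
    run java -jar "$PICARD" MarkDuplicates \
        I="${sample}_merged.bam" \
        O="${sample}_marked.bam" \
        M="${sample}_marked_metrics.txt"
}
```

For example, `mark_sample S1 singleS1_L001.bam singleS1_L002.bam S1_L001.bam S1_L002.bam` would cover all four files of sample S1. Running it once with `DRY_RUN=1` prints the commands without executing them, which helps catch a mistyped parameter before Picard falls back to printing its usage text.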

Any help would be greatly appreciated.

bwa sorting samtools picard gatk • 230 views

"it just spit me back out a list of a bunch of commands"

Which ones?

Reply written 11 weeks ago by Pierre Lindenbaum

I linked a screenshot of my terminal showing what it gives me:

https://imgur.com/pUmF8lh

Reply written 11 weeks ago by sarah.kettelkamp

There is a problem with your command line: a parameter is missing or wrong.

Reply written 11 weeks ago by Pierre Lindenbaum
11 weeks ago, goodez (United States) wrote:

First, I don't totally understand what the "singles" file is. I would just stick with the forward and reverse fastq files (R1 and R2).

Now for combining samples from multiple lanes... I usually merge the fastq files before aligning. I first check their quality using FastQC. It is also okay to combine after alignment as you have.
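Merging lanes at the FASTQ level, as mentioned above, can be as simple as concatenating the files. The sketch below assumes uncompressed files named like `S1_L001_R1.fastq`; that naming scheme and the `merge_lanes` helper are hypothetical. The one point that matters is that R1 and R2 are concatenated in the same lane order, so that read pairs stay in step.

```shell
# Sketch: combine the two lanes of one sample and read direction by concatenation.
# merge_lanes and the <sample>_<lane>_<read>.fastq naming are assumptions.
merge_lanes() {
    sample="$1"   # e.g. S1
    read="$2"     # R1 or R2
    # Always L001 first, then L002, for both R1 and R2, so pairs stay in sync.
    cat "${sample}_L001_${read}.fastq" "${sample}_L002_${read}.fastq" \
        > "${sample}_${read}.fastq"
}
```

Running `merge_lanes S1 R1` followed by `merge_lanes S1 R2` would give one pair of files per sample to pass to the aligner.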

So I tried to sort the merged bam file using samtools sort. . . which gave me a ton of files. I tried redoing it using the "-m 20G" command and it gave me 6 files instead.

This is the most troubling part. Samtools sort should have output one sorted bam, not multiple files. These may have been intermediate files; did you let the program finish running completely? Also, the manual states to run samtools sort this way:

samtools sort -o out.bam in.bam

You did this in the wrong order (I don't know if that actually affects how it runs).

Perhaps that will fix your issues.


Oh dang, okay. Let me try doing the sort the way you listed. Hopefully that helps. Thanks!

Reply written 11 weeks ago by sarah.kettelkamp

I tried doing

samtools sort -o S1_sorted.bam S1_merged.bam

and it told me "fail to open file S1_sorted.bam"

Reply written 11 weeks ago by sarah.kettelkamp

Weird. Maybe because that output file already exists, and doesn't want to overwrite it?

Try this as well. You shouldn't have to specify bam format, but I don't know what version of software you're using.

samtools sort -O bam -o S1_sorted.bam S1_merged.bam

Run exactly that. It shouldn't give errors like that.

Reply written 11 weeks ago by goodez

So while I was waiting for your response, I tried doing

samtools sort -@ 4 -m 30G S1_merged.bam S1_sorted.bam

And it worked and only gave me one file!

But then I tried doing the MarkDuplicates and it still did the same thing as before. I must be doing something wrong at this step.

Reply written 11 weeks ago by sarah.kettelkamp

Sorry to hear that. I personally have had issues every time I've tried to use any Picard tool... Do you require duplicate removal in your analysis? Removing duplicates is often unnecessary and can even falsely remove unique reads.

Reply written 11 weeks ago by goodez

Which version of samtools are you using? Getting this error message for the command given makes me think it is 0.1.19, because in that version the -o parameter and the positional arguments have a different meaning than they do nowadays.

If you really are using this very, very old version, please upgrade before continuing.

fin swimmer
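To make the difference concrete: in the 0.1.x series, `samtools sort` takes an input file followed by an output prefix (".bam" is appended) and `-o` sends the result to standard output, whereas current releases take `-o out.bam in.bam`. The helper below is a hypothetical sketch that only prints the form matching a given version string; it does not run samtools. If the version here really is 0.1.19, the earlier `samtools sort -@ 4 -m 30G S1_merged.bam S1_sorted.bam` would have written `S1_sorted.bam.bam`, which is worth checking before passing a file to Picard.

```shell
# Sketch: print the samtools sort invocation that matches a version string.
# sort_cmd_for_version is a hypothetical helper; it only echoes a command.
sort_cmd_for_version() {
    version="$1"; input="$2"; output="$3"
    case "$version" in
        0.*)
            # Legacy syntax: the second argument is an output PREFIX,
            # and samtools appends ".bam" to it.
            echo "samtools sort $input ${output%.bam}"
            ;;
        *)
            # Current syntax: the output file is named explicitly with -o.
            echo "samtools sort -o $output $input"
            ;;
    esac
}
```

As far as I know, running `samtools` with no arguments prints a usage message that includes a "Version:" line, which should be enough to tell which form applies.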

Reply written 11 weeks ago by finswimmer