Mark Duplicates from Picard does not seem to recognize my sorted BAM files.
7.6 years ago
Jordan ★ 1.2k

Hi,

I have a BAM file whose chromosome IDs have the form NC_00000*. I sorted it with samtools sort.

I want to remove duplicates, so I'm using MarkDuplicates.jar from Picard tools. But it gives me the following error:

Exception in thread "main" net.sf.picard.PicardException: 13_0501.sorted.bam is not coordinate sorted.
   at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:273)
   at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:117)
   at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:158)
   at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:101)

But I believe my BAM file is sorted. This is its header:

@HD     VN:1.0  SO:unsorted
@SQ     SN:NC_000001    LN:249250621
@SQ     SN:NC_000002    LN:243199373
@SQ     SN:NC_000003    LN:198022430
@SQ     SN:NC_000004    LN:191154276
@SQ     SN:NC_000005    LN:180915260
@SQ     SN:NC_000006    LN:171115067
@SQ     SN:NC_000007    LN:159138663
@SQ     SN:NC_000008    LN:146364022
@SQ     SN:NC_000009    LN:141213431
@SQ     SN:NC_000010    LN:135534747
@SQ     SN:NC_000011    LN:135006516
@SQ     SN:NC_000012    LN:133851895
@SQ     SN:NC_000013    LN:115169878
@SQ     SN:NC_000014    LN:107349540
@SQ     SN:NC_000015    LN:102531392
@SQ     SN:NC_000016    LN:90354753
@SQ     SN:NC_000017    LN:81195210
@SQ     SN:NC_000018    LN:78077248
@SQ     SN:NC_000019    LN:59128983
@SQ     SN:NC_000020    LN:63025520
@SQ     SN:NC_000021    LN:48129895
@SQ     SN:NC_000022    LN:51304566
@SQ     SN:NC_000023    LN:155270560
@SQ     SN:NC_000024    LN:59373566
@PG     ID:0    PN:clcgenomicswb        VN:7.0

The header seems to suggest that it's unsorted. That's what is bothering me.

markduplicates samtools BAM sort picard • 6.4k views

Did you sort by read names rather than chromosomal coordinates in samtools (the -n flag)? If so, this is the problem.


No, I did not. I checked the records in the BAM file manually, and they all appear to be sorted by coordinate, not by read name.
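A manual check like this can also be scripted. The following is a minimal sketch, not part of samtools or Picard: the function name `is_coordinate_sorted` is hypothetical, and it assumes the input is tab-delimited SAM text (header plus records). References are ranked by their order in the @SQ lines, and unmapped reads (RNAME `*`) are expected at the end.

```python
def is_coordinate_sorted(sam_lines):
    """Return True if SAM records are ordered by (reference rank, position).

    Hypothetical helper for illustration only.
    """
    ref_rank = {}      # reference name -> rank, taken from @SQ order
    last_key = None    # sort key of the previous record

    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")

        if line.startswith("@"):
            # Header line: record the order of the reference sequences.
            if fields[0] == "@SQ":
                for field in fields[1:]:
                    if field.startswith("SN:"):
                        ref_rank.setdefault(field[3:], len(ref_rank))
            continue

        rname, pos = fields[2], int(fields[3])
        if rname == "*":
            # Reads without a reference sort after every mapped read.
            rank = float("inf")
        else:
            # A reference missing from the header is ranked on first sight.
            rank = ref_rank.setdefault(rname, len(ref_rank))

        key = (rank, pos)
        if last_key is not None and key < last_key:
            return False
        last_key = key

    return True
```

The lines could come from the output of `samtools view -h`, read line by line. Note that this only checks the records; it says nothing about what the @HD line declares.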


Hello, today I ran into the same problem, and it is causing me a lot of trouble. Could you give me some tips? Thanks!


I did get it fixed. What kind of error do you get?

7.6 years ago

Your SAM header starts with:

@HD     VN:1.0  SO:unsorted

Picard checks that this 'SO:' tag is set to 'coordinate', regardless of how the records themselves are ordered. See https://github.com/nh13/picard/blob/master/src/java/net/sf/picard/sam/MarkDuplicates.java

(....)
if (!ASSUME_SORTED && header.getSortOrder() != SortOrder.coordinate) {
    throw new PicardException("Input file " + f.getAbsolutePath() + " is not coordinate sorted.");
}
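The same decision can be mirrored outside Picard to see what a given header declares. This is a rough Python sketch under stated assumptions, not Picard's code: `declared_sort_order` and `would_be_rejected` are hypothetical names, and the header is assumed to be available as plain text (for example from `samtools view -H`).

```python
def declared_sort_order(header_text):
    """Return the SO value from the @HD line, or None if no SO tag is present."""
    for line in header_text.splitlines():
        fields = line.split()  # tolerates tabs or spaces between fields
        if fields and fields[0] == "@HD":
            for field in fields[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def would_be_rejected(header_text, assume_sorted=False):
    """Mimic the quoted check: reject unless SO is 'coordinate' or the check is skipped."""
    return not assume_sorted and declared_sort_order(header_text) != "coordinate"


header = "@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:NC_000001\tLN:249250621\n"
print(declared_sort_order(header))
print(would_be_rejected(header))
print(would_be_rejected(header, assume_sorted=True))
```

Only the declared value matters to this check, which is why a BAM file whose records really are in coordinate order can still be refused.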

To fix this, pass the following option to MarkDuplicates:

ASSUME_SORTED=true
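For illustration, here is one way the full invocation might be assembled from Python. Treat it as a sketch: `mark_duplicates_command` is a hypothetical helper, the output and metrics file names are made up, and the jar path assumes MarkDuplicates.jar sits in the current directory. INPUT, OUTPUT and METRICS_FILE are the usual MarkDuplicates arguments in this generation of Picard.

```python
import shlex


def mark_duplicates_command(input_bam, output_bam, metrics_file,
                            jar="MarkDuplicates.jar"):
    """Build the argument list for a MarkDuplicates run that skips the header check."""
    return [
        "java", "-jar", jar,
        "INPUT=" + input_bam,
        "OUTPUT=" + output_bam,
        "METRICS_FILE=" + metrics_file,
        "ASSUME_SORTED=true",  # trust the record order, ignore the SO tag
    ]


# File names other than the input are hypothetical.
cmd = mark_duplicates_command("13_0501.sorted.bam",
                              "13_0501.dedup.bam",
                              "13_0501.metrics.txt")
print(" ".join(shlex.quote(arg) for arg in cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

ASSUME_SORTED=true only bypasses the header check, so it is safe only when the records really are in coordinate order.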

Ah, I see. Let me try that. Thanks for the tip!
