Marking Duplicates With Molecular Tag (Umi)
1
0
Entering edit mode
10.9 years ago
brentp 24k

I have some reads that contain a molecular index so I can know whether they are PCR duplicates. I am going to use the 0x400 flag as specified in the SAM spec to mark them as optical/PCR duplicates.

Should I mark all of the reads in a group (having the same POS and molecular tag) with that flag or should I leave the one with the highest quality (or by whatever metric) unmarked?

I will be sending the result to the GATK SNP-calling pipeline.

markduplicates index • 5.5k views
ADD COMMENT
0
Entering edit mode

I'm not sure I understand:are your SAM records already marked with the flag 0x4 ? or are you looking for a method to set the flag according to the chrom/pos/your-index ?

ADD REPLY
0
Entering edit mode

No, they are mapped reads. I want to set the flag 0x400 (1024) to show that they are PCR duplicates.

ADD REPLY
0
Entering edit mode

Hi Brent,

Did you ever get this working?

Is the code available for download somewhere? I need it but don't want to re-invent the wheel.

Thank you

ADD REPLY
0
Entering edit mode

Hi Brent,

I know this is old, but in case anyone else needs this, I have code here to add the UMI to the bam file by reading information from an original FASTQ: https://github.com/mbusby/AddUMIsToBam in the RX and QX fields.

The Picard MarkDuplicates, I hear from that team but did not test myself yet, will handle this in its duplicate marking.

ADD REPLY
1
Entering edit mode
10.9 years ago

you have to leave one read with the highest quality and flag the others with the duplicate flag.

here is the code of picard markdup: all the reads are flagged but the best: (https://github.com/nh13/picard/blob/master/src/java/net/sf/picard/sam/MarkDuplicates.java )

for (final ReadEnds end : list) {
    if (end != best) {
        addIndexAsDuplicate(end.read1IndexInFile);
        addIndexAsDuplicate(end.read2IndexInFile);
    }
ADD COMMENT
0
Entering edit mode

yes, if I understand correctly, that seems to be the logic here: https://picard.svn.sourceforge.net/svnroot/picard/trunk/src/java/net/sf/picard/sam/MarkDuplicates.java

though I don't see them sorting by quality.

ADD REPLY
1
Entering edit mode

in the source it is stored in the member "store"

private short getScore(final SAMRecord rec) {
        short score = 0;
        for (final byte b : rec.getBaseQualities()) {
            if (b >= 15) score += b;
        }
(...)
     pairedEnds.score += getScore(rec);

and then this score is used to get the best pair:

for (final ReadEnds end : list) {
            if (end.score > maxScore || best == null) {
                maxScore = end.score;
                best = end;
            }
        }
ADD REPLY
0
Entering edit mode

cheers, my attention span is too short for java.

ADD REPLY

Login before adding your answer.

Traffic: 1992 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6