Question: Marking Duplicates With Molecular Tag (Umi)
0
gravatar for brentp
5.1 years ago by
brentp22k
Salt Lake City, UT
brentp22k wrote:

I have some reads that contain a molecular index so I can know whether they are PCR duplicates. I am going to use the 0x400 flag as specified in the SAM spec to mark them as optical/PCR duplicates.

Should I mark all of the reads in a group (having the same POS and molecular tag) with that flag or should I leave the one with the highest quality (or by whatever metric) unmarked?

I will be sending the result to the GATK SNP-calling pipeline.

markduplicates index • 2.6k views
ADD COMMENTlink modified 5.1 years ago by Pierre Lindenbaum108k • written 5.1 years ago by brentp22k

I'm not sure I understand:are your SAM records already marked with the flag 0x4 ? or are you looking for a method to set the flag according to the chrom/pos/your-index ?

ADD REPLYlink written 5.1 years ago by Pierre Lindenbaum108k

No, they are mapped reads. I want to set the flag 0x400 (1024) to show that they are PCR duplicates.

ADD REPLYlink written 5.1 years ago by brentp22k

Hi Brent,

Did you ever get this working?

Is the code available for download somewhere?  I need it but don't want to re-invent the wheel.

Thank you

ADD REPLYlink written 3.9 years ago by Michele Busby1.8k

Hi Brent,

I know this is old, but in case anyone else needs this, 

I have code here to add the UMI to the bam file by reading information from an original FASTQ:

https://github.com/mbusby/AddUMIsToBam

in the RX and QX fields.

The Picard MarkDuplicates, I hear from that team but did not test myself yet, will handle this in its duplicate marking.

ADD REPLYlink written 2.4 years ago by Michele Busby1.8k
1
gravatar for Pierre Lindenbaum
5.1 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum108k wrote:

you have to leave one read with the highest quality and flag the others with the duplicate flag.

here is the code of picard markdup: all the reads are flagged but the best: (https://github.com/nh13/picard/blob/master/src/java/net/sf/picard/sam/MarkDuplicates.java )

for (final ReadEnds end : list) {
    if (end != best) {
        addIndexAsDuplicate(end.read1IndexInFile);
        addIndexAsDuplicate(end.read2IndexInFile);
    }
ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by Pierre Lindenbaum108k

yes, if I understand correctly, that seems to be the logic here: https://picard.svn.sourceforge.net/svnroot/picard/trunk/src/java/net/sf/picard/sam/MarkDuplicates.java

though I don't see them sorting by quality.

ADD REPLYlink written 5.1 years ago by brentp22k
1

in the source it is stored in the member "store"

private short getScore(final SAMRecord rec) {
        short score = 0;
        for (final byte b : rec.getBaseQualities()) {
            if (b >= 15) score += b;
        }
(...)
     pairedEnds.score += getScore(rec);

and then this score is used to get the best pair:

for (final ReadEnds end : list) {
            if (end.score > maxScore || best == null) {
                maxScore = end.score;
                best = end;
            }
        }
ADD REPLYlink modified 5.1 years ago • written 5.1 years ago by Pierre Lindenbaum108k

cheers, my attention span is too short for java.

ADD REPLYlink written 5.1 years ago by brentp22k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 840 users visited in the last hour