Question: PICARD MarkDuplicates with random barcodes
Asked 23 months ago by Nicolas Rosewick (Belgium, Brussels):

Hi,

I have paired-end 2x100 bp targeted DNA-seq reads that span multiple regions of the genome. Read 2 contains two barcodes:

  • bp 1-10: barcode 1
  • bp 11-19: barcode 2
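A minimal Python sketch of the read layout above, assuming both barcodes are still the first 19 bases of R2 in the FASTQ (i.e. before any trimming). It slices out barcode 1 (bp 1-10) and barcode 2 (bp 11-19), keeps the remaining insert for alignment, and assigns the read to a sample from barcode 2. The sample sheet, the sample names and the one-mismatch tolerance are hypothetical choices for illustration, not part of the original setup.

```python
from typing import Dict, Optional, Tuple

def split_read2(seq: str, qual: str) -> Tuple[str, str, str, str]:
    """Return (barcode1, barcode2, insert_seq, insert_qual) for one R2 read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) < 19:
        raise ValueError("read is shorter than the 19 bp barcode region")
    barcode1 = seq[0:10]   # bp 1-10: identifies the DNA fragment
    barcode2 = seq[10:19]  # bp 11-19: identifies the sample
    return barcode1, barcode2, seq[19:], qual[19:]

def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(barcode2: str, sample_sheet: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Map barcode 2 to a sample name, or None if no sample barcode is within
    max_mismatches or if the closest match is a tie between samples."""
    hits = sorted(
        (hamming(barcode2, bc), name)
        for bc, name in sample_sheet.items()
        if len(bc) == len(barcode2)
    )
    hits = [h for h in hits if h[0] <= max_mismatches]
    if not hits:
        return None
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # ambiguous: equally close to two samples
    return hits[0][1]

if __name__ == "__main__":
    # Hypothetical sample sheet: barcode 2 sequence -> sample name
    sheet = {"TTTGGGCCC": "sample_A", "AAACCCGGG": "sample_B"}
    r2 = "ACGTACGTAC" + "TTTGGGCCA" + "GATTACA"  # one mismatch in barcode 2
    bc1, bc2, insert_seq, _ = split_read2(r2, "I" * len(r2))
    print(bc1, bc2, insert_seq, assign_sample(bc2, sheet))
    # prints: ACGTACGTAC TTTGGGCCA GATTACA sample_A
```

Reads for which `assign_sample` returns None could go to an "unassigned" file; the trimmed insert and its qualities are what would be passed to the aligner.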

These barcodes distinguish the different samples (barcode 2) and the individual DNA fragments (barcode 1). What I want is one BAM file per sample, with duplicate reads (same barcode 1 and same alignment position) removed. I saw that Picard MarkDuplicates has barcode options:

BARCODE_TAG (String) Barcode SAM tag (ex. BC for 10X Genomics) Default value: null.

READ_ONE_BARCODE_TAG (String) Read one barcode SAM tag (ex. BX for 10X Genomics) Default value: null.

READ_TWO_BARCODE_TAG (String) Read two barcode SAM tag (ex. BX for 10X Genomics) Default value: null.

But I'm a bit lost on how to tell Picard which positions within read 2 to check. Any ideas?
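For what it's worth, these options do not take a position within the read: as their descriptions say, MarkDuplicates reads the barcode from a named SAM tag on each record. One possible route is therefore to copy the barcodes into tags before running it. The sketch below assumes a hypothetical `barcodes` dictionary (read name to barcode 1 and barcode 2) collected while trimming R2, and appends `RX:Z:` (the SAM tag for UMI bases) and `BC:Z:` (the SAM tag for the sample barcode) to text SAM records, for example as streamed from `samtools view -h`. Both mates share a read name, so both records of a pair get the same tags.

```python
from typing import Dict, Iterable, Iterator, Tuple

def tag_sam_lines(lines: Iterable[str],
                  barcodes: Dict[str, Tuple[str, str]]) -> Iterator[str]:
    """Append RX (barcode 1) and BC (barcode 2) tags to every SAM record whose
    QNAME is in `barcodes`; header lines and unknown reads pass through."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@"):  # header lines are left unchanged
            yield line
            continue
        fields = line.split("\t")
        bc = barcodes.get(fields[0])  # fields[0] is QNAME
        if bc is not None:
            umi, sample_bc = bc
            fields.append(f"RX:Z:{umi}")
            fields.append(f"BC:Z:{sample_bc}")
        yield "\t".join(fields)

if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6\tSO:coordinate",
        "pair1\t99\tchr1\t100\t60\t5M\t=\t200\t105\tACGTA\tIIIII",
        "pair2\t99\tchr1\t150\t60\t5M\t=\t250\t105\tTTTTT\tIIIII",
    ]
    for out in tag_sam_lines(sam, {"pair1": ("ACGTACGTAC", "TTTGGGCCC")}):
        print(out)
```

The tagged SAM could then be converted back to BAM and given to MarkDuplicates with `BARCODE_TAG=RX`; using RX rather than another tag name is a convention choice, not something the original post specifies.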

If Picard is not suited for this task, I thought of parsing R2 to extract barcodes 1 and 2, then removing duplicates by checking the alignment position and barcode information.
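A minimal sketch of that fallback, with simplifications that are assumptions: each read pair has already been reduced to its sample (from barcode 2), chromosome, outer alignment coordinates, orientation, barcode 1 and a summed base quality. Pairs sharing sample, position, orientation and barcode 1 are treated as duplicates, and the pair with the highest quality sum is kept, loosely following Picard's default sum-of-base-qualities scoring. It ignores soft-clipping and sequencing errors inside barcode 1, which a real pipeline would need to handle.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass(frozen=True)
class Fragment:
    name: str      # read-pair name (QNAME)
    sample: str    # assigned from barcode 2
    chrom: str
    start: int     # leftmost aligned position of the pair
    end: int       # rightmost aligned position of the pair
    strand: str    # orientation of read 1, "+" or "-"
    umi: str       # barcode 1
    qual_sum: int  # sum of base qualities, used to pick the pair to keep

def find_duplicates(fragments: List[Fragment]) -> Set[str]:
    """Return names of pairs that duplicate a higher-scoring pair with the
    same sample, chromosome, coordinates, orientation and barcode 1.
    On equal scores the pair seen first is kept."""
    best: Dict[Tuple[str, str, int, int, str, str], Fragment] = {}
    duplicates: Set[str] = set()
    for frag in fragments:
        key = (frag.sample, frag.chrom, frag.start, frag.end,
               frag.strand, frag.umi)
        kept = best.get(key)
        if kept is None:
            best[key] = frag
        elif frag.qual_sum > kept.qual_sum:
            duplicates.add(kept.name)  # previous representative is demoted
            best[key] = frag
        else:
            duplicates.add(frag.name)
    return duplicates
```

Because the sample is part of the key, reads from different samples are never collapsed together, and the surviving pairs can then be written to one BAM file per sample.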

Thanks

Edit: I just found this paper discussing barcodes (or UMIs): http://genome.cshlp.org/content/early/2017/01/18/gr.209601.116.abstract (a good start).


Reply by Charles Plessy, 23 months ago, quoting "edit: I just found this paper discussing barcodes (or UMIs)":

The Ph.D. thesis of Kasper Karlsson is also a very good read about UMIs.