Question: How to add RG tag into the optional field in a BAM file
gundalav wrote (4 months ago):

I have a BAM file that looks like this:

[screenshot of BAM records; image not available]

As you can see, the barcodes are included as part of the read names.

I'm trying to use a tool called chromVAR, which requires an RG tag to be present in the BAM file. RG tags are used to distinguish reads from different cells or samples (this is single-cell ATAC-seq). Note that this is not the @RG header line but the tag carried by every read as an optional field.

This is an example of an RG tag as an optional field (taken from another BAM file):

[screenshot of a BAM record with an RG tag among its optional fields; image not available]

The value of that RG tag could simply be the barcode ID of that read.

My question is: how can I add the RG tag to every read? I looked at Picard AddOrReplaceReadGroups, but it seems to add it only to the header, not to every read.
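To make the header-versus-read distinction concrete, here is a minimal sketch that counts both kinds of RG information in SAM text. The function name `count_rg` is made up for this illustration, and it assumes standard tab-separated SAM, where optional fields start at column 12.

```shell
# Summarise RG information in SAM text read from stdin.
# Prints two numbers: "<@RG header lines> <reads carrying an RG:Z: tag>".
# count_rg is a hypothetical helper name, not part of samtools.
count_rg() {
  awk -F'\t' '
    /^@/ { if ($1 == "@RG") hdr++; next }   # header section
    {
      for (i = 12; i <= NF; i++)            # optional fields start at column 12
        if ($i ~ /^RG:Z:/) { tagged++; break }
    }
    END { printf "%d %d\n", hdr, tagged }
  '
}

# Usage (assumes samtools is installed and in.bam is your file):
#   samtools view -h in.bam | count_rg
```

A file that has @RG header lines but a second number of 0 is in the situation described above: the read groups are declared, but no read refers to one.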


Do you have a correspondence between barcode and read group? E.g.:

CAACCATCACTC   sample10
Reply by Gabriel R., 4 months ago

Yes, I have. BTW, what I mean by RG tag is the one attached to every read, not the @RG header.

Reply by gundalav, 4 months ago

Yes, I am aware. This is essentially a demultiplexing problem. If no one answers by the end of the day, I could be cajoled into making a slight modification to deML to account for this.

Reply by Gabriel R., 4 months ago

OK, let's try the following:

1) Sort your BAM file by read name, NOT by coordinate (e.g. with samtools sort -n).

2) run the following:

samtools view -h sortedWRTnames.bam | awk -F'\t' '{ if(substr($1,1,1)=="@"){print $0}else{ idx=substr($1,1,12); printf("%s\t", substr($1,14)); for(i=2;i<=NF;i++){ printf("%s\t",$i); } printf("XI:Z:%s\t",idx); print("YI:Z:DDDDDDDDDDDD"); } }' | samtools view -bS - > sortedWRTnames_withtags.bam

3) run deML:

deML -i index.txt  -o sortedWRTnames_withtags.demultiplex.bam sortedWRTnames_withtags.bam

A few of these steps can be chained with pipes. The index.txt file holds the correspondence between barcode sequence and sample ID:

#Index1 Name
CAACCATCACTC   sample10
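Before running deML it may help to check which barcodes actually occur in the read names, so they can be compared against index.txt. A rough sketch, assuming (as in step 2) that the barcode is the first 12 characters of the read name; `barcode_counts` is a made-up helper name:

```shell
# Tally the barcode prefix of each read name in SAM text read from stdin.
# Assumption: the barcode is the first 12 characters of QNAME (column 1).
# Output: one "<barcode><TAB><count>" line per barcode, sorted by barcode.
barcode_counts() {
  awk -F'\t' '
    /^@/ { next }                           # skip header lines
    { n[substr($1, 1, 12)]++ }
    END { for (b in n) printf "%s\t%d\n", b, n[b] }
  ' | LC_ALL=C sort
}

# Usage (assumes samtools is installed):
#   samtools view sortedWRTnames.bam | barcode_counts
```

Any barcode in this list that is missing from index.txt has no exact sample name to map to.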
Reply by Gabriel R., 4 months ago
finswimmer wrote (4 months ago):

Hello,

you could try this:

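samtools view -h in.bam | awk '{ if ($0 ~ /^@/) { print } else { split($1, a, ":"); if (!sub(/\tRG:Z:[^\t]*/, "\tRG:Z:" a[1])) $0 = $0 "\tRG:Z:" a[1]; print } }' | samtools view -b -o out.bam -

If the line doesn't start with @, the first column (the read name) is split on ":", so we should now have the barcode in a[1]. This assumes the barcode is the part of the read name before the first ":". sub then replaces an existing RG:Z tag with this barcode; if the read has no RG:Z tag yet, the tag is appended to the end of the record.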

fin swimmer
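To check the result, one option is to tally reads per RG value in the output. This is only a sketch: `rg_counts` is a made-up helper name, and reads without an RG tag are reported under the placeholder label "(none)".

```shell
# Count reads per RG:Z value in SAM text read from stdin.
# Reads without an RG tag are grouped under "(none)" (a label chosen here).
rg_counts() {
  awk -F'\t' '
    /^@/ { next }                           # skip header lines
    {
      rg = "(none)"
      for (i = 12; i <= NF; i++)            # optional fields start at column 12
        if ($i ~ /^RG:Z:/) { rg = substr($i, 6); break }
      n[rg]++
    }
    END { for (r in n) printf "%s\t%d\n", r, n[r] }
  ' | LC_ALL=C sort
}

# Usage (assumes samtools is installed):
#   samtools view out.bam | rg_counts
```

If every read was tagged, no "(none)" line should appear.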

jkbonfield wrote (4 months ago):

There is samtools addreplacerg which adds or replaces RG tags in records too, but it is a fixed string rather than derived per barcode. This may be useful if you already have files split up per barcode, but not otherwise.

If they're all mixed together, then you'll need to read in a table and do a lookup yourself. A hacky and badly tested perl 1-liner for this:

samtools view -h in.bam | perl -lne 'BEGIN {$"="\t";open($fh, "rg.txt"); while (<$fh>) {chomp($_);($a,$b)=/(\S+)\s+(\S+)/;$rg{$a}=$b}} if (/^@/) {print;next} ($k)=/^([^|]*)/;if (exists($rg{$k})) {print "$_\tRG:Z:$rg{$k}"} else {print "$_"}' | samtools view -b -o out.bam -

It reads a file called rg.txt which contains barcode and RG tag name per line. Note this doesn't do anything to add these to the @RG header tags, but there are other tools for that - or hack it in situ in the BEGIN block. :-)
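For the missing @RG header lines, one possible sketch is a small filter that emits an @RG line for each distinct read group named in the table and places them after the existing header. The function name `add_rg_headers` is made up, and the sketch assumes the same two-column rg.txt layout (barcode, then RG name). It writes only the required ID field and does not check for @RG lines already present in the header.

```shell
# Insert one "@RG<TAB>ID:<name>" line per distinct read group found in a
# barcode-to-RG table (two whitespace-separated columns, as in rg.txt).
# Reads SAM text on stdin, writes SAM text on stdout.
# add_rg_headers is a hypothetical helper name, not part of samtools.
add_rg_headers() {
  awk -v table="$1" '
    BEGIN {
      # Collect distinct RG names in order of first appearance.
      while ((getline line < table) > 0) {
        if (split(line, f) >= 2 && !(f[2] in seen)) {
          seen[f[2]] = 1
          order[++n] = f[2]
        }
      }
      close(table)
    }
    function flush_rg(   i) {
      if (!done) {
        for (i = 1; i <= n; i++) printf "@RG\tID:%s\n", order[i]
        done = 1
      }
    }
    /^@/ { print; next }        # pass existing header lines through
    { flush_rg(); print }       # first alignment record: emit @RG lines first
    END { flush_rg() }          # header-only input
  '
}

# Usage (assumes samtools is installed):
#   samtools view -h out.bam | add_rg_headers rg.txt | samtools view -b -o out_rg.bam -
```

This could also be folded into the BEGIN block of the one-liner, at the cost of making it even less readable.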
