Adding optional tags to a BAM file
2
0
Entering edit mode
3.8 years ago
sc243 ▴ 10

Hi Guys,

I have a question regarding my bam file (single-cell rna seq data). The first 6 letters of the read names of my bam file represent the cellbarcode info and next 6 represent UMI (separated by an underscore and afterwards there is a hash separating it from the rest of the name). For example-

ACAGTG_GAGAAG#K182:79:HV56XX:6:1101:11160:388737da9

I would like to convert these info into the CB CB:Z:ACAGTG and UB UB:Z:GAGAAG bam TAGs. And then the final read name becomes- K182:79:HV56XX:6:1101:11160:388737da9.

Can you please provide me with a one-liner awk command or some kind of tool using which I can do this?

Thanks a lot!

RNA-Seq bam sam single-cell awk • 4.2k views
ADD COMMENT
0
Entering edit mode

this works perfectly! thanks a lot!

ADD REPLY
0
Entering edit mode

update 2021: samtools 1.13: samtools view --add-flags/--remove-flags

ADD REPLY
0
Entering edit mode

Can you provide more information about how the new samtools feature --add-flags works? I would like to be able to use this feature, but for me it doesn't add a tag. For example, I tried to add a CB tag "AAAAAAAAAA" to each read in a small test bam:

samtools view --add-flags 'CB:Z:AAAAAAAAAA' test.bam | head -n 1

Results in the original bam read (i.e., with no new CB tag):

SOLEXA-1GA-2_2_FC20EMB:5:251:979:328 0 chr1 10145 25 36M * 0 0 AACCCCTAACCCTAACCCTAACCCTAACCCTAAACThhhhHcWhhHTghcKA_ONhAAEEBZE?H?CBC?DA NM:i:1 X1:i:1 MD:Z:33A2

ADD REPLY
0
Entering edit mode

nice catch. I was wrong --add-flag is about mapping flag (integer), not attributes, my solution above was wrong.

ADD REPLY
7
Entering edit mode
3.8 years ago

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

$ cat jeter.sam 
@SQ SN:ref  LN:45
ACAGTG_GAGAAG#K182:79:HV56XX:6:1101:11160:388737da9 1   ref 7   30  8M4I4M1D3M  =   37  39  TTAGATAAAGAGGATACTG *   XX:B:S,12561,2,20,112

.

$ java -jar dist/samjdk.jar -e 'String s=record.getReadName(); int h=s.indexOf("#"),u=s.indexOf("_");record.setReadName(s.substring(h+1));record.setAttribute("CB",s.substring(0,u));record.setAttribute("UB",s.substring(u+1,h));return record;' jeter.sam  2> /dev/null 
@HD VN:1.5  SO:unsorted
@SQ SN:ref  LN:45
K182:79:HV56XX:6:1101:11160:388737da9   1   ref 7   30  8M4I4M1D3M  =   37  39  TTAGATAAAGAGGATACTG *   CB:Z:ACAGTG UB:Z:GAGAAG XX:B:S,12561,2,20,112
ADD COMMENT
0
Entering edit mode

Hi Pierre, Just one more question, if the format is rather K182:79:HV56XX:6:1101:11160:388737da9_ACAGTG_GAGAAG can you please tell me the modified samjdk.jar command to provide the tags? Thanks! That will be really helpful!

PS: May be this? Just worked on it....

$ java -jar dist/samjdk.jar -e 'String s=record.getReadName(); int h=s.indexOf("_"),u=s.lastIndexOf("_");record.setReadName(s.substring(0,h-1));record.setAttribute("CB",s.substring(h+1,u));record.setAttribute("UB",s.substring(u+2,u+7));return record;' jeter.sam  2> /dev/null
ADD REPLY
0
Entering edit mode

something like (? not tested)

String tokens[] = record.getReadName().split("[_]"); record.setReadName(tokens[0]);record.setAttribute("CB",tokens[1]);record.setAttribute("UB",tokens[2]); return record;

?

ADD REPLY
1
Entering edit mode
5 months ago
benformatics ★ 2.9k

This can be done in R but it's not exactly fast...

library(GenomicAlignments)
library(rtracklayer)

## setting up all bam flags (in my case this a BWA alignment and these are the default cols/flags)
param <- ScanBamParam(what=c("qname","flag","rname", "strand", "pos", "qwidth", "mapq", "cigar","mrnm","mpos","isize","seq","qual"),tag=c("NM","MD","MC","AS","XS"))

## reading original bam file
gr.aln <- readGAlignments("your_bam_file.bam", param=param, use.names=TRUE)

## adding my custom bam tag "RX" (in this case you could use CB/UB etc..)
## in my case the UMI is part of the read name presented as "UM1_UM2_READNAME" you need to modify the regex below for your use case
mcols(gr.aln)$RX <- sub("_.*","",sub("_","",mcols(gr.aln)$qname))

## this would be yours
mcols(gr.aln)$CB <- sub("_.*","",mcols(gr.aln)$qname)
mcols(gr.aln)$UB <- sub(".*_","",sub("#.*","",mcols(gr.aln)$qname))


## export new BAM file with UMI's stored as RX tag
export(gr.aln,'your_bam_file_with_umi.bam')
ADD COMMENT

Login before adding your answer.

Traffic: 818 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6