Question: Adding optional tags to a BAM file
22 months ago, sc243 wrote:

Hi Guys,

I have a question about my BAM file (single-cell RNA-seq data). The first 6 letters of each read name are the cell barcode and the next 6 are the UMI. The two are separated by an underscore, and a hash separates them from the rest of the name. For example:

ACAGTG_GAGAAG#K182:79:HV56XX:6:1101:11160:388737da9

I would like to move this information into the BAM tags CB (CB:Z:ACAGTG) and UB (UB:Z:GAGAAG), so that the final read name becomes K182:79:HV56XX:6:1101:11160:388737da9.

Could you suggest an awk one-liner or a tool that can do this?

Thanks a lot!

Tags: awk rna-seq sam bam single-cell

this works perfectly! thanks a lot!

written 22 months ago by sc243

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

Upvote|Bookmark|Accept

written 22 months ago by Pierre Lindenbaum

22 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

$ cat jeter.sam 
@SQ SN:ref  LN:45
ACAGTG_GAGAAG#K182:79:HV56XX:6:1101:11160:388737da9 1   ref 7   30  8M4I4M1D3M  =   37  39  TTAGATAAAGAGGATACTG *   XX:B:S,12561,2,20,112


$ java -jar dist/samjdk.jar -e 'String s=record.getReadName(); int h=s.indexOf("#"),u=s.indexOf("_");record.setReadName(s.substring(h+1));record.setAttribute("CB",s.substring(0,u));record.setAttribute("UB",s.substring(u+1,h));return record;' jeter.sam  2> /dev/null 
@HD VN:1.5  SO:unsorted
@SQ SN:ref  LN:45
K182:79:HV56XX:6:1101:11160:388737da9   1   ref 7   30  8M4I4M1D3M  =   37  39  TTAGATAAAGAGGATACTG *   CB:Z:ACAGTG UB:Z:GAGAAG XX:B:S,12561,2,20,112
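
The string handling inside that expression can be tried without htsjdk. Below is a minimal sketch in plain Java, assuming every read name has the form CB_UMI#name. The class and method names (`PrefixedReadName`, `parse`) are hypothetical and only mirror the indexOf/substring logic of the command above.

```java
// Hypothetical helper, not part of samjdk or htsjdk: it only mirrors
// the indexOf/substring logic of the expression above on a plain String.
class PrefixedReadName {
    /** Returns {readName, cellBarcode, umi} for a name shaped like CB_UMI#readName. */
    static String[] parse(String s) {
        int u = s.indexOf('_');   // position of the underscore after the cell barcode
        int h = s.indexOf('#');   // position of the hash after the UMI
        if (u < 0 || h < u) {
            throw new IllegalArgumentException("unexpected read name: " + s);
        }
        return new String[] {
            s.substring(h + 1),    // original read name
            s.substring(0, u),     // value for the CB tag
            s.substring(u + 1, h)  // value for the UB tag
        };
    }

    public static void main(String[] args) {
        String[] p = parse("ACAGTG_GAGAAG#K182:79:HV56XX:6:1101:11160:388737da9");
        System.out.println(p[0] + "\tCB:Z:" + p[1] + "\tUB:Z:" + p[2]);
    }
}
```

Inside samjdk, the three values would go to record.setReadName, record.setAttribute("CB", ...) and record.setAttribute("UB", ...), as in the command above. The explicit check makes a read name that lacks the prefix fail loudly instead of producing a wrong tag.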

Hi Pierre, just one more question: if the format is instead K182:79:HV56XX:6:1101:11160:388737da9_ACAGTG_GAGAAG, could you tell me the modified samjdk command that sets the tags? Thanks, that would be really helpful!

PS: Maybe this? I just worked on it...

$ java -jar dist/samjdk.jar -e 'String s=record.getReadName(); int h=s.indexOf("_"),u=s.lastIndexOf("_");record.setReadName(s.substring(0,h));record.setAttribute("CB",s.substring(h+1,u));record.setAttribute("UB",s.substring(u+1));return record;' jeter.sam  2> /dev/null
written 22 months ago by sc243

something like (? not tested)

String tokens[] = record.getReadName().split("[_]"); record.setReadName(tokens[0]);record.setAttribute("CB",tokens[1]);record.setAttribute("UB",tokens[2]); return record;

?
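
For the suffixed format (name_CB_UMI), both the index-based and the split-based approach can be checked on a plain String. Below is a minimal sketch, assuming the read name itself contains no underscore. `SuffixedReadName`, `byIndex` and `bySplit` are hypothetical names, not samjdk API. Note that the end index of `substring` is exclusive, so `substring(0, h)` keeps everything before the first underscore.

```java
// Hypothetical helper, not part of samjdk or htsjdk. It assumes the
// read name has exactly two underscores: name_CB_UMI.
class SuffixedReadName {
    /** Index-based variant: returns {readName, cellBarcode, umi}. */
    static String[] byIndex(String s) {
        int h = s.indexOf('_');      // underscore before the cell barcode
        int u = s.lastIndexOf('_');  // underscore before the UMI
        if (h < 0 || h == u) {
            throw new IllegalArgumentException("unexpected read name: " + s);
        }
        return new String[] {
            s.substring(0, h),       // end index is exclusive
            s.substring(h + 1, u),
            s.substring(u + 1)       // runs to the end of the string
        };
    }

    /** Split-based variant, same result under the same assumption. */
    static String[] bySplit(String s) {
        String[] tokens = s.split("_");
        if (tokens.length != 3) {
            throw new IllegalArgumentException("unexpected read name: " + s);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String name = "K182:79:HV56XX:6:1101:11160:388737da9_ACAGTG_GAGAAG";
        System.out.println(String.join("\t", byIndex(name)));
        System.out.println(String.join("\t", bySplit(name)));
    }
}
```

The split variant is shorter, while the index variant makes the positions explicit. Either way, the three values map to record.setReadName, the CB attribute and the UB attribute.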

written 22 months ago by Pierre Lindenbaum