Question: Adding optional tags to a BAM file
sc243 wrote (2.1 years ago):

Hi Guys,

I have a question about my BAM file (single-cell RNA-seq data). In each read name, the first 6 letters are the cell barcode and the next 6 are the UMI. The two are separated by an underscore, and a hash (#) separates the UMI from the rest of the name. For example:

ACAGTG_GAGAAG#K182:79:HV56XX:6:1101:11160:388737da9

I would like to move this information into the CB and UB BAM tags (CB:Z:ACAGTG and UB:Z:GAGAAG), so that the final read name becomes K182:79:HV56XX:6:1101:11160:388737da9.
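As a minimal sketch of the parsing step only (plain Java, no BAM library), assuming every read name follows the CB_UB#name layout shown above: `BarcodedName` is a hypothetical helper, and names that do not match are reported as empty rather than guessed at.

```java
import java.util.Optional;

/** Hypothetical helper: splits a read name of the assumed form CB_UB#name. */
class BarcodedName {
    final String name; // original read name (after '#')
    final String cb;   // cell barcode (before '_')
    final String ub;   // UMI (between '_' and '#')

    BarcodedName(String name, String cb, String ub) {
        this.name = name;
        this.cb = cb;
        this.ub = ub;
    }

    /** Returns empty if there is no '_' followed later by a '#'. */
    static Optional<BarcodedName> parse(String readName) {
        int hash = readName.indexOf('#');
        int underscore = readName.indexOf('_');
        if (hash < 0 || underscore < 0 || underscore > hash) {
            return Optional.empty();
        }
        return Optional.of(new BarcodedName(
                readName.substring(hash + 1),               // original read name
                readName.substring(0, underscore),          // cell barcode
                readName.substring(underscore + 1, hash))); // UMI
    }

    public static void main(String[] args) {
        BarcodedName b = parse("ACAGTG_GAGAAG#K182:79:HV56XX:6:1101:11160:388737da9").get();
        // prints the bare read name, then the two tags in SAM TAG:TYPE:VALUE form
        System.out.println(b.name);
        System.out.println("CB:Z:" + b.cb + "\tUB:Z:" + b.ub);
    }
}
```

Returning `Optional.empty()` instead of throwing lets a caller pass malformed names through unchanged instead of aborting the whole file.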

Could you suggest an awk one-liner or some other tool that can do this?

Thanks a lot!


sc243 replied: This works perfectly! Thanks a lot!

Pierre Lindenbaum replied: If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) answered (2.1 years ago):

Using samjdk (http://lindenb.github.io/jvarkit/SamJdk.html):

$ cat jeter.sam 
@SQ SN:ref  LN:45
ACAGTG_GAGAAG#K182:79:HV56XX:6:1101:11160:388737da9 1   ref 7   30  8M4I4M1D3M  =   37  39  TTAGATAAAGAGGATACTG *   XX:B:S,12561,2,20,112


$ java -jar dist/samjdk.jar -e 'String s=record.getReadName(); int h=s.indexOf("#"),u=s.indexOf("_");record.setReadName(s.substring(h+1));record.setAttribute("CB",s.substring(0,u));record.setAttribute("UB",s.substring(u+1,h));return record;' jeter.sam  2> /dev/null 
@HD VN:1.5  SO:unsorted
@SQ SN:ref  LN:45
K182:79:HV56XX:6:1101:11160:388737da9   1   ref 7   30  8M4I4M1D3M  =   37  39  TTAGATAAAGAGGATACTG *   CB:Z:ACAGTG UB:Z:GAGAAG XX:B:S,12561,2,20,112
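If installing jvarkit is not an option, the same substring logic can be applied to SAM text. The sketch below is an assumption-laden alternative, not part of samjdk: it reads SAM lines on stdin (for example from `samtools view -h`), leaves header lines untouched, rewrites the QNAME and appends CB:Z/UB:Z at the end of each alignment line. `SamLineRewriter` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;

/** Hypothetical filter: QNAME CB_UB#name -> name, plus CB:Z and UB:Z tags. */
class SamLineRewriter {

    /** Rewrites one SAM line; headers and non-matching names pass through unchanged. */
    static String rewrite(String line) {
        if (line.startsWith("@")) return line;   // header line
        String[] fields = line.split("\t", -1);  // -1 keeps empty trailing fields
        String qname = fields[0];
        int hash = qname.indexOf('#');
        int underscore = qname.indexOf('_');
        if (hash < 0 || underscore < 0 || underscore > hash) return line;
        fields[0] = qname.substring(hash + 1);   // bare read name
        return String.join("\t", fields)
                + "\tCB:Z:" + qname.substring(0, underscore)
                + "\tUB:Z:" + qname.substring(underscore + 1, hash);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        PrintWriter out = new PrintWriter(System.out);
        String line;
        while ((line = in.readLine()) != null) {
            out.println(rewrite(line));
        }
        out.flush();
    }
}
```

Possible (untested) usage with Java 11+ single-file launch: `samtools view -h in.bam | java SamLineRewriter.java | samtools view -b -o out.bam -`. Here the new tags end up after existing ones such as XX, whereas samjdk placed them before; SAM does not require optional fields in any particular order.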

sc243 replied: Hi Pierre, just one more question: if the read name is instead formatted as K182:79:HV56XX:6:1101:11160:388737da9_ACAGTG_GAGAAG, how should the samjdk command be modified to set the tags? Thanks, that would be really helpful!

PS: Maybe this? I just worked it out:

$ java -jar dist/samjdk.jar -e 'String s=record.getReadName(); int h=s.indexOf("_"),u=s.lastIndexOf("_");record.setReadName(s.substring(0,h));record.setAttribute("CB",s.substring(h+1,u));record.setAttribute("UB",s.substring(u+1));return record;' jeter.sam  2> /dev/null

Pierre Lindenbaum replied: Something like this? (not tested)

$ java -jar dist/samjdk.jar -e 'String[] tokens = record.getReadName().split("[_]"); record.setReadName(tokens[0]); record.setAttribute("CB",tokens[1]); record.setAttribute("UB",tokens[2]); return record;' jeter.sam  2> /dev/null

This assumes the original read name itself contains no underscore.
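For the suffix layout name_CB_UB, a minimal sketch that searches from the end with `lastIndexOf` instead of using `split("[_]")`. The assumption it guards against (an underscore inside the original read name) does not occur in the example above; `SuffixBarcodes` is a hypothetical helper, not part of samjdk.

```java
/** Hypothetical helper for read names of the assumed form name_CB_UB. */
class SuffixBarcodes {

    /** Returns {name, cb, ub}, or null if fewer than two underscores are present. */
    static String[] parse(String readName) {
        int ubSep = readName.lastIndexOf('_');               // separator before the UMI
        if (ubSep <= 0) return null;
        int cbSep = readName.lastIndexOf('_', ubSep - 1);    // separator before the barcode
        if (cbSep < 0) return null;
        return new String[] {
            readName.substring(0, cbSep),          // original read name (may contain '_')
            readName.substring(cbSep + 1, ubSep),  // cell barcode
            readName.substring(ubSep + 1)          // UMI
        };
    }

    public static void main(String[] args) {
        String[] parts = parse("K182:79:HV56XX:6:1101:11160:388737da9_ACAGTG_GAGAAG");
        System.out.println(String.join(" ", parts));
    }
}
```

The same two `lastIndexOf` calls could presumably replace the `split` inside a samjdk `-e` expression, though that variant has not been tested here.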