Adding MBC/UMI RX tag to read name
13 months ago
Christian ▴ 30

Hi all,

I have a BAM/SAM file in which the molecular barcode (MBC) / unique molecular identifier (UMI) is stored in the RX tag (after preprocessing with AGeNT Trimmer). Here is an example record, where RX:Z: is the tag containing the UMI TCA-TTA.

A00620:188:HVMYMDSX2:4:2213:4255:9392   147 chr1    10000   0   96M =   10005   -91 ATAACCCTAACCCTAACACTAACACTAACCCTAACCCTATCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC,;-<--9:=<9-8:,;9+.:86.,9;><,,@;8<A-@;7(--A<?+-9/28>--A<?6-AA<?6A8.;><@@@;=;@?@:=;?>>9=:>==:=;>9 ZA:Z:TCACT  ZB:Z:TTAGT  BC:Z:GTCTCTTC+GAGGCTTC  MC:Z:96M    MD:Z:17C5C15A56 RG:Z:@A00620:188:HVMYMDSX2:4:1101:6470:1016 MI:Z:TCATTA NM:i:3  AS:i:81 XS:i:80 QX:Z:FFF FFF    RX:Z:TCA-TTA

I am trying to append the UMI to the read name (a requirement for a tool I am trying to implement). This is what I have tried:

samtools view -H tmp.sam > tmp_header.sam
samtools view tmp.sam | awk '{OFS = "\t"} {for(i=1;i<=NF;i++)  if ($i ~ /^RX:Z:/) {$1=$1"_"$i; gsub("RX:Z:","",$1); print}}' >> tmp_header.sam

(see also this Bioinformatics Stack Exchange answer)

The last step throws an error, as samtools cannot parse the file after the modification:

[E::aux_parse] unrecognized type '\t'
[W::sam_read1_sam] Parse error at line 101

This is the line that cannot be parsed when opened with vim:

A00620:188:HVMYMDSX2:4:2213:4255:9392_TCA-TTA   147     chr1    10000   0       96M     =       10005   -91     ATAACCCTAACCCTAACACTAACACTAACCCTAACCCTATCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC        ,;-<--9:=<9-8:,;9+.:86.,9;><,,@;8<A-@;7(--A<?+-9/28>--A<?6-AA<?6A8.;><@@@;=;@?@:=;?>>9=:>==:=;>9        ZA:Z:TCACT      ZB:Z:TTAGT      BC:Z:GTCTCTTC+GAGGCTTC  MC:Z:96M        MD:Z:17C5C15A56 RG:Z:@A00620:188:HVMYMDSX2:4:1101:6470:1016     MI:Z:TCATTA     NM:i:3  AS:i:81 XS:i:80 QX:Z:FFF        FFF     RX:Z:TCA-TTA      

This looks very close to what I am trying to do - the tag has been appended to the read name. I cannot quite figure out what causes the parsing error. Any help would be appreciated.

Best wishes

Christian

13 months ago
Christian ▴ 30
# This script takes a BAM file, copies the UMI from the RX tag,
# and appends it to the read name (the RX tag itself is kept).
# argument 1: input BAM file
# argument 2: output BAM file

import re
import sys

import pysam

input_path, output_path = sys.argv[1], sys.argv[2]

# open the input BAM and an output BAM that reuses the input header;
# both files are closed automatically when the block exits
with pysam.AlignmentFile(input_path, "rb") as input_bam, \
        pysam.AlignmentFile(output_path, "wb", template=input_bam) as output_bam:
    # iterate over reads
    for read in input_bam:
        # reads without an RX tag are written out unchanged
        if read.has_tag("RX"):
            # strip non-word characters, e.g. "TCA-TTA" becomes "TCATTA"
            umi = re.sub(r"\W+", "", read.get_tag("RX"))
            read.query_name = f"{read.query_name}_{umi}"
        output_bam.write(read)

This is how I ended up solving it.
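
A note on the awk attempt, in case it helps someone: the parse error is most likely caused by the QX tag, whose value (FFF FFF) contains a space. awk splits on any whitespace by default, so that value becomes two fields, and rebuilding the line with OFS = "\t" turns the space into a tab. samtools then finds a bare FFF field where it expects TAG:TYPE:VALUE, which would explain the "unrecognized type '\t'" message. Splitting on tabs only avoids this. Below is a minimal Python sketch of the same text-based approach; the function name append_umi_to_qname and the shortened example record are made up for illustration and are not part of any library.

```python
def append_umi_to_qname(sam_line: str, tag: str = "RX") -> str:
    """Append the value of a Z-type tag to the read name of one SAM record.

    Hypothetical helper for illustration; it works on plain SAM text lines.
    """
    # Split on tabs only, so tag values containing spaces stay intact.
    fields = sam_line.rstrip("\n").split("\t")
    prefix = f"{tag}:Z:"
    # Optional tags start at column 12 (index 11); header lines are
    # shorter than that and are therefore returned unchanged.
    for field in fields[11:]:
        if field.startswith(prefix):
            fields[0] = f"{fields[0]}_{field[len(prefix):]}"
            break
    return "\t".join(fields)


# Shortened, made-up record with a space inside the QX value.
record = "\t".join([
    "read1", "147", "chr1", "10000", "0", "4M", "=", "10005", "-91",
    "ATAA", ",;-<", "QX:Z:FFF FFF", "RX:Z:TCA-TTA",
])

# Whitespace splitting yields one field more than tab splitting,
# because "FFF FFF" is cut in two.
print(len(record.split()), len(record.split("\t")))

print(append_umi_to_qname(record).split("\t")[0])
```

With awk itself, setting the input separator to a tab as well (for example BEGIN {FS = OFS = "\t"}) should have the same effect, although I have not rerun the original command to confirm it.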
