Set bam read group from existing tag
0
3
Entering edit mode
4.2 years ago
jbalberge ▴ 170

In bam file from single-cell RNA-seq experiments processed with 10X's cellranger 3.0 pipeline, the @RG:SM tag is set by sample while the @CB is computed from cell barcode.

$ samtools view possorted_genome_bam.bam | head -n 1
> NS500651:49:"RUN":4:12405:6951:17414  256 1   11256   0   97M1S   *   0   0   "SEQ" "qual"
> NH:i:5    HI:i:2  AS:i:83 nM:i:6  RE:A:I  li:i:0  BC:Z:ATCTTTAG   QT:Z:A/AAAAEE   CR:Z:GAACGGACAGACGCAA   CY:Z:/AAAAEAEEEEEEEEE   CB:Z:GAACGGACAGACGCAA-1 UR:Z:TAGCTGTAAT UY:Z:EEEEEEEEEE UB:Z:TAGCTGTAAT RG:Z:MYSAMPLE:0:1:"RUN":4

For cell-specific genotyping, it would make more sense to treat the @CB barcode as individual's identity.

Is there an efficient way to move the @CB tag values to @RG tag ? This way, each unique @RG defines a cellular barcode.

Currently I do:

(1) split the bam by @CB tag (thanks to https://www.biostars.org/p/263346/#263348)

samtools view -@ 32 <input.bam> | cut -f 12- | tr "\t" "\n" | grep  "^CB:Z:" | cut -d ':' -f 3 | sort | uniq | while read S; do samtools view -h <input.bam> | awk -v tag="CB:Z:$S" '($0 ~ /^@/ || index($0,tag)>0)' > out/${S}.sam ; done

(2) edit @RG SM based on file name with Picard/AddOrReplaceReadGroups

for s in $(ls out/*.sam); do SM=$(basename $s .sam) ; java -jar /sandbox/apps/bioinfo/binaries/picard/2.19.0/picard.jar AddOrReplaceReadGroups I=$s O=out-RGSM/${SM}.bam RGID=1 RGLB=0.1 RGSM=${SM} RGPU=MYSAMPLE RGPL=ILLUMINA; done
samtools merge -r - out-RGSM/*.bam | samtools view -o merged-RGSM.bam -
samtools index merged-RGSM.bam

(3) call variants per sample

bcftools mpileup -r chr22 -Ou -f GRCh38.fasta merged-RGSM.bam | bcftools call -mv -Ob -o calls-RGSM.bcf

This is very inefficient and memory/disk-consuming. How could I improve this process by either (1) properly replace @RG:SM with CB or to run (2) bcftools mpileup with CB instead of RG:SM?

Thanks, JB

RNA-Seq bam samtools bcftools • 1.8k views
ADD COMMENT
0
Entering edit mode

Hi JB, I am dealing with a similar problem and using your solution, thanks! .. but did you ( or somebody else) find a better solution for this? Cheers, Raúl

ADD REPLY

Login before adding your answer.

Traffic: 3208 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6