CB tag disappears from the BAM after samtools sort -t CB
3.5 years ago
akh22 ▴ 110

I've been trying to set up an analysis pipeline for RNA velocity on AWS EC2. As a test case I used one of the 10x datasets: 10k Peripheral blood mononuclear cells (PBMCs) from a healthy donor, Single Indexed. For speed and cost savings, I first used samtools to sort the 10K PBMC BAM file from 10x by cell barcode with the following command:

samtools sort -l 7 -m 2048M -t CB -O BAM -@100 -o /temp/home/cellsorted_PBMC.bam /temp/home/PBMC_10K.bam

and then ran velocyto:

velocyto run -b filtered_feature_bc_matrix/barcodes.tsv -o /temp/home -m GRCh38_rmsk.gtf cellsorted_PBMC.bam.bam refdata-gex-GRCh38-2020-A/genes/genes.gtf

Velocyto complained that there is no CB tag in the 10K PBMC BAM. When I examined the sorted BAM file, I saw no CB tag at all, as in this record:

A00519:643:HCMYWDSXY:4:2172:22525:26287 16  chr1    148893  255 91M *   0   0   ACATGGCAAGATCCCGTCTCTATGATAAAAAATTAGCTGGACATGGTGGCACATGTCTGTAGTCCCAGCTACTTGGGAGACTGAAGTGAGA FFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:6  HI:i:2  AS:i:89 nM:i:0  RG:Z:SC3_v3_NextGem_SI_PBMC_10K:0:1:HCMYWDSXY:4 TX:Z:ENST00000484859,+724,91M   GX:Z:ENSG00000241860    GN:Z:AL627309.5 fx:Z:ENSG00000241860    RE:A:E  MM:i:1  xf:i:17 CR:Z:GCAGCTCTGTGAATAT   CY:Z:FFFFFFFFFFFFFFFF   UR:Z:TCTAAAACCTAC   UY:Z:FFFFFF:FFFFF   UB:Z:TCTAAAACCTAC

The original BAM, before sorting by CB, does have the CB tag:

A00519:643:HCMYWDSXY:3:2144:3649:12790  16  chr1    498309  1   65M26S  *   0   0   GGCCAAAATATGTAAGCACATTTGCATTTATTAGGCACTTTATTTCCATTATTACACTGTGATATCCCATGTACTCTGCGTTGATACCACT F,,,FF:F,FFFFF:FFFF:FFFFFFFFFFFFF:FFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFF:FFF:FFFFFFFFFFF NH:i:4  HI:i:1  AS:i:61 nM:i:1  ts:i:26 RG:Z:SC3_v3_NextGem_SI_PBMC_10K:0:1:HCMYWDSXY:3 RE:A:I  xf:i:0  CR:Z:TCATTGTAGTATAGAC   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:TCATTGTAGTATAGAC-1 UR:Z:ACTCTAATCTGC   UY:Z:FFFFF:FFFFFF   UB:Z:ACTCTAATCTGC

Interestingly, when I sorted a smaller, truncated version of PBMC_10K.bam (created by samtools view -h Parent_NGSC3_DI_PBMC_possorted_genome_bam.bam|head -n 10000 | samtools view -bS > test.bam) with the exact same samtools command, the CB tag was preserved in the sorted BAM.

Does anybody have any idea why sorting the entire PBMC_10K.bam by CB removes the CB tag from the sorted BAM, while the CB tags survive when I sort the smaller version of the same BAM? I'd appreciate any pointers at this point. Thanks.

Output of samtools --version: samtools 1.11, using htslib 1.11, Copyright (C) 2020 Genome Research Ltd

sequencing Samtools RNA-Seq • 2.0k views

Hi, I've run into the same problem. Did you solve it?

Just feed velocyto an unsorted BAM file. Velocyto sorts the BAM file itself, whether or not it is already sorted. Having velocyto re-sort an already sorted BAM appears to introduce this error. I don't know whether that is due to a bug or not.

3 months ago
liu2005678 • 0

When samtools sorts by tag (-t), it places the reads that lack the tag first. So if you inspect only the first lines of the output, e.g. with "samtools view tag.sorted.bam | head -n x_lines", the reads you see may not have the tag. See https://github.com/samtools/samtools/issues/1069#issuecomment-507271307
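One way to check this without relying on the first few lines is to count tagged and untagged records across the whole file. The following is a minimal sketch, not a samtools feature: `count_cb_tags` is a hypothetical helper name, and it assumes SAM text on standard input (for example from `samtools view`), with optional fields starting at column 12 as in the SAM format.

```shell
# Hypothetical helper: count SAM records with and without a CB:Z: tag.
# Reads SAM text from stdin; header lines (starting with '@') are skipped.
count_cb_tags() {
  awk -F'\t' '
    /^@/ { next }                      # skip header lines
    {
      found = 0
      # optional tags start at field 12 in a SAM record
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^CB:Z:/) { found = 1; break }
      }
      if (found) tagged++; else untagged++
    }
    END { printf "with_CB=%d without_CB=%d\n", tagged, untagged }
  '
}

# Example usage (assumes samtools is installed and the file exists):
#   samtools view /temp/home/cellsorted_PBMC.bam | count_cb_tags
```

If the sorted BAM reports a non-zero `with_CB` count, the tags were not removed by the sort; the untagged reads are simply listed first. Comparing the counts against the original PBMC_10K.bam should give the same numbers if the sort left the records intact.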
