Cellranger output .bam file does not contain barcodes (CB and UB)
13 months ago
Gabriel ▴ 100

I will describe my troubleshooting in a timeline

Background: scRNA-seq libraries were prepared with 10x Chromium (I think version 3.0) and sequenced on Illumina. Alignment and assembly of the libraries were then done with the standard Cellranger protocol in 2019.

I am trying to run the velocyto protocol. Using the standard run10x command on the /outs folder, I get the following warnings:

2020-11-02 14:43:11,342 - WARNING - Not found cell and umi barcode in entry 1090 of the bam file
2020-11-02 14:43:11,343 - WARNING - Not found cell and umi barcode in entry 1093 of the bam file
2020-11-02 14:43:11,343 - WARNING - Not found cell and umi barcode in entry 1097 of the bam file
2020-11-02 14:43:11,343 - WARNING - Not found cell and umi barcode in entry 1098 of the bam file

etc...

The .bam file in question is in the outs folder /outs/possorted_genome_bam.bam

Velocyto requires error-corrected CB / UB barcodes in the tag section (see http://velocyto.org/velocyto.py/tutorial/cli.html#requirements-on-the-input-files).

According to the Cellranger support page (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/bam), the SAM/BAM files are supposed to contain error-corrected cell (CB) and UMI (UB) barcodes.

Cellranger BAM Barcode Tags

However, when I looked into the SAM file using the simplesam Python library, I got:

>>> x.tags
{'NH': 4, 'UY': '##########', 'nM': 1, 'CY': '################', 'li': 0, 'RE': 'I', 'AS': 93, 'HI': 2, 'CR': 'NNNNNNNNNNNNNNNN', 'UR': 'NNNNNNNNNN', 'RG': 'AAcount:0:1:CE2WPANXX:3'}
>>> x=next(in_sam)
>>> x.tags
{'NH': 5, 'UY': '##########', 'nM': 0, 'CY': '################', 'li': 0, 'RE': 'I', 'AS': 96, 'HI': 2, 'CR': 'NNNNNNNNNNNNNNNN', 'UR': 'NNNNNNNNNN', 'RG': 'AAcount:0:1:CE2WPANXX:5'}

etc...

Therefore, there are no CB or UB tags, only empty CR and UR tags (all N) and RG tags, whatever those mean (I am not a specialist in the SAM file format and its conventions).
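To check whether this holds for the whole file rather than a handful of records, the barcode state of every alignment can be tallied. The following is a minimal sketch in stdlib-only Python that works on SAM text lines (for example the output of `samtools view`). The function names `parse_tags`, `classify`, and `tally`, and the three category labels, are made up for illustration; they are not part of velocyto, Cellranger, or simplesam.

```python
from collections import Counter


def parse_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 columns are mandatory
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def classify(tags):
    """Sort a read into one of three (hypothetical) barcode states."""
    if "CB" in tags and "UB" in tags:
        return "corrected"  # both tags velocyto asks for are present
    raw = tags.get("CR", "")
    if raw and set(raw) != {"N"}:
        return "raw_only"  # a raw barcode was read but no corrected tag exists
    return "no_barcode"  # CR is missing or consists only of N


def tally(sam_lines):
    """Count reads per barcode state, skipping header and blank lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        counts[classify(parse_tags(line))] += 1
    return counts
```

Calling `tally(sys.stdin)` in a small script and piping in something like `samtools view possorted_genome_bam.bam | head -n 100000` would give a quick estimate. If nearly every read falls into `no_barcode`, that would suggest the barcode sequences were never available to the pipeline, in which case there is nothing in the BAM to copy into CB/UB tags.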

I have been looking around, and someone suggested that the cell and UMI barcodes are in the QNAME (read ID) string and could be copied into the tag fields; see "Add tags to BAM/SAM file" and https://github.com/velocyto-team/velocyto.py/issues/107

I tried printing it:

>>> x.qname
'D00624:100:CE2WPANXX:3:2315:13760:4349'

>>> x.qname
'D00624:100:CE2WPANXX:5:2209:14115:11948'

And so on. According to some of these posts, the CB or UB might be contained in the colon-separated fields of the QNAME:

Supposedly the UMI tag?
>>> x.qname.split(":")[2]
'CE2WPANXX'
Supposedly the barcode tag?
>>> x.qname.split(":")[1]
'100'

But neither of these look like valid barcodes to me.
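The read names shown above have seven colon-separated fields, which matches the usual Illumina read-name layout of instrument, run number, flowcell ID, lane, tile, x and y. Under that assumption, the short sketch below labels each field. The `IlluminaReadName` class and `parse_qname` function are hypothetical names used only for this illustration.

```python
from typing import NamedTuple


class IlluminaReadName(NamedTuple):
    """Fields of a seven-part Illumina-style read name (assumed layout)."""
    instrument: str
    run_number: str
    flowcell: str
    lane: str
    tile: str
    x: str
    y: str


def parse_qname(qname):
    """Split a QNAME into named fields, rejecting unexpected layouts."""
    parts = qname.split(":")
    if len(parts) != 7:
        raise ValueError(f"expected 7 colon-separated fields, got {len(parts)}")
    return IlluminaReadName(*parts)


name = parse_qname("D00624:100:CE2WPANXX:3:2315:13760:4349")
print(name.run_number)  # 100
print(name.flowcell)    # CE2WPANXX
print(name.lane)        # 3
```

Read this way, `split(":")[1]` is the run number and `split(":")[2]` is the flowcell ID, which is why neither looks like a barcode. The same flowcell ID and lane also appear at the end of the RG tag (`AAcount:0:1:CE2WPANXX:3`).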

Therefore, my question is: how do I obtain the error-corrected barcodes and add them to my .bam file, where they are missing? Do I need to re-run the Cellranger alignment? I suspect this may be a problem related to an older version of Cellranger.

cellranger barcodes Velocyto chromium10x UMI

You should check how the BAM was generated by Cellranger. Normally it will indeed contain these tags. Here is an example from Cellranger v3.1:

CR:Z:AGAAGCGGTAACAGGC   CY:Z:CCC9@EEGGCFGFGGG     CB:Z:AGAAGCGGTAACAGGC-1 UR:Z:CGAATATTGTCG       UY:Z:FGEGGGGGAFGC       UB:Z:CGAATATTGTCG
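To make the relationship between the raw and corrected tags in this example explicit, the sketch below parses the line and compares them. It is only an illustration of this one record: `parse_tag_fields` is a hypothetical helper, and treating the part of CB after the hyphen as a suffix appended by Cellranger is an assumption based on how the example looks.

```python
def parse_tag_fields(text):
    """Parse whitespace-separated TAG:TYPE:VALUE fields into a dict."""
    tags = {}
    for field in text.split():
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


example = (
    "CR:Z:AGAAGCGGTAACAGGC CY:Z:CCC9@EEGGCFGFGGG CB:Z:AGAAGCGGTAACAGGC-1 "
    "UR:Z:CGAATATTGTCG UY:Z:FGEGGGGGAFGC UB:Z:CGAATATTGTCG"
)
tags = parse_tag_fields(example)

# Split the corrected cell barcode into its sequence and trailing suffix.
barcode, _, suffix = tags["CB"].partition("-")

print(barcode == tags["CR"])     # True
print(suffix)                    # 1
print(tags["UB"] == tags["UR"])  # True
```

In this record the corrected barcode and UMI are identical to the raw ones, so a healthy BAM carries a real sequence in CR/UR alongside CB/UB, in contrast to the all-N values in the question.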