Separate single cell BAM file by the cell barcode
9 months ago
zbidav ▴ 30

Dear Biostar community,

I am fairly new to Dropseq analysis (10x-sequenced files, if I am not mistaken). I followed the standard CellRanger protocol and obtained an aligned BAM file for the samples, along with the files for downstream analysis with Seurat.

I wonder if there is an efficient way to split the resulting BAM file into multiple files by cell barcode (stored in the BAM file as the special tag "CB"). I tried to do it using samtools, but because of the large number of output files, it was not very efficient.

In summary: my input is a single-cell BAM file, and the desired output is a set of separate BAM files, one for each cell barcode. Do you know a tool that can do this, or an efficient way to do so?

Much appreciated!

BAM scRNAseq single-cell • 1.5k views
9 months ago
GenoMax 142k

10x makes a tool available: https://github.com/10XGenomics/subset-bam

Also: https://github.com/timoast/sinto



Hi, thanks for the answer! Sadly, this is not quite what I am trying to do :( If I may quote from the manual (of subset-bam):

subset-bam ... takes a 10x Genomics BAM file, a CSV file defining the subset of cells you want to isolate, and produces a new BAM file with only alignments associated with those cells.

This tool is very useful for creating pseudo-bulk files from multiple cells. In my case, however, I need to separate each single cell into a different file. I could create a temporary single-line CSV file for each barcode, but I wonder whether that approach is really efficient (compared to the naive one using samtools).
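For comparison with the one-run-per-barcode approaches, here is a minimal Python sketch of a single-pass split. It assumes the alignments arrive as SAM text (for example from `samtools view -h`) and that each read carries its barcode in a `CB:Z:` tag. The function name `split_sam_by_barcode` and the `max_buffered` parameter are made up for illustration. Lines are buffered and appended per barcode, so only one output file is open at a time and the open-file limit is not an issue even with thousands of barcodes.

```python
import os
import re
from collections import defaultdict

# Matches the cell barcode tag among the optional SAM fields.
CB_TAG = re.compile(r"\tCB:Z:([^\t\n]+)")


def split_sam_by_barcode(sam_lines, out_dir, wanted=None, max_buffered=1_000_000):
    """Write one SAM file per CB value in a single pass over sam_lines.

    sam_lines: iterable of SAM text lines (header first), each ending in a newline.
    wanted:    optional set of barcodes to keep; None keeps every barcode.
    Returns the sorted list of barcodes that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    header = []
    buffers = defaultdict(list)
    seen = set()
    buffered = 0

    def flush():
        # Append the buffered lines to each barcode's file, one file at a time.
        nonlocal buffered
        for cb, lines in buffers.items():
            first_write = cb not in seen
            path = os.path.join(out_dir, cb + ".sam")
            with open(path, "w" if first_write else "a") as handle:
                if first_write:
                    handle.writelines(header)  # each file gets the full header once
                handle.writelines(lines)
            seen.add(cb)
        buffers.clear()
        buffered = 0

    for line in sam_lines:
        if line.startswith("@"):  # SAM header lines precede all alignments
            header.append(line)
            continue
        match = CB_TAG.search(line)
        if match is None:  # reads without a CB tag are skipped
            continue
        cb = match.group(1)
        if wanted is not None and cb not in wanted:
            continue
        buffers[cb].append(line)
        buffered += 1
        if buffered >= max_buffered:
            flush()
    flush()
    return sorted(seen)
```

One way to use it would be to pipe `samtools view -h` output into a small script that calls `split_sam_by_barcode(sys.stdin, "per_cell")`, and then convert the per-cell SAM files back to BAM with `samtools view -b` if needed. I have not benchmarked this against the tools mentioned in this thread.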


Looks like sinto is multi-threaded so it may be more performant: https://timoast.github.io/sinto/basic_usage.html#filter-cell-barcodes-from-bam-file
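If I read the sinto documentation correctly, `filterbarcodes` takes a tab-delimited file with the cell barcode in the first column and a group name in the second, and writes one BAM file per group. Under that assumption, putting every barcode in its own group should give one BAM per cell. A minimal Python sketch for building such a file follows; the helper names and the file paths are hypothetical.

```python
import gzip


def read_barcodes(path):
    """Read one barcode per line from a plain or gzip-compressed text file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return [line.strip() for line in handle if line.strip()]


def write_sinto_cells_file(barcodes, out_path):
    """Write 'barcode<TAB>group' lines, using each barcode as its own group."""
    with open(out_path, "w") as handle:
        for barcode in barcodes:
            handle.write(f"{barcode}\t{barcode}\n")
    return len(barcodes)
```

With a `cells.tsv` written this way from the CellRanger barcode list, the call would be something like `sinto filterbarcodes -b possorted_genome_bam.bam -c cells.tsv -p 8`, where the BAM file name is a placeholder. I have not tested how sinto behaves with several thousand groups, so treat this as a starting point.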

How many cell barcodes were you planning to use?


Hmmm... all of them :) To be more exact, all the barcodes that resulted from CellRanger. I think on average ~8,000-10,000 barcodes.


You may want to run subset-bam using GNU parallel, where each temporary single-line CSV is read in through parallel's ::: option. I am no expert in GNU parallel, so take my advice with a grain of salt.
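The same idea can be sketched in Python, with a thread pool standing in for GNU parallel. This is only a rough sketch: the helper names `prepare_jobs` and `run_jobs` are made up, the executable is assumed to be on the PATH as `subset-bam` (the released binaries may be named differently), and the `--bam`, `--cell-barcodes` and `--out-bam` options are used as described in the subset-bam README.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def prepare_jobs(barcodes, bam_path, work_dir):
    """Write one single-line CSV per barcode and return the matching commands."""
    os.makedirs(work_dir, exist_ok=True)
    jobs = []
    for barcode in barcodes:
        csv_path = os.path.join(work_dir, barcode + ".csv")
        with open(csv_path, "w") as handle:
            handle.write(barcode + "\n")
        jobs.append([
            "subset-bam",  # assumed executable name
            "--bam", bam_path,
            "--cell-barcodes", csv_path,
            "--out-bam", os.path.join(work_dir, barcode + ".bam"),
        ])
    return jobs


def run_jobs(jobs, workers=4):
    """Run the commands concurrently; raises if any command exits non-zero."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), jobs))
```

Keep in mind that every job reads the input BAM again, so with ~10,000 barcodes the total I/O may still be substantial even when the jobs run in parallel.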
