Dear Biostars community,
I am fairly new to droplet-based single-cell analysis (10x Genomics data, if I am not mistaken). I followed the standard CellRanger pipeline and obtained an aligned BAM file for the samples, plus the files for downstream analysis with Seurat.
I wonder whether there is an efficient way to split the resulting BAM file into multiple files by cell barcode (stored in the "CB" tag of each read). I tried doing it with samtools, but with this many barcodes, and therefore this many output files, it was very slow.
In short: my input is a single-cell BAM file, and the output I want is one BAM file per cell barcode. Do you know of a tool that can do this, or an efficient way to do it?
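One way to avoid re-reading the BAM once per barcode is to sort it by the CB tag first, so that each cell's reads become contiguous, and then stream through it a single time. Below is a minimal sketch in Python using pysam. It assumes a samtools version whose `sort` supports `-t TAG` (for example `samtools sort -t CB -o sorted_by_cb.bam input.bam`). The function names and file names are hypothetical, not part of any existing tool:

```python
import os
from itertools import groupby
from operator import itemgetter


def cb_tag(read, tag="CB"):
    """Return the read's cell barcode, or None if the read has no such tag."""
    return read.get_tag(tag) if read.has_tag(tag) else None


def barcode_groups(reads, get_barcode=cb_tag, keep=None):
    """Yield (barcode, reads) for each run of consecutive reads sharing a barcode.

    Assumes `reads` is already grouped by barcode (e.g. sorted with samtools sort -t CB).
    Reads without a barcode are skipped; `keep` optionally restricts the output to a
    set of barcodes (for example CellRanger's filtered barcodes).
    """
    seen = set()
    pairs = ((get_barcode(r), r) for r in reads)
    tagged = (p for p in pairs if p[0] is not None)
    for barcode, group in groupby(tagged, key=itemgetter(0)):
        if barcode in seen:
            # The same barcode in two separate runs means the input is not grouped.
            raise ValueError(f"input is not grouped by barcode: {barcode!r} seen twice")
        seen.add(barcode)
        if keep is None or barcode in keep:
            yield barcode, (read for _, read in group)


def split_bam(in_path, out_dir, keep=None):
    """Write one BAM per barcode in a single pass, with one output file open at a time."""
    import pysam  # third-party; imported here so barcode_groups() works without it

    os.makedirs(out_dir, exist_ok=True)
    counts = {}
    with pysam.AlignmentFile(in_path, "rb") as bam:
        for barcode, reads in barcode_groups(bam, keep=keep):
            out_path = os.path.join(out_dir, f"{barcode}.bam")
            with pysam.AlignmentFile(out_path, "wb", template=bam) as out:
                n = 0
                for read in reads:
                    out.write(read)
                    n += 1
            counts[barcode] = n
    return counts
```

Usage would be something like `split_bam("sorted_by_cb.bam", "per_cell")`, optionally with `keep=` set to the barcodes from CellRanger's filtered barcode list. Because only one output BAM is open at any time, thousands of barcodes should not run into the open-file limit; the price is the extra sort step up front.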
Much appreciated!
Hi, thanks for the answer! Sadly, this is not quite what I am trying to do :( To quote the subset-bam manual:
This tool is very useful for creating pseudo-bulk files from multiple cells. In my case, however, I need to write each single cell to a separate file. I could create a temporary single-line CSV file for each barcode, but I wonder whether that approach is actually efficient (compared with the naive approach using samtools).
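Writing the temporary single-line files is the cheap part. The expensive part is that the CB tag is not indexed, so every subset-bam run, like every samtools call in the naive loop, has to read through the entire BAM; with ~10,000 barcodes that is ~10,000 full passes either way. For completeness, here is a small Python sketch that reads a barcode list (assumed to be CellRanger's `filtered_feature_bc_matrix/barcodes.tsv.gz`, one barcode per line) and writes one single-line file per barcode. The function names and the output layout are hypothetical:

```python
import gzip
import os


def read_barcodes(path):
    """Read one barcode per line from a plain or gzip-compressed text file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return [line.strip() for line in fh if line.strip()]


def write_single_barcode_files(barcodes, out_dir):
    """Write <out_dir>/<barcode>.csv containing just that barcode; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for bc in barcodes:
        path = os.path.join(out_dir, f"{bc}.csv")
        with open(path, "w") as fh:
            fh.write(bc + "\n")
        paths.append(path)
    return paths
```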
Looks like sinto is multi-threaded, so it may be more performant: https://timoast.github.io/sinto/basic_usage.html#filter-cell-barcodes-from-bam-file
How many cell barcodes were you planning to use?
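As far as I understand the sinto docs, `filterbarcodes` takes a tab-separated cells file with the barcode in the first column and an output group in the second, and writes one BAM per group. Giving every barcode its own group should therefore produce one BAM per cell. A minimal sketch for building that file, assuming the barcodes are already in a Python list (function names are hypothetical):

```python
def sinto_cells_lines(barcodes):
    """Lines for a sinto cells file: barcode <TAB> group, one group per barcode.

    Duplicates are dropped (keeping the first occurrence) so each cell appears once.
    """
    return [f"{bc}\t{bc}\n" for bc in dict.fromkeys(barcodes)]


def write_sinto_cells_file(barcodes, path):
    """Write the cells file that is then passed to sinto filterbarcodes."""
    with open(path, "w") as fh:
        fh.writelines(sinto_cells_lines(barcodes))
```

The file would then be passed to something like `sinto filterbarcodes -b input.bam -c cells.tsv -p 8` (check `sinto filterbarcodes --help` for the exact options in your version). I have not checked how sinto behaves with ~10,000 groups, for example whether it keeps one output file open per group, so it is worth trying on a few hundred barcodes first.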
Hmmm... all of them :) To be more exact, all the barcodes reported by CellRanger, which I think is ~8,000-10,000 barcodes on average.
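With ~8,000-10,000 barcodes, a single pass that keeps one output BAM open per barcode is likely to hit the per-process open-file limit (often 1024 by default on Linux; see `ulimit -n`). A middle ground that works on an unsorted BAM is to process the barcodes in batches that fit under that limit: with a limit of 1024 and some headroom, 10,000 barcodes become 11 passes instead of 10,000. A sketch with pysam and the Unix-only `resource` module; the function names and the headroom of 64 descriptors are arbitrary assumptions:

```python
import os
import resource  # Unix-only


def open_file_budget(reserve=64, fallback=1024):
    """How many output BAMs can be open at once under the soft RLIMIT_NOFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        soft = fallback
    return max(1, soft - reserve)


def batches(barcodes, size):
    """Split the barcode list into consecutive chunks of at most `size`."""
    return [barcodes[i:i + size] for i in range(0, len(barcodes), size)]


def split_in_batches(in_path, out_dir, barcodes, batch_size=None):
    """One pass over the (unsorted) BAM per batch, one writer per barcode in the batch."""
    import pysam  # third-party; imported here so the helpers above work without it

    os.makedirs(out_dir, exist_ok=True)
    size = batch_size or open_file_budget()
    for batch in batches(list(barcodes), size):
        with pysam.AlignmentFile(in_path, "rb") as bam:
            writers = {
                bc: pysam.AlignmentFile(os.path.join(out_dir, f"{bc}.bam"), "wb", template=bam)
                for bc in batch
            }
            try:
                for read in bam:
                    if read.has_tag("CB"):
                        # Reads whose barcode is not in this batch are skipped.
                        out = writers.get(read.get_tag("CB"))
                        if out is not None:
                            out.write(read)
            finally:
                for out in writers.values():
                    out.close()
```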
You may want to run subset-bam with GNU parallel, passing each temporary single-line CSV file in through parallel's ::: syntax. I am no expert in GNU parallel, so take my advice with a grain of salt.
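For reference, the GNU parallel version would look roughly like `parallel -j 8 subset-bam --bam input.bam --cell-barcodes {} --out-bam per_cell/{/.}.bam ::: bc_csvs/*.csv`, where `{}` is each CSV path and `{/.}` is its basename without the extension. The same idea in Python, as a sketch that assumes subset-bam is on the PATH, that its flags are as I recall them from the subset-bam README (check `subset-bam --help`), and that the single-barcode files already exist; all function names are hypothetical:

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def out_bam_for(barcode_csv, out_dir):
    """per_cell/AAAC-1.bam for bc_csvs/AAAC-1.csv (what parallel's {/.} gives)."""
    stem = os.path.splitext(os.path.basename(barcode_csv))[0]
    return os.path.join(out_dir, stem + ".bam")


def subset_bam_command(bam, barcode_csv, out_bam, cores=1):
    """Argument list for one subset-bam run (flag names assumed from its README)."""
    return ["subset-bam", "--bam", bam, "--cell-barcodes", barcode_csv,
            "--out-bam", out_bam, "--cores", str(cores)]


def run_subset_bam_jobs(bam, barcode_csvs, out_dir, jobs=8, cores=1):
    """Run one subset-bam per single-barcode file, `jobs` at a time (like parallel -j)."""
    os.makedirs(out_dir, exist_ok=True)
    cmds = [subset_bam_command(bam, csv, out_bam_for(csv, out_dir), cores)
            for csv in barcode_csvs]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Threads only wait on child processes; check=True raises if any run fails.
        return list(pool.map(lambda cmd: subprocess.run(cmd, check=True), cmds))
```

Parallelism spreads the CPU work, but each job still reads the entire BAM, so with thousands of barcodes disk I/O is likely to become the bottleneck; `jobs × cores` should also stay within the CPUs you have.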