Question: split BAM file with same cell-barcode and UMI pair
23 months ago, newbinf wrote:

I have single-cell RNA-seq reads (from 10x Chromium) that have already been pre-processed. The cell barcode and UMI were moved from the read sequence to the read header (via umi-tools), and low-quality reads were removed. Next, I mapped the reads (with STAR) and isolated the reads from a gene of interest (via samtools). Ultimately, I want to genotype each cell for a specific gene, using the cell-barcode and UMI pair, and call variants.

How do I split the BAM file into separate BAM files based on the cell-barcode and UMI pair? In other words, I want one BAM file for each cell-barcode and UMI pair, containing only the aligned reads that share that pair.

Thank you!

modified 23 months ago • written 23 months ago by newbinf0

You really want your data split into hundreds of thousands of files?

written 23 months ago by swbarnes28.6k

There are only about 50 cell barcodes in my gene/region of interest. So it would be about 50-70 files.

I should clarify that I used samtools to extract only the portion of the BAM file with alignments to one gene.

modified 23 months ago • written 23 months ago by newbinf0

Okay, so 50 cell barcodes times 20 UMIs per sample? That is a thousand files; are you sure this is helpful? The 10x Genomics software will tag every read with cell barcode and gene, so why can't you make use of that?

written 23 months ago by swbarnes28.6k

Sorry for the late reply! I am using the 10x cell barcode tags; they are now placed in the read headers. I cut the tags out of the read sequences because I do not want misalignments caused by the cell barcode and UMI tags.

written 22 months ago by newbinf0

Are you sure that your UMIs are in the read, and not in read 2? Why isn't the software that 10x Genomics makes appropriate for what you are doing?

written 22 months ago by swbarnes28.6k

Did you find a solution for this? I have also generated R2.fastq files tagged with the cell barcode and UMI (using UMI-tools) and mapped the reads (with STAR). The resulting BAM files contain the cell barcode in the alignments, and I would like to split the alignments by cell to perform variant calling. Thanks!

written 3 months ago by mmalumbresm0
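
For what it's worth, here is a minimal sketch of one way the split asked about above might be done. It is not a tested pipeline. It assumes the barcode and UMI were appended to the read name as the last two underscore-separated fields (READNAME_CELLBARCODE_UMI), and that the alignments are supplied as SAM text, for example from `samtools view -h`. The function names and the `split` output prefix are made up for illustration. Setting `by_umi=False` groups by cell barcode only, which is closer to what the last comment asks for.

```python
from collections import defaultdict


def barcode_and_umi(qname):
    """Return (cell_barcode, umi) from a read name like 'READ_CELLBARCODE_UMI'.

    Assumption: the last two underscore-separated fields are the cell
    barcode and the UMI. Names with fewer fields raise ValueError.
    """
    _, cell_barcode, umi = qname.rsplit("_", 2)
    return cell_barcode, umi


def group_sam_records(sam_lines, by_umi=True):
    """Separate SAM header lines from alignments and group the alignments.

    Groups are keyed by (cell_barcode, umi), or by (cell_barcode,) when
    by_umi is False.
    """
    header = []
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # SAM header lines start with '@'
            header.append(line)
            continue
        qname = line.split("\t", 1)[0]  # QNAME is the first SAM column
        cell_barcode, umi = barcode_and_umi(qname)
        key = (cell_barcode, umi) if by_umi else (cell_barcode,)
        groups[key].append(line)
    return header, dict(groups)


def write_groups(header, groups, prefix="split"):
    """Write one SAM file per group, each with the full original header."""
    for key, records in groups.items():
        path = f"{prefix}_{'_'.join(key)}.sam"
        with open(path, "w") as handle:
            handle.writelines(header)
            handle.writelines(records)
```

To use it, pass an open file or `sys.stdin` carrying the output of `samtools view -h` to `group_sam_records`, then call `write_groups` on the result. Each SAM file can be converted back to BAM with `samtools view -b`. Holding every group in memory should be fine for a single-gene subset like the one described here, but probably not for a whole-transcriptome BAM.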