Question

split BAM file with same cell-barcode and UMI pair

5.5 years ago

newbinf • 0

I have single-cell RNA-seq reads (from 10x Chromium) that have already been pre-processed. The cell barcode and UMI were moved from the read sequence into the read header (via umi-tools), and low-quality reads were removed. Next, I mapped the reads (with STAR) and isolated the reads from a gene of interest (via samtools). Ultimately, I want to genotype each cell for a specific gene, using the cell-barcode and UMI pair, and call variants.

How do I split the BAM file into separate BAM files based on the cell-barcode and UMI pair? In other words, I want one BAM file per cell-barcode and UMI pair, containing only the aligned reads that share that pair.

Thank you!

RNA-Seq unique molecular identifier umi-tools • 4.6k views

ADD COMMENT • link 5.5 years ago by newbinf • 0

You really want your data split into hundreds of thousands of files?

ADD REPLY • link 5.5 years ago by swbarnes2 14k

There are only about 50 cell barcodes in my gene/region of interest. So it would be about 50-70 files.

I should clarify that I used samtools to keep only the portion of the BAM file with alignments to one gene.

ADD REPLY • link 5.5 years ago by newbinf • 0

Okay, so 50 cell barcodes times 20 UMIs per sample? That is a thousand files; are you sure this is helpful? The 10x Genomics software will tag every read with its cell barcode and gene, so why can't you make use of that?

ADD REPLY • link 5.5 years ago by swbarnes2 14k
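
The file-count estimate above can be checked before splitting anything. The sketch below is only an illustration: it assumes read names end in `_CELLBARCODE_UMI` (the layout umi-tools writes when it moves a cell barcode and UMI into the header), and the read names in the example are made up.

```python
from collections import defaultdict


def parse_barcodes(read_name):
    """Return (cell_barcode, umi) from a name like 'READID_CELLBARCODE_UMI'.

    Assumption: the last two underscore-separated fields are the
    cell barcode and the UMI, in that order.
    """
    _, cell_barcode, umi = read_name.rsplit("_", 2)
    return cell_barcode, umi


def count_output_files(read_names):
    """Count distinct cells and distinct (cell barcode, UMI) pairs."""
    umis_per_cell = defaultdict(set)
    for name in read_names:
        cell_barcode, umi = parse_barcodes(name)
        umis_per_cell[cell_barcode].add(umi)
    n_cells = len(umis_per_cell)
    n_pairs = sum(len(umis) for umis in umis_per_cell.values())
    return n_cells, n_pairs


# Hypothetical read names, for illustration only.
names = [
    "r1_AAACCTGAGAAACCAT_ACGTACGTAC",
    "r2_AAACCTGAGAAACCAT_ACGTACGTAC",
    "r3_AAACCTGAGAAACCAT_TTTTGGGGCC",
    "r4_TTTGTCATCTTTACGT_ACGTACGTAC",
]
print(count_output_files(names))  # (2, 3): 2 cells, 3 barcode/UMI pairs
```

The first number is how many files a per-cell split would produce, the second how many a per-cell-and-UMI split would produce.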

Sorry for the late reply! I am using the 10x cell barcode tags; they are now placed in the read headers. I cut the tags out of the reads because I do not want misalignments caused by the cell barcode and UMI sequences.

ADD REPLY • link 5.5 years ago by newbinf • 0

Are you sure that your UMIs are in the read, and not in read 2? Why isn't the software that 10x Genomics provides appropriate for what you are doing?

ADD REPLY • link 5.5 years ago by swbarnes2 14k

Did you find a solution for this? I have also generated R2.fastq files tagged with the cell barcode and UMI (using umi-tools) and mapped the reads (with STAR). The resulting BAM files contain the cell barcode in the alignments, and I would like to split the alignments by cell to perform variant calling. Thanks!

ADD REPLY • link 3.8 years ago by mmalumbresm • 0
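
No solution was posted in the thread. One possible approach, sketched here under stated assumptions rather than offered as a tested pipeline, is to group the alignments by the barcode carried in the read name. The sketch assumes read names end in `_CELLBARCODE_UMI` (the layout umi-tools writes to the header) and works on SAM text so that it needs only the standard library. The function names, the output file naming, and the example records are all hypothetical.

```python
from collections import defaultdict


def barcode_key(read_name, by_umi=False):
    """Build the grouping key from a name like 'READID_CELLBARCODE_UMI'."""
    _, cell_barcode, umi = read_name.rsplit("_", 2)
    return f"{cell_barcode}_{umi}" if by_umi else cell_barcode


def split_sam(lines, by_umi=False):
    """Group SAM records by barcode; every group keeps the full header."""
    header = []
    groups = defaultdict(list)
    for line in lines:
        if line.startswith("@"):  # header lines start with '@'
            header.append(line)
        else:
            read_name = line.split("\t", 1)[0]  # QNAME is the first column
            groups[barcode_key(read_name, by_umi)].append(line)
    return {key: header + records for key, records in groups.items()}


def write_groups(groups, prefix):
    """Write one SAM file per group, named '<prefix>.<key>.sam'."""
    for key, sam_lines in groups.items():
        with open(f"{prefix}.{key}.sam", "w") as handle:
            handle.writelines(sam_lines)


# Hypothetical two-cell example, for illustration only.
demo = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "r1_AAACCTGAGAAACCAT_ACGTACGTAC\t0\tchr1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2_AAACCTGAGAAACCAT_TTTTGGGGCC\t0\tchr1\t12\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r3_TTTGTCATCTTTACGT_ACGTACGTAC\t0\tchr1\t15\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
]
per_cell = split_sam(demo)
print(sorted(per_cell))  # ['AAACCTGAGAAACCAT', 'TTTGTCATCTTTACGT']
```

To use this on real data, the lines could come from `samtools view -h` run on the gene-level BAM, and each written SAM file could be converted back to BAM with `samtools view -b`. Passing `by_umi=True` splits by cell barcode and UMI pair instead of by cell, which produces many more files.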