Hi all.
I have a BAM file from one 10X scRNAseq sample.
I want to try a tool out which was designed for a different type of data.
The input for this tool is the output from samtools mpileup. samtools of course detects only one sample in the bam file.
I can split the BAM file into 1 file per cell using subset-bam from 10X. This will take very long and obviously create a lot of files, but it solves the problem.
I thought it might be better to change the header and the ID tags so that they refer to the cell barcodes. Then samtools could split the reads by cell barcode directly.
Here I run into problems, since I don't know how to edit the header and the ID tags for the reads.
Any help would be much appreciated. Thanks a lot in advance.
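For what it's worth, here is a minimal sketch of the header and tag edit described above, working on SAM text rather than on the BAM directly. It assumes that reads carry the cell barcode in a CB:Z: tag, that a list of barcodes to keep is available (for example the filtered barcodes file from Cell Ranger, including the "-1" suffix), and that dropping reads without a listed barcode is acceptable. The function name retag_by_barcode and the file names are made up for illustration. It does not guarantee that samtools mpileup will then treat each barcode as a separate sample; that depends on the samtools version and output mode and would need checking.

```python
import sys


def retag_by_barcode(sam_lines, barcodes):
    """Yield SAM lines in which each read's RG tag is its cell barcode (CB).

    sam_lines: iterable of SAM text lines, header first (as from `samtools view -h`).
    barcodes:  iterable of barcodes to keep (assumption: matches the CB values exactly).
    """
    keep = set(barcodes)
    rg_lines = [f"@RG\tID:{bc}\tSM:{bc}" for bc in sorted(keep)]
    header_done = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if not header_done and line.startswith("@"):
            # Drop the original @RG lines; they describe the whole sample.
            if not line.startswith("@RG\t"):
                yield line
            continue
        if not header_done:
            # First alignment reached: close the header with one @RG per barcode.
            yield from rg_lines
            header_done = True
        fields = line.split("\t")
        core, tags = fields[:11], fields[11:]
        cb = next((t[5:] for t in tags if t.startswith("CB:Z:")), None)
        if cb not in keep:
            continue  # no CB tag, or barcode not in the list: skip the read
        # Replace any existing RG tag with the cell barcode.
        tags = [t for t in tags if not t.startswith("RG:Z:")]
        tags.append(f"RG:Z:{cb}")
        yield "\t".join(core + tags)
    if not header_done:
        # Header-only input: still emit the new @RG lines.
        yield from rg_lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: first argument is a text file with one barcode per line.
    with open(sys.argv[1]) as fh:
        wanted = [ln.strip() for ln in fh if ln.strip()]
    for out_line in retag_by_barcode(sys.stdin, wanted):
        print(out_line)
```

Saved as, say, retag.py, this could sit in a pipe between samtools view -h in.bam and samtools view -b -o retagged.bam - so that no per-cell files are written. With thousands of cells the header gains thousands of @RG lines, which is legal SAM but worth keeping in mind.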
https://github.com/10XGenomics/subset-bam will split the BAM into cell barcode specific subsets. You can then use samtools mpileup on the subset BAM.
How would this work? samtools is not splitting the data based on fastq headers.
Then I must have misunderstood the documentation. Since samtools reports the number of detected samples, I assumed that I can modify the BAM file.
Thanks for clarifying.
Are you querying the cell barcodes using the CB tags with samtools view when you refer to "samples", or something else? Technically, if you had the original fastq files (with CB+UMI), then you could do something with those in terms of "splitting" the reads, but using the tool above is likely easiest.
Yes, I mean the cell barcodes.
Also, subset-bam does seem to be the easiest option, but it also seems to be the brute force solution. Having something else seemed better in terms of time and disk space.
But it would probably take more time to find a solution using the fastq files than to just split the BAM file.
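To get a feel for the time and disk space involved before splitting, one option is to count reads per cell barcode first. The snippet below is only a sketch: it assumes SAM text lines (for example from samtools view) with the barcode in a CB:Z: tag, and the function name count_reads_per_barcode is made up for illustration.

```python
from collections import Counter


def count_reads_per_barcode(sam_lines):
    """Count alignment lines per cell barcode, read from the CB:Z: tag."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        # Optional tags start at the 12th tab-separated field.
        for tag in line.rstrip("\n").split("\t")[11:]:
            if tag.startswith("CB:Z:"):
                counts[tag[5:]] += 1
                break
    return counts
```

The number of keys in the result is the number of per-cell files a full split would produce, and the counts show how many of those would hold only a handful of reads. Reads without a CB tag are not counted.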