Question

Change sample ID in BAM file to cell barcode

0

15 months ago

martin.grasshoff • 0

Hi all.

I have a BAM file from one 10X scRNAseq sample.

I want to try out a tool that was designed for a different type of data.

The input for this tool is the output of samtools mpileup. samtools, of course, detects only one sample in the BAM file.

I can split the BAM file into one file per cell using subset-bam from 10X. This will take a very long time and obviously create a lot of files, but it solves the problem.

I thought it might be better to change the header and the ID tags so that they refer to the cell barcodes. Then samtools could split the reads directly according to the cell barcodes.

Here I run into problems, since I don't know how to edit the header and the ID tags for the reads.

Any help would be much appreciated. Thanks a lot in advance.
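
For illustration only, here is a minimal sketch of the retagging idea described above. It assumes the reads are available as SAM text (for example from `samtools view -h`), that each read carries its corrected cell barcode in a `CB:Z:` tag as in 10X BAM files, and that reads without a barcode can be dropped. The function name `retag_by_cell_barcode` is hypothetical. It only shows how the header and the `RG` tags could be rewritten; it does not establish that samtools mpileup would then report one sample per barcode.

```python
def retag_by_cell_barcode(sam_lines):
    """Rewrite SAM text so that every cell barcode becomes its own read group.

    Assumption: barcodes are stored in a CB:Z: tag. Existing @RG header lines
    and RG:Z: tags are discarded, and reads without a CB tag are dropped.
    """
    header, reads, barcodes = [], [], []
    seen = set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            # Keep every header line except the old read groups.
            if not line.startswith("@RG"):
                header.append(line)
            continue
        fields = line.split("\t")
        core, tags = fields[:11], fields[11:]  # 11 mandatory SAM columns
        cb = next((t[5:] for t in tags if t.startswith("CB:Z:")), None)
        if cb is None:
            continue  # no corrected barcode on this read
        if cb not in seen:
            seen.add(cb)
            barcodes.append(cb)
        # Replace any existing read-group tag with the cell barcode.
        tags = [t for t in tags if not t.startswith("RG:Z:")] + ["RG:Z:" + cb]
        reads.append("\t".join(core + tags))
    # One @RG line per barcode, with ID and SM both set to the barcode.
    read_groups = ["@RG\tID:{0}\tSM:{0}".format(cb) for cb in barcodes]
    return header + read_groups + reads
```

The returned lines could be written to a file and converted back with `samtools view -b`. Note that the sketch keeps all reads in memory, so a real BAM file would need two streaming passes instead (one to collect the barcodes for the header, one to rewrite the reads).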

samtools bam scRNA • 1.2k views

ADD COMMENT • link updated 10 months ago by Ram 43k • written 15 months ago by martin.grasshoff • 0

1

https://github.com/10XGenomics/subset-bam will split the BAM into cell barcode specific subsets. You can then use samtools mpileup on the subset BAM.

"I thought that it might be better, if I change the header and the ID tags to refer to the cell barcodes. Then samtools can directly split the reads according to the cell barcodes."

How would this work? samtools is not splitting the data based on fastq headers.

ADD REPLY • link 15 months ago by GenoMax 141k

0

Then I must have misunderstood the documentation. Since samtools reports the number of detected samples, I assumed that I could modify the BAM file accordingly.

Thanks for clarifying.

ADD REPLY • link 15 months ago by martin.grasshoff • 0

1

When you refer to "samples", are you querying the cell barcodes via the CB tags with samtools view, or do you mean something else? Technically, if you had the original fastq files (with CB+UMI), you could do something with those in terms of "splitting" the reads, but using the tool above is likely easiest.
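
As a rough sketch of what querying the CB tags could look like, the snippet below counts reads per cell barcode from SAM text such as the output of `samtools view`. It assumes the barcode sits in a `CB:Z:` tag; the function name `count_reads_per_barcode` and its `tag` parameter are hypothetical.

```python
from collections import Counter


def count_reads_per_barcode(sam_lines, tag="CB"):
    """Count alignments per cell barcode in an iterable of SAM text lines."""
    prefix = tag + ":Z:"
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        # Optional tags start after the 11 mandatory SAM columns.
        for field in line.rstrip("\n").split("\t")[11:]:
            if field.startswith(prefix):
                counts[field[len(prefix):]] += 1
                break
    return counts
```

Passing `sys.stdin` as `sam_lines` would allow piping the output of `samtools view` straight into it. Reads without the tag are simply not counted.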

ADD REPLY • link 15 months ago by GenoMax 141k

0

Yes, I mean the cell barcodes.

Also, subset-bam does seem to be the easiest option, but it also seems to be the brute force solution. Having something else seemed better in terms of time and disk space.

But it would probably take more time to find a solution using the fastq files than to just split the BAM file.
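
On the time concern, one alternative would be to route every read to its cell in a single pass over the file rather than scanning it once per cell. The sketch below illustrates that idea on SAM text, assuming barcodes are stored in a `CB:Z:` tag. The function `split_by_barcode` and its `wanted` parameter are hypothetical, and this is not how subset-bam itself is implemented.

```python
from collections import defaultdict


def split_by_barcode(sam_lines, wanted=None):
    """Group SAM records by cell barcode in one pass.

    Returns (header_lines, {barcode: [record, ...]}). If `wanted` is given,
    only barcodes in that set are kept. Reads without a CB tag are skipped.
    """
    header = []
    per_cell = defaultdict(list)
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            header.append(line)
            continue
        cb = None
        for field in line.split("\t")[11:]:
            if field.startswith("CB:Z:"):
                cb = field[5:]
                break
        if cb is None or (wanted is not None and cb not in wanted):
            continue
        per_cell[cb].append(line)
    return header, dict(per_cell)
```

This keeps everything in memory, so it only suits small inputs as written. Writing one output file per barcode would still use about the same disk space as the per-cell split, so the sketch addresses run time only.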

ADD REPLY • link 15 months ago by martin.grasshoff • 0