Creating a cell by SNP matrix
3.4 years ago
dominicdhall ▴ 40

In a recent experiment with a few thousand barcoded cells, I wanted to investigate common SNPs. I have a large .bam file containing all reads from all cells that passed quality control. I ran variant calling on it, followed by QC on the called variants, and saved the result as calls.vcf (the filtered VCF file contains ~70k sites).

I split my large bam file into one bam file per cell and also ran variant calling on the individual cells, using calls.vcf as my regions file. I now have a large number of vcf files (one per cell), each containing variant data for that cell in the specified regions. Using these vcf files I would like to construct a SNP-Cell matrix.

Is this possible using already released packages?

variant calling genotyping bcftools SNP • 1.3k views

Using these vcf files I would like to construct a SNP-Cell matrix.

And what should this matrix look like?


Honestly, I am unsure! I think the nature of the variant isn't too important, only that it has a label. Then for each (barcode, SNP label) pair I would have a 0, 1 or 2: 0 for homozygous reference, 1 for heterozygous and 2 for homozygous alternate (sorry if these labels aren't correct - I am a mathematician on a rotation project!). I think the idea would then be to perform some sort of dimensionality reduction on the (probably very sparse) matrix, followed by some sort of clustering.

The final two steps should be very easy once I have the matrix, and it should be possible to create the matrix through some clever scripting, but I wondered whether there is a standardized way of doing this!
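A minimal sketch of the 0/1/2 encoding described above, assuming the genotypes have already been read out of the per-cell VCF files into plain dictionaries. The names `gt_to_dosage`, `build_matrix` and the example barcodes and SNP labels are hypothetical, not from any released package. A site with no call in a cell is kept as `None` rather than 0, since in sparse single-cell data a missing call is not the same as homozygous reference.

```python
import re

def gt_to_dosage(gt):
    """Map a VCF GT string to an alternate-allele count (0, 1 or 2).

    Returns None when the genotype is missing or not diploid.
    """
    alleles = re.split(r"[/|]", gt)
    if len(alleles) != 2 or "." in alleles:
        return None
    # Any non-reference allele index (1, 2, ...) is counted as "alternate".
    return sum(allele != "0" for allele in alleles)

def build_matrix(calls_by_cell, snp_labels):
    """Build a cell-by-SNP matrix (rows = barcodes, columns = SNP labels).

    calls_by_cell: {barcode: {snp_label: GT string}} (assumed input layout).
    """
    barcodes = sorted(calls_by_cell)
    matrix = [
        [gt_to_dosage(calls_by_cell[barcode].get(snp, "./.")) for snp in snp_labels]
        for barcode in barcodes
    ]
    return barcodes, matrix

# Toy input: two cells, two sites; the second cell has no call at chr2:200.
calls = {
    "AAAC": {"chr1:100": "0/1", "chr2:200": "1/1"},
    "TTTG": {"chr1:100": "0/0"},
}
barcodes, matrix = build_matrix(calls, ["chr1:100", "chr2:200"])
```

How the `None` entries are handled (dropped, imputed, or masked) is a separate choice that has to be made before dimensionality reduction.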


Using these vcf files I would like to construct a SNP-Cell matrix.

Don't you want a multi-sample VCF?


I did consider merging my VCF files, using each cell as a separate sample. Do you know if this would allow me to perform subsequent dimensionality reduction and clustering? Or would the data have to be loaded into some sort of dataframe first? (I apologise, I am very new to bioinformatics in general...)
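As a rough illustration of the "load it into a dataframe" route: a merged multi-sample VCF (for example, one produced by `bcftools merge`) is a tab-separated text file with one column per sample, so it can be turned into a cell-by-SNP table with a short parser. The sketch below uses only the standard library; `vcf_to_matrix`, `dosage`, the `chrom:pos:ref>alt` label format and the toy records are assumptions made for illustration, and it expects uncompressed VCF lines with a `GT` entry in the FORMAT column.

```python
import re

def dosage(gt):
    """Alternate-allele count for a diploid GT string, or None if missing."""
    alleles = re.split(r"[/|]", gt)
    if len(alleles) != 2 or "." in alleles:
        return None
    return sum(allele != "0" for allele in alleles)

def vcf_to_matrix(lines):
    """Return (samples, sites, matrix) with one matrix row per sample (cell)."""
    samples, sites, per_site = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        sites.append(f"{chrom}:{pos}:{ref}>{alt}")
        per_site.append([dosage(f.split(":")[gt_index]) for f in fields[9:]])
    # Transpose from site-major to cell-major order.
    matrix = [list(row) for row in zip(*per_site)]
    return samples, sites, matrix

# Toy merged VCF with two cells and two sites (made-up records).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tCELL_A\tCELL_B",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t./.:0",
    "chr2\t200\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1/1:8\t0/0:5",
]
samples, sites, matrix = vcf_to_matrix(example)
```

The resulting lists map directly onto a dataframe or array (rows labelled by `samples`, columns by `sites`), which is the shape most dimensionality-reduction and clustering routines expect.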

2.4 years ago
niklas.lang ▴ 40

I split my large bam file into one bam file per cell

I am super happy that I came across your post this morning, Dominic!

I've been searching for a way to do this for so long, so I would be glad to hear how you managed to split one BAM file into one BAM file per cell. Did you use a specific tool for that? Or did you even deposit the code anywhere?

Thanks a lot!