Question: Creating a cell by SNP matrix
16 months ago, dominicdhall wrote:

In a recent experiment with a few thousand barcoded cells, I wanted to investigate common SNPs. I have a large .bam file containing all reads from all cells that passed quality control. I ran variant calling on it, followed by QC on the called variants, and saved the result as calls.vcf (the filtered VCF file contains ~70k sites).

I then split the large BAM file into one BAM file per cell and ran variant calling on each cell individually, using calls.vcf as the regions file. I now have a large number of VCF files (one per cell), each containing variant data for that cell in the specified regions. Using these vcf files I would like to construct a SNP-Cell matrix.

Is this possible using already released packages?

modified 4 months ago by niklas.lang • written 16 months ago by dominicdhall

Using these vcf files I would like to construct a SNP-Cell matrix.

And what should this matrix look like?

written 16 months ago by WouterDeCoster

Honestly, I am unsure! I think the nature of the variant isn't too important, only that it has a label. Then for each (barcode, SNP label) pair I would have a 0, 1 or 2: 0 for homozygous reference, 1 for heterozygous and 2 for homozygous alternate (sorry if these labels aren't correct, I am a mathematician on a rotation project!). The idea would then be to perform some sort of dimensionality reduction on the (probably very sparse) matrix, followed by some sort of clustering.

The final two steps should be straightforward once I have the matrix, and it should be possible to build the matrix with some clever scripting, but I wondered whether there is a standardized way of doing this!

written 16 months ago by dominicdhall
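
The 0/1/2 encoding described above can be sketched in plain Python. This is only an illustration under some assumptions that are not from the thread: each per-cell VCF is single-sample and biallelic, the SNP label is built as chrom:pos:ref>alt, and a site with no record or no call in a cell is stored as -1 (the names MISSING, gt_to_dosage, parse_cell_vcf and build_matrix are made up for this example). Treating an absent call as "missing" rather than 0 seems safer for sparse single-cell data, but that is a design choice, not a standard.

```python
# Sketch only: a hand-rolled reader for simple single-sample VCF text.
# Assumes biallelic sites; all names here are hypothetical.

MISSING = -1  # assumed marker for "no call in this cell"


def gt_to_dosage(gt):
    """Map a GT string such as '0/1' or '1|1' to the alt-allele count (0, 1 or 2)."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return MISSING  # './.', haploid or multi-allelic calls count as missing
    return alleles.count("1")


def parse_cell_vcf(lines):
    """Return {snp_label: dosage} for the VCF lines of one cell."""
    calls = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        fmt_keys = fields[8].split(":")
        sample = fields[9].split(":")
        gt = sample[fmt_keys.index("GT")] if "GT" in fmt_keys else "./."
        calls[f"{chrom}:{pos}:{ref}>{alt}"] = gt_to_dosage(gt)
    return calls


def build_matrix(cell_vcfs):
    """cell_vcfs maps barcode -> iterable of VCF lines.

    Returns (barcodes, snp_labels, matrix) with one row per cell.
    """
    per_cell = {bc: parse_cell_vcf(lines) for bc, lines in cell_vcfs.items()}
    barcodes = sorted(per_cell)
    snps = sorted({s for calls in per_cell.values() for s in calls})
    matrix = [[per_cell[bc].get(s, MISSING) for s in snps] for bc in barcodes]
    return barcodes, snps, matrix


# Tiny made-up example with two cells and two sites.
demo = {
    "AAAC": [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tAAAC\n",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\n",
        "chr2\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:8\n",
    ],
    "TTTG": [
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:9\n",
    ],
}
barcodes, snps, matrix = build_matrix(demo)
```

With real data, each value in the input dictionary could simply be an open file handle for that cell's VCF, since the parser only iterates over lines. For ~70k sites and a few thousand cells a dense list of lists is wasteful, so a sparse representation would probably be the next step.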

Using these vcf files I would like to construct a SNP-Cell matrix.

Don't you want a multi-sample VCF?

Or how about using https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php to merge all your VCFs?

written 16 months ago by Pierre Lindenbaum

I did consider merging my VCF files, using each cell as a separate sample. Do you know if this would allow me to perform subsequent dimensionality reduction and clustering? Or would the data have to be loaded into some sort of dataframe first? (I apologise, I am very new to bioinformatics in general.)

written 16 months ago by dominicdhall
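
On the dimensionality-reduction question: whichever way the calls are merged, the analysis needs a numeric cells x SNPs table in memory. As a rough sketch of what could happen next, the code below computes the first principal component of such a table by power iteration, using only the standard library. The assumptions are mine, not from the thread: missing calls are coded as -1 and replaced by the per-SNP mean of the observed calls, and the function names are hypothetical. In practice a numerical library would be the more sensible choice for a matrix of this size.

```python
import math

MISSING = -1  # assumed marker for "no call"


def impute_and_center(matrix):
    """Fill missing entries with the column mean of observed calls, then center columns."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] != MISSING]
        mean = sum(observed) / len(observed) if observed else 0.0
        for i in range(n_rows):
            value = matrix[i][j]
            # An imputed entry equals the mean, so it becomes 0 after centering.
            out[i][j] = (value if value != MISSING else mean) - mean
    return out


def first_component(centered, n_iter=100):
    """Power iteration on X^T X. Returns (loadings per SNP, scores per cell)."""
    n_cols = len(centered[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # deterministic starting vector
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        new_v = [
            sum(centered[i][j] * scores[i] for i in range(len(centered)))
            for j in range(n_cols)
        ]
        norm = math.sqrt(sum(w * w for w in new_v))
        if norm == 0.0:
            break  # no variance left in the direction being tracked
        v = [w / norm for w in new_v]
    scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
    return v, scores


# Made-up genotype matrix: rows are cells, columns are SNPs.
genotypes = [
    [2, 2, 0],
    [2, MISSING, 0],
    [0, 0, 2],
    [0, 0, MISSING],
]
loadings, scores = first_component(impute_and_center(genotypes))
```

In this toy matrix the first two cells and the last two cells end up on opposite sides of zero along the first component. Real data would need more components and a proper clustering step on the resulting scores, and mean imputation may distort the picture when most entries are missing.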
4 months ago, niklas.lang wrote:

I split my large bam file into one bam file per cell

I am super happy that I came across your post this morning, Dominic!

I've been searching for a way to do this for so long, so I would be glad to hear how you managed to split one BAM file into one BAM file per cell. Did you use a specific tool for that? Or did you deposit the code anywhere?

Thanks a lot!

modified 4 months ago • written 4 months ago by niklas.lang