Question: Creating a cell-by-SNP matrix
dominicdhall wrote (11 months ago):

In a recent experiment with a few thousand barcoded cells, I wanted to investigate common SNPs. I have a large .bam file containing all reads from all cells that passed quality control. I ran variant calling on it, followed by QC on the called variants, and saved the result in calls.vcf (the filtered VCF file contains ~70k sites).

I then split the large BAM file into one BAM file per cell and ran variant calling on each cell individually, using calls.vcf as the regions file. I now have a large number of VCF files (one per cell), each containing that cell's variant data at the specified sites. From these VCF files I would like to construct a SNP-by-cell matrix.

Is this possible using already released packages?

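A first step for any scripted approach is reading one per-cell VCF into a lookup from site to genotype. The sketch below is a minimal, stdlib-only version under stated assumptions: each file is an uncompressed, single-sample VCF (the one sample column being the cell), and the site key format `CHROM:POS:REF:ALT` and the function name `read_cell_genotypes` are hypothetical choices made for illustration.

```python
def read_cell_genotypes(vcf_lines):
    """Return {"CHROM:POS:REF:ALT": GT} for one single-sample (per-cell) VCF.

    `vcf_lines` is any iterable of text lines, e.g. an open file handle.
    Assumes an uncompressed VCF whose single sample column is the cell.
    """
    genotypes = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information (##) lines and the #CHROM header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _vid, ref, alt = fields[:5]
        fmt_keys = fields[8].split(":")
        values = fields[9].split(":")
        if "GT" not in fmt_keys:
            continue  # no genotype recorded at this site
        idx = fmt_keys.index("GT")
        # VCF allows trailing sample fields to be dropped; treat that as missing.
        gt = values[idx] if idx < len(values) else "."
        genotypes[f"{chrom}:{pos}:{ref}:{alt}"] = gt
    return genotypes
```

For a bgzipped `.vcf.gz` file, the same function can be fed `gzip.open(path, "rt")` instead of `open(path)`.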

Using these vcf files I would like to construct a SNP-Cell matrix.

And what should this matrix look like?

(comment by WouterDeCoster, 11 months ago)

Honestly, I am unsure! I think the nature of the variant isn't too important, only that it has a label. Then for each (barcode, SNP label) pair I would have a 0, 1 or 2: 0 for homozygous reference, 1 for heterozygous and 2 for homozygous alternative allele (sorry if these labels aren't correct, I am a mathematician on a rotation project!). The idea would then be to perform some sort of dimensionality reduction on the (probably very sparse) matrix, followed by some sort of clustering.
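The 0/1/2 coding described above is the usual alternate-allele dosage encoding of the VCF `GT` field. One detail matters for single-cell data: many cells will have no reads at a given site, so the matrix needs a separate "missing" value rather than 0, otherwise uncovered sites would be counted as homozygous reference. A minimal sketch of the conversion (the function name `gt_to_dosage` is hypothetical):

```python
def gt_to_dosage(gt):
    """Convert a VCF GT string to an alternate-allele count.

    "0/0" -> 0 (homozygous reference), "0/1" -> 1 (heterozygous),
    "1/1" -> 2 (homozygous alternative). Phased calls ("0|1") are treated
    like unphased ones. Any missing allele ("./.", ".", "0/.") -> None.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None  # no call: keep distinct from 0 (homozygous reference)
    # Every non-"0" allele index (1, 2, ... at multi-allelic sites) counts as alt.
    return sum(1 for a in alleles if a != "0")
```

With this encoding, "sparse" in the single-cell setting mostly means "mostly None", which is worth keeping in mind when choosing a dimensionality reduction method.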

The final two steps should be easy once I have the matrix, and it should be possible to build the matrix with some clever scripting, but I wondered whether there is a standardized way of doing this!

(reply by dominicdhall, 11 months ago)
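The "clever scripting" route mostly comes down to taking the union of sites over all cells and filling one row per cell. The sketch below assumes the per-cell calls have already been converted to dosages and collected as `{cell_barcode: {snp_id: dosage}}`; the input layout and the function names `build_matrix` and `to_tsv_lines` are hypothetical.

```python
def build_matrix(cell_dosages):
    """Assemble a cell-by-SNP matrix.

    `cell_dosages` maps cell barcode -> {snp_id: dosage (0, 1, 2 or None)}.
    Rows are cells, columns are the union of SNP ids over all cells.
    A SNP absent from a cell's dict becomes None (no call), never 0.
    """
    cells = sorted(cell_dosages)
    # Plain string sort; use (chrom, int(pos)) keys if genomic order matters.
    snps = sorted({snp for calls in cell_dosages.values() for snp in calls})
    matrix = [[cell_dosages[c].get(s) for s in snps] for c in cells]
    return cells, snps, matrix


def to_tsv_lines(cells, snps, matrix):
    """Yield tab-separated lines (header first), writing None as NA."""
    yield "cell\t" + "\t".join(snps)
    for cell, row in zip(cells, matrix):
        yield cell + "\t" + "\t".join("NA" if v is None else str(v) for v in row)
```

At the scale in the question (~70k sites times a few thousand cells, on the order of 10^8 entries) a dense Python list of lists would likely be too memory-hungry; storing only the observed calls in a sparse structure (for example a `scipy.sparse` matrix with a separate mask for missing values) would probably be more practical.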

Using these vcf files I would like to construct a SNP-Cell matrix.

Don't you want a multi-sample VCF?

Or how about using GATK CombineVariants (https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php) to merge all your VCFs?

(comment by Pierre Lindenbaum, 11 months ago)
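A merged multi-sample VCF with one sample column per cell is already close to the requested matrix: rows are sites and columns are cells, so it only needs recoding and transposing. The sketch below assumes each per-cell VCF carried a unique sample name (for example its barcode) so the merged columns can be told apart, and that every record has `GT` in its FORMAT field; the function name is hypothetical. As an alternative worth checking, `bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' merged.vcf.gz` prints a similar site-by-sample genotype table directly.

```python
def multisample_vcf_to_matrix(vcf_lines):
    """Turn a merged multi-sample VCF (one sample column per cell) into
    (cells, snps, matrix) with rows = cells and columns = SNPs.

    Entries are alt-allele counts (0, 1, 2) or None for missing genotypes.
    """
    cells, snps, columns = [], [], []
    for line in vcf_lines:
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            cells = fields[9:]  # sample names, assumed to be cell barcodes
            continue
        snps.append(":".join(fields[i] for i in (0, 1, 3, 4)))  # CHROM:POS:REF:ALT
        gt_idx = fields[8].split(":").index("GT")
        col = []
        for sample in fields[9:]:
            values = sample.split(":")
            gt = values[gt_idx] if gt_idx < len(values) else "."
            alleles = gt.replace("|", "/").split("/")
            col.append(None if "." in alleles else sum(a != "0" for a in alleles))
        columns.append(col)
    # Transpose: the VCF is SNP-major, the requested matrix is cell-major.
    matrix = [list(row) for row in zip(*columns)] if columns else [[] for _ in cells]
    return cells, snps, matrix
```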

I did consider merging my VCF files using each cell as a separate sample. Do you know if this would allow me to perform subsequent dimensionality reduction and clustering? Or would the data have to be loaded into some sort of dataframe first? (I apologise, I am very new to bioinformatics in general...)

(reply by dominicdhall, 11 months ago)
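A VCF is a text format, so for dimensionality reduction the genotypes do need to be loaded into a numeric matrix (or dataframe) first. Methods such as PCA also cannot take missing values directly, so a common (though not the only) preprocessing is to drop SNPs called in too few cells, fill the remaining gaps, and center each column. The sketch below uses simple mean imputation; the function name and the `min_call_rate=0.1` default are hypothetical, arbitrary choices.

```python
def prepare_for_pca(matrix, min_call_rate=0.1):
    """Filter and impute a cell-by-SNP dosage matrix (None = missing).

    - drop SNP columns called in fewer than min_call_rate of the cells
    - replace each remaining None with that column's mean dosage
    - center every kept column to mean zero
    Returns (kept_column_indices, dense_centered_matrix).
    """
    n_cells = len(matrix)
    n_snps = len(matrix[0]) if matrix else 0
    kept, centered_cols = [], []
    for j in range(n_snps):
        observed = [row[j] for row in matrix if row[j] is not None]
        if not observed or len(observed) < min_call_rate * n_cells:
            continue  # too few calls to say anything about this SNP
        mean = sum(observed) / len(observed)
        # Imputing with the column mean leaves the mean unchanged, so after
        # centering every imputed entry becomes exactly 0.
        centered_cols.append([(mean if row[j] is None else row[j]) - mean for row in matrix])
        kept.append(j)
    dense = [list(r) for r in zip(*centered_cols)] if centered_cols else [[] for _ in range(n_cells)]
    return kept, dense
```

The dense, centered output could then be handed to, for example, `numpy.linalg.svd` or scikit-learn's `PCA`, and the resulting components to a clustering method. Mean imputation pulls poorly covered cells towards the centre of the plot, so it is worth checking whether the leading components simply track per-cell coverage.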
Powered by Biostar version 2.3.0