Are there any tools for variant calling from single-cell sequencing data? I'm interested in SNPs, somatic mutations, CNVs, and SVs. I assumed the data would look similar to bulk data once it has been processed into BAM files, but it might be necessary to account for systematic bias introduced by the single-cell sequencing process (e.g. MDA).
I'm currently working on a method to do so, based on freebayes. Please bear with me.
The good news is that you can probably get a large part of the way there already with freebayes and a single option. Adding --pooled-discrete will remove any assumptions about genotype frequencies that might bias your results away from the patterns expected in a clonally-evolving population.
If you are concerned about non-independence of reads due to the limited (max 2x!) input to the amplification, you can also adjust --read-dependence-factor. Set it lower (e.g. 0.8? 0.7?) to model less independence between reads. I don't know how effective it will prove to be in this context, and I'd like to implement a better solution to this problem by directly estimating the parameter from the sequence data. Still, this should provide a basic way to correct for the assumption of independence. It's already applied by default, with a value of 0.9. If you're interested in this issue, please see this paper on the topic: "The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process".
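To make the two options above concrete, here is a minimal sketch that assembles a freebayes command line with --pooled-discrete and a lowered --read-dependence-factor. The file names (ref.fa, cell1.bam, cell2.bam) and the helper name freebayes_command are placeholders of mine, and 0.8 is just one of the values floated above, not a tested recommendation.

```python
import shlex


def freebayes_command(reference, bams, read_dependence_factor=0.9,
                      pooled_discrete=True):
    """Build (but do not run) a freebayes argument list.

    reference and bams are hypothetical paths supplied by the caller.
    0.9 mirrors the freebayes default mentioned above.
    """
    cmd = ["freebayes", "--fasta-reference", reference]
    if pooled_discrete:
        # Drop the genotype-frequency assumptions, as suggested above.
        cmd.append("--pooled-discrete")
    # Lower values model less independence between reads.
    cmd += ["--read-dependence-factor", str(read_dependence_factor)]
    cmd += list(bams)
    return cmd


cmd = freebayes_command("ref.fa", ["cell1.bam", "cell2.bam"],
                        read_dependence_factor=0.8)
print(shlex.join(cmd))
```

The list can be passed to subprocess.run with stdout redirected to a VCF file. Trying a few values for the factor and comparing the resulting calls is probably the only way to see whether it helps on your data.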
If you include your germline sample in the analysis, you can use a tool in vcflib (vcfsamplediff) to tag putative somatic variants and add a somatic score (SSC), provided you have genotype qualities (--genotype-qualities in freebayes).
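As a rough sketch of that two-step workflow, the snippet below builds the freebayes call with --genotype-qualities and then a vcfsamplediff call on its output. I am assuming the vcfsamplediff argument order is tag, germline sample, somatic sample, VCF file, so check its usage message before relying on this. The tag name VT, the sample names, the file paths, and the helper name somatic_tagging_commands are all placeholders; the sample names must match the sample columns in your VCF.

```python
import shlex


def somatic_tagging_commands(reference, bams, germline_sample,
                             somatic_sample, calls_vcf="calls.vcf",
                             tag="VT"):
    """Return the two argument lists; nothing is executed here.

    All paths, sample names and the tag are hypothetical placeholders.
    """
    # Step 1: joint calling with genotype qualities, which the
    # somatic score depends on. Redirect stdout to calls_vcf.
    call = ["freebayes", "--fasta-reference", reference,
            "--pooled-discrete", "--genotype-qualities", *bams]
    # Step 2: tag records where the two samples differ.
    # Assumed order: <tag> <germline> <somatic> <vcf file>.
    diff = ["vcfsamplediff", tag, germline_sample, somatic_sample,
            calls_vcf]
    return call, diff


call, diff = somatic_tagging_commands(
    "ref.fa", ["germline.bam", "cell1.bam"], "germline", "cell1")
print(shlex.join(call) + " > calls.vcf")
print(shlex.join(diff) + " > tagged.vcf")
```

Writing the intermediate VCF to a file rather than piping keeps the sketch independent of how vcfsamplediff handles standard input.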
Feel free to contact me by email to further discuss. I'm curious what you come up with.