I have to do SNP calling on 40 or so samples, a few of which originate from public sources. For some of these, the raw reads are not available.
Therefore I thought of building a dummy paired-end FASTQ dataset by chopping that sample's reference sequence into read-sized pieces with a sliding window, using overlapping windows to get some coverage.
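A minimal sketch of what I have in mind, in plain Python. The function names (`read_fasta`, `fake_pairs`, `write_fastq`), the read length, fragment size, step and the constant quality character are all placeholders I made up for illustration, not values from any pipeline. With a step of 10 and 100 bp reads, interior positions should end up at roughly 2 * 100 / 10 = 20x, and contig ends get less.

```python
# IUPAC-aware complement table, so ambiguity codes in the assembly survive.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def fake_pairs(name, seq, read_len=100, insert=300, step=10):
    """Yield (read_id, read1, read2) for every window of length `insert`.

    read1 is the first `read_len` bases of the window (forward strand),
    read2 is the reverse complement of the last `read_len` bases, which
    mimics the usual forward/reverse orientation of paired-end reads.
    Sequences shorter than `insert` yield nothing.
    """
    for start in range(0, len(seq) - insert + 1, step):
        fragment = seq[start:start + insert]
        yield (f"{name}_{start}",
               fragment[:read_len],
               revcomp(fragment[-read_len:]))


def write_fastq(fasta_path, out1, out2, qual_char="I", **kwargs):
    """Write the dummy pairs to two FASTQ files with a constant quality."""
    with open(out1, "w") as f1, open(out2, "w") as f2:
        for name, seq in read_fasta(fasta_path):
            for rid, r1, r2 in fake_pairs(name, seq, **kwargs):
                f1.write(f"@{rid}/1\n{r1}\n+\n{qual_char * len(r1)}\n")
                f2.write(f"@{rid}/2\n{r2}\n+\n{qual_char * len(r2)}\n")
```

Usage would be something like `write_fastq("public_sample.fa", "dummy_R1.fq", "dummy_R2.fq")`, where the file names are hypothetical. The reads are error-free with uniform qualities, so any quality-based metrics for this sample would be artificial.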
Any thoughts on this?
I would remove all monomorphic calls for this sample and apply the usual filters, such as removing SNPs in repeat regions and SNPs near indels.
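For the monomorphic part, a rough sketch of the kind of filter I mean, assuming the dummy sample sits in its own single-sample VCF (in a multi-sample VCF, dropping the whole record would also discard the other samples' calls, so there one would rather set the genotype to missing). The names `has_alt_allele` and `drop_monomorphic` are made up for this example, and it only looks at the GT field.

```python
def has_alt_allele(gt):
    """True if a GT string such as '0/1' or '1|1' carries a non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def drop_monomorphic(lines, sample):
    """Yield VCF header lines plus records where `sample` has a non-ref allele.

    `lines` is any iterable of VCF text lines; `sample` is the column name
    of the sample as it appears in the #CHROM header line.
    """
    col = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line
        elif line.startswith("#CHROM"):
            col = line.split("\t").index(sample)
            yield line
        elif line:
            fields = line.split("\t")
            fmt = fields[8].split(":")
            gt = fields[col].split(":")[fmt.index("GT")]
            if has_alt_allele(gt):
                yield line
```

The repeat-region and near-indel filters are not covered here; those would still need to come from the usual tools.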
An alternative would be to align this sample's reference against the reference used for mapping with MUMmer. But then I would have to integrate the resulting calls into the VCF, and the SNP calling metrics would be absent for this sample.
I don't like either option very much, but I don't really see an alternative.
Thanks for any suggestions.