I have Illumina short reads from which I want to call SNPs without using a reference genome. I want to build a consensus sequence from some selected reads and then align the other reads to it, so the basic idea is that the consensus sequence acts as a "dummy genome". Is there any way to do this?
As I understand it, a SNP is a position at which the base in one genome (e.g. genome A) differs from the base in another genome (e.g. genome B).
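To make that definition concrete, here is a minimal Python sketch. It assumes the two sequences are already aligned to the same length, which real data will not give you for free; the function name find_snps and the toy sequences are made up for illustration.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatch between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (a, b) in enumerate(zip(seq_a, seq_b)):
        # Ignore gaps and ambiguous bases; only A/C/G/T differences count here.
        if a in "ACGT" and b in "ACGT" and a != b:
            snps.append((pos, a, b))
    return snps


# Toy sequences (hypothetical), positions are 0-based.
genome_a = "ACGTACGTAC"
genome_b = "ACGTTCGTAA"
print(find_snps(genome_a, genome_b))  # [(4, 'A', 'T'), (9, 'C', 'A')]
```

With real reads you never see the two genomes directly like this, which is why the assembly and alignment steps are needed.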
From your question, it seems that the reads derived from genome A and genome B are mixed in the data, and
genome A = sequences produced by some selected reads;
genome B = sequences produced by other reads.
I wonder how you would separate the genome A reads from the original data, but if you can do that, I agree with toralmanvar.
You should assemble the reads derived from genome A; the resulting contigs are what you called the “dummy genome”.
Then align the reads derived from genome B to the genome A contigs, and call SNPs from that alignment.
I recommend that you also align the reads derived from genome A to the genome A contigs at the same time, so that you can compare the allele frequencies of the two genomes at each position.
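The comparison step could look roughly like the following Python sketch. It is only a toy under strong assumptions: alignments are given as ungapped (start, read) pairs on a single contig, there are no base qualities, and the names pileup, allele_freqs, compare_samples and the min_depth threshold are my own inventions for illustration. In practice you would get per-position counts from a real aligner and variant caller.

```python
from collections import Counter


def pileup(contig_length, alignments):
    """Count bases per contig position from ungapped (start, read) alignments."""
    counts = [Counter() for _ in range(contig_length)]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < contig_length and base in "ACGT":
                counts[pos][base] += 1
    return counts


def allele_freqs(counter):
    """Convert base counts at one position into allele frequencies."""
    total = sum(counter.values())
    return {base: n / total for base, n in counter.items()} if total else {}


def compare_samples(contig, reads_a, reads_b, min_depth=2):
    """Return positions where the most frequent base differs between the two read sets."""
    pile_a = pileup(len(contig), reads_a)
    pile_b = pileup(len(contig), reads_b)
    hits = []
    for pos, (ca, cb) in enumerate(zip(pile_a, pile_b)):
        # Skip positions that are poorly covered in either read set.
        if sum(ca.values()) < min_depth or sum(cb.values()) < min_depth:
            continue
        major_a = ca.most_common(1)[0][0]
        major_b = cb.most_common(1)[0][0]
        if major_a != major_b:
            hits.append((pos, allele_freqs(ca), allele_freqs(cb)))
    return hits


# Toy data (hypothetical): one contig assembled from genome A, reads from both genomes.
contig = "ACGTACGT"
reads_a = [(0, "ACGTA"), (2, "GTACGT"), (0, "ACGTACGT")]
reads_b = [(0, "ACGTT"), (2, "GTTCGT"), (0, "ACGTTCGT")]
print(compare_samples(contig, reads_a, reads_b))  # [(4, {'A': 1.0}, {'T': 1.0})]
```

Aligning the genome A reads as well gives you a baseline: a position where genome A's own reads disagree with the contig is more likely an assembly or sequencing error than a real difference between the genomes.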