I have a few genes of interest from a de novo RNA-seq assembly, and I would like to call variants in them and phase those variants. I don't have a reference genome to use.
For one of my genes (~10,000 bp long), I ran a pairwise analysis (BLAST), and I get ~100 SNP IDs when I keep only hits with rank values higher than 0.2 and E-values of 0 (I used 3 individuals). My goal is to get two alleles per individual.
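A minimal sketch of the filtering step described above. It assumes the DiscoSNP++ bubble sequences were BLASTed against the transcript with tabular output (`-outfmt 6`, 12 standard columns), and that each query ID contains a `SNP_higher_path_<n>` or `SNP_lower_path_<n>` name and a `rank_<value>` field. Both ID patterns are assumptions, so check them against your own FASTA headers. The function name `filter_hits` is hypothetical. DiscoSNP++ reports each bubble as two sequences, one per allele, so the sketch collapses the higher and lower path of a bubble into a single ID. If both paths were counted separately, ~100 IDs could correspond to roughly half as many SNPs.

```python
import re

# Assumed ID patterns; adjust to match your DiscoSNP++ headers.
RANK_RE = re.compile(r"rank_([0-9.]+)")
BUBBLE_RE = re.compile(r"SNP_(?:higher|lower)_path_(\d+)")


def filter_hits(lines, min_rank=0.2, max_evalue=0.0):
    """Return the set of bubble numbers with at least one passing BLAST hit.

    lines: BLAST -outfmt 6 rows (qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore), tab-separated.
    """
    bubbles = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip malformed or non-tabular lines
        qseqid, evalue = cols[0], float(cols[10])
        rank_match = RANK_RE.search(qseqid)
        bubble_match = BUBBLE_RE.search(qseqid)
        if rank_match is None or bubble_match is None:
            continue  # ID does not follow the assumed naming scheme
        if float(rank_match.group(1)) > min_rank and evalue <= max_evalue:
            # Higher and lower paths share a bubble number, so they collapse here.
            bubbles.add(bubble_match.group(1))
    return bubbles


if __name__ == "__main__":
    example = [
        "SNP_higher_path_12|P_1:30_A/G|rank_0.9\tgeneX\t99.0\t61\t1\t0\t1\t61\t100\t160\t0.0\t110",
        "SNP_lower_path_12|P_1:30_A/G|rank_0.9\tgeneX\t99.0\t61\t1\t0\t1\t61\t100\t160\t0.0\t110",
        "SNP_higher_path_40|P_1:30_C/T|rank_0.1\tgeneX\t98.0\t61\t1\t0\t1\t61\t300\t360\t0.0\t110",
    ]
    print(filter_hits(example))  # {'12'}
```

Here three hit rows reduce to one bubble: the two paths of bubble 12 collapse together, and bubble 40 fails the rank filter. BLAST prints `0.0` when the E-value underflows, which is why the default cut-off compares with `<=` rather than testing for an exact value.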
I have a few questions:
- Does this mean that I have ~100 SNPs? If so, what approach is suggested for identifying SNPs against a transcriptome reference? (If a pairwise analysis is the way to go, what E-value cut-off is suggested?)
- Without a genome, can DiscoSNP++ tell me which SNPs go together in the same allele?
- If I want to allow more than 2 SNPs to be reported per site, I just have to set -P to a value higher than 2, right?
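To illustrate what "SNPs that go together" means in practice: without a genome, two SNPs can only be linked when reads (or read pairs) from the same individual cover both positions. The following is a minimal sketch of this read-backed phasing idea. It assumes you have already extracted, per individual, which allele each read carries at two SNP sites. That extraction step and the function name `phase_two_sites` are hypothetical and not part of DiscoSNP++. For a diploid individual, the two most frequent allele combinations are the candidate haplotypes.

```python
from collections import Counter


def phase_two_sites(read_alleles):
    """Tally allele combinations on reads spanning two SNP sites.

    read_alleles: list of (allele_at_site1, allele_at_site2) tuples,
    one per read that covers both sites in a single individual.
    Returns the two best-supported combinations (candidate haplotypes
    for a diploid) and the full tally, so low-count combinations that
    may reflect sequencing errors remain visible.
    """
    counts = Counter(read_alleles)
    top_two = [combo for combo, _ in counts.most_common(2)]
    return top_two, counts


if __name__ == "__main__":
    # Toy data: 6 reads support A..C, 5 support G..T, 1 read looks like an error.
    reads = [("A", "C")] * 6 + [("G", "T")] * 5 + [("A", "T")]
    haplotypes, tally = phase_two_sites(reads)
    print(haplotypes)  # [('A', 'C'), ('G', 'T')]
```

This only handles two sites with enough spanning reads. Dedicated phasing tools also chain overlapping site pairs across a whole transcript and model sequencing errors.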
Any tips would be welcome.