Dear Biostars Community,
I am trying to identify SNPs/indels in my WGS samples and have run into an issue (I am a newbie in bioinformatics). I sequenced haploid yeast genomes, trimmed the reads, aligned them to the reference genome, etc., and then used the samtools mpileup command to call SNPs/indels:
samtools mpileup -uf REFERENCE/S288C_genome.fsa OUTPUT/", file, "_sorted.bam | bcftools view -bvcg - > OUTPUT/", file, "_var.raw.bcf"
and then wrote the resulting list of SNPs/indels to a file.
The problem is that this script gives me a list of 1500 SNPs/indels, which includes variation in the reads caused by sequencing/amplification errors. How can I narrow down the list to ONLY the SNPs/indels that reflect a real change in the haploid genome?
Happy to hear your suggestions!
Not an answer to your question, but using GATK HaplotypeCaller you can set the expected ploidy for variant calling, which might be more accurate.
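For illustration, here is a minimal sketch of what a haploid HaplotypeCaller call could look like. It assumes GATK4 is installed as `gatk`; the sample name `sample1`, the output file name, and the `run` and `call_haploid` helpers are hypothetical and not from the original post. GATK also expects the reference to have a `.fai` index and a `.dict` sequence dictionary, and the BAM to carry read groups, so those may need to be prepared first. By default the script only prints the command; set `RUN=1` to actually execute it.

```shell
#!/bin/sh
# Sketch only: haploid variant calling with GATK4 HaplotypeCaller.
# Assumption: `gatk` (GATK4) is on PATH; "sample1" is a placeholder name.
set -eu

# Hypothetical helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Hypothetical helper: $1 = reference FASTA, $2 = sample name.
call_haploid() {
    run gatk HaplotypeCaller \
        -R "$1" \
        -I "OUTPUT/${2}_sorted.bam" \
        -O "OUTPUT/${2}_hc.vcf.gz" \
        --sample-ploidy 1   # tell the caller the sample is haploid
}

call_haploid "REFERENCE/S288C_genome.fsa" "sample1"
```

Whether this removes the error-driven calls would still need to be checked on your data; the ploidy setting changes the genotype model, not the read quality.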