Hi,
I have 150 bp paired-end Illumina reads for 120 haploid isolates derived from a biparental cross. The genome sizes of the two parents are ~50 Mb, and I have, on average, >100x coverage for the 120 progeny isolates. I want to use these data to identify SNPs in the population, which will then be used to create genetic maps and perform QTL analysis.
I want to know if there are any best practices for calling SNPs from a biparental population, or if there are any bioinformatic pipelines that would be ideal for this. I have already tried using bcftools mpileup and bcftools call:
bcftools mpileup -Ou -f genomic.fasta *.bam | bcftools call -mv -Ob --ploidy 1 --threads 4 -o calls1.bcf
and while I was able to generate and map markers, I'm not sure how much confidence I should have in the data. Even after filtering on metrics such as depth, quality, DP4, MQ, SCBZ, etc., I still have many calls that imply double-recombination events in a single progeny, or double-recombination events in multiple progeny that only ever favor the reference allele. I can reduce these by applying more and more stringent filters, but I fear that I am eliminating "good" markers in the process. Also, many of the metrics in my VCF files that I could filter on appear to assume a natural population, so they are probably geared more towards GWAS.
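To make the double-recombination pattern concrete, here is a minimal Python sketch of how such calls could be flagged. It is only an illustration, not my actual pipeline: the function names and the "A"/"B" parental coding are assumptions, and it presumes the markers are already ordered along a chromosome with missing calls stored as None. A marker is flagged when its allele differs from both neighbouring called markers while those neighbours agree with each other (A-B-A), which in a haploid progeny implies two crossovers around a single marker.

```python
from collections import Counter
from typing import Dict, List, Optional


def find_singleton_flips(genotypes: List[Optional[str]]) -> List[int]:
    """Return indices of markers that disagree with both flanking called
    markers while the flanks agree with each other (e.g. A-B-A)."""
    # Drop missing calls but remember each marker's original index.
    called = [(i, g) for i, g in enumerate(genotypes) if g is not None]
    flips = []
    for (_, left), (idx, mid), (_, right) in zip(called, called[1:], called[2:]):
        if left == right and mid != left:
            flips.append(idx)
    return flips


def flips_per_marker(progeny: Dict[str, List[Optional[str]]]) -> Counter:
    """Count, for each marker index, how many progeny show a singleton flip."""
    counts: Counter = Counter()
    for genotypes in progeny.values():
        counts.update(find_singleton_flips(genotypes))
    return counts


# Hypothetical example: three progeny scored at nine ordered markers.
example = {
    "isolate_1": ["A", "A", "B", "A", None, "A", "B", "B", "B"],
    "isolate_2": ["B", "B", "A", "B", "B", "B", "B", "A", "A"],
    "isolate_3": ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
}
print(flips_per_marker(example))
```

A marker that is flagged in many progeny (index 2 in this toy example) is more likely a genotyping or alignment artifact than a real pair of crossovers, so a per-marker count like this may be a more targeted filter than tightening the VCF thresholds across the board.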
Any recommendations or advice?