Hi! After about 2 full days of research and reading so many papers, I am still super stuck on this question:
What site filters do I need to apply to my VCF file to prepare it for imputation?
Some details:
- Data consists of 84 individuals of an inbred bird species. After variant calling and before any filtering, 72 individuals have an average depth of around 12x, but 12 individuals have an average depth of <5x. I am hoping to impute just these <5x individuals, using the other 72 as a reference panel?
- I will only be imputing a few contigs that I need for haplotyping.
- Imputation software will probably be QUILT or STITCH (with or without a reference panel) - I am undecided and was going to try all three combinations
For haplotyping of the higher-coverage (>5x) individuals I applied a few filters: minimum GQ 10, MAF cutoff 0.05 (sites with MAF < 0.05 removed), minimum genotype depth 5x, maximum depth 200, maximum site missingness 0.1, and a strand-bias adjusted Phred score cutoff of 60. I have no idea whether I am meant to apply the same filters to all of my individuals before imputing or not! Many papers don't mention any pre-imputation filtering at all, some mention it only vaguely, and I am confused because without filtering I thought a lot of my data would be too poor to make any haplotype (or imputation) judgements from.
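For concreteness, here is roughly how I applied those filters. This is just a sketch assuming vcftools and bcftools (the filenames are placeholders, and the strand-bias tag here is the SP annotation from bcftools mpileup - yours may differ depending on the caller):

```shell
# Remove sites with excessive strand bias first (SP is bcftools'
# Phred-scaled strand-bias score; adjust the tag for your caller).
bcftools view -e 'INFO/SP>60' -Oz -o highcov.sb.vcf.gz highcov.vcf.gz

# Then apply the genotype- and site-level filters:
#   --minGQ 10        : set genotypes with GQ < 10 to missing
#   --maf 0.05        : keep sites with minor allele frequency >= 0.05
#   --minDP 5         : set genotypes with depth < 5x to missing
#   --maxDP 200       : set genotypes with depth > 200x to missing
#   --max-missing 0.9 : keep sites genotyped in >= 90% of individuals
#                       (i.e. max missingness 0.1)
vcftools --gzvcf highcov.sb.vcf.gz \
  --minGQ 10 --maf 0.05 --minDP 5 --maxDP 200 --max-missing 0.9 \
  --recode --recode-INFO-all --out highcov.filtered
```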
Should I be filtering so that poor sites are removed and only good sites remain for imputation? Or do I need to retain as much information as possible and instead filter after imputation? I am so lost, so any guidance is greatly appreciated! Thank you!!