Hello guys,
I ran freebayes and filtered the output by removing indels and discarding multi-allelic sites. Before filtering, there were long SNP-like records with multiple alleles, for example ATTGC, ATGGC, TTGGC.
However, even after discarding the multi-allelic sites, the VCF file still looks the same, as you can see in the picture below. How can I handle this? I plan to convert my VCF to PLINK format and run a PCA. Would those sites be a problem?
I also have another question about discarding multi-allelic sites. How is it decided which sites are discarded? Is there a recommended procedure to follow when discarding them? Or can I keep the multi-allelic sites for the PCA?
Thanks!
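For the question of how a site is judged multi-allelic, here is a minimal Python sketch, assuming the usual VCF convention that several ALT alleles are written comma-separated in the ALT column. The function names `classify_record` and `keep_biallelic_snps` are made up for illustration and are not part of freebayes or any other tool, and symbolic alleles such as `<DEL>` or `*` are not handled. It may also explain the long records that survive the filter: a record with one REF and one ALT of the same length (more than one base) is bi-allelic, so a multi-allelic filter would leave it in place.

```python
def classify_record(ref, alt):
    """Classify a VCF record from its REF and ALT columns.

    Returns 'multiallelic', 'snp', 'mnp' or 'indel_or_complex'.
    Assumption: plain sequence alleles only (no symbolic alleles).
    """
    alts = alt.split(",")
    if len(alts) > 1:
        return "multiallelic"          # more than one ALT allele listed
    single_alt = alts[0]
    if len(ref) == 1 and len(single_alt) == 1:
        return "snp"                   # single-base substitution
    if len(ref) == len(single_alt):
        return "mnp"                   # same length, several bases
    return "indel_or_complex"          # REF and ALT lengths differ


def keep_biallelic_snps(vcf_lines):
    """Yield header lines unchanged and only bi-allelic SNP data lines."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 4 is REF and column 5 is ALT (0-based indices 3 and 4).
        if classify_record(fields[3], fields[4]) == "snp":
            yield line
```

With the alleles from the post used as a hypothetical input, `classify_record("ATTGC", "ATGGC,TTGGC")` is reported as multi-allelic, while `classify_record("ATTGC", "ATGGC")` is reported as an MNP, which is the kind of record that would still appear after the multi-allelic sites are gone.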
What was the command line?
It depends on the scientific question being asked.
Thank you for your answer, Pierre.
It is not so much about the question, but I have metagenomic data, so multi-allelic sites could make sense in my case. However, I get an error about multi-allelic sites when I merge two VCF files to create the PLINK format. Therefore, I always discard these sites from the VCF as well.
In addition, can I run a PCA directly on data that contains multi-allelic sites?
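One alternative to discarding multi-allelic sites is to split each of them into several bi-allelic records, one per ALT allele. Below is a rough Python sketch of that idea, not a tested pipeline. The function `split_multiallelic` is hypothetical, and it rests on several assumptions that do not come from this thread: GT is the first FORMAT key, only GT is kept for each sample, INFO is copied unchanged (so per-allele INFO values would no longer match), and a genotype allele that points at a different ALT is recoded as missing. Whether the merge or the PLINK conversion accepts the resulting records is not something this sketch can confirm.

```python
def split_multiallelic(line):
    """Split one tab-separated VCF data line into bi-allelic lines.

    Returns a list with one line per ALT allele (no trailing newline).
    Assumptions: GT is the first FORMAT key, only GT is kept, INFO is
    copied as-is, and alleles of other ALTs become missing (".").
    """
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return [line.rstrip("\n")]      # already bi-allelic, keep as-is
    out = []
    for i, alt in enumerate(alts, start=1):
        new = fields[:9]                # fixed columns plus FORMAT
        new[4] = alt                    # keep only this ALT allele
        new[8] = "GT"                   # only GT is carried over
        for sample in fields[9:]:
            gt = sample.split(":")[0]
            sep = "|" if "|" in gt else "/"
            recoded = []
            for allele in gt.split(sep):
                if allele in ("0", "."):
                    recoded.append(allele)   # REF or missing stays
                elif allele == str(i):
                    recoded.append("1")      # this ALT becomes allele 1
                else:
                    recoded.append(".")      # another ALT: set missing
            new.append(sep.join(recoded))
        out.append("\t".join(new))
    return out
```

Recoding the other ALT alleles as missing is a deliberate, conservative choice in this sketch; recoding them as REF would be another option, and the two choices can give different allele frequencies and therefore a different PCA.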