4.0 years ago
nick.gold
Hi everyone,
I am attempting to recreate the quality control analysis performed in the 1000 Genomes Project (http://tcag.ca/documents/tools/omni25_qcReport.pdf).
I am fairly new to performing QC on a dataset and am currently stuck on section 5.1 of the analysis. The 1000 Genomes data were assembled from two different sets of markers, and the analysis used only the 2,150,028 SNPs present in both datasets.
Does anyone know how I can remove the non-overlapping SNPs? Could this be done in PLINK, and should it be done on the .vcf file?
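One possible approach, sketched here under some assumptions: if both marker sets are available as PLINK binary filesets, you can list the SNPs that share a chromosome and base-pair position in both .bim files and pass that list to PLINK's --extract. The function and file names below are hypothetical. Matching on chromosome and position alone does not check alleles or strand, and it assumes both files use the same chromosome coding (e.g. "1" rather than "chr1" in one of them).

```python
from typing import Iterable, List, Set, Tuple


def bim_positions(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Collect (chromosome, bp position) keys from PLINK .bim lines.

    .bim columns: chrom, SNP ID, cM, bp position, allele 1, allele 2.
    """
    keys: Set[Tuple[str, int]] = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        keys.add((fields[0], int(fields[3])))
    return keys


def overlapping_ids(bim_a: Iterable[str], bim_b: Iterable[str]) -> List[str]:
    """Return SNP IDs from set A whose (chrom, position) also occurs in set B.

    Hypothetical helper: matches on position only, not on alleles or strand.
    """
    shared = bim_positions(bim_b)
    ids: List[str] = []
    for line in bim_a:
        fields = line.split()
        if len(fields) < 6:
            continue
        if (fields[0], int(fields[3])) in shared:
            ids.append(fields[1])  # keep set A's own ID for --extract
    return ids


def write_extract_list(ids: Iterable[str], path: str) -> None:
    """Write one SNP ID per line, the plain-text format plink --extract reads."""
    with open(path, "w") as out:
        for snp_id in ids:
            out.write(snp_id + "\n")
```

Usage would be roughly: open set_a.bim and set_b.bim, pass both to overlapping_ids, write the result with write_extract_list to overlap_snps.txt, then run plink --bfile set_a --extract overlap_snps.txt --make-bed --out set_a_overlap. Because the IDs come from set A, you would build a second list the other way round (set B as the first argument) to subset set B.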
Have a look at bcftools isec.
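If you would rather work on the VCFs directly, here is a minimal sketch of the bcftools isec route, written in Python and only assembling the command. In it, -n=2 keeps sites present in both input files, -w1 writes the matching records from the first file only, and -p sets an output directory where the subset VCF is written (a file such as 0000.vcf). The input and directory names are placeholders. Both VCFs need to be bgzipped and indexed (for example with bcftools index). By default isec also compares alleles, not just positions, so check the -c option in the bcftools manual if the two marker sets code alleles differently.

```python
import subprocess
from typing import List


def isec_shared_sites_cmd(vcf_a: str, vcf_b: str, out_dir: str) -> List[str]:
    """Build a bcftools isec call that keeps records of vcf_a also found in vcf_b."""
    return [
        "bcftools", "isec",
        "-n=2",         # sites present in exactly 2 of the input files
        "-w1",          # write records from the first input file only
        "-p", out_dir,  # output directory for the subset VCF
        vcf_a, vcf_b,
    ]


def run_isec(vcf_a: str, vcf_b: str, out_dir: str) -> None:
    """Run the command; needs bcftools on PATH and indexed .vcf.gz inputs."""
    subprocess.run(isec_shared_sites_cmd(vcf_a, vcf_b, out_dir), check=True)
```

Swapping the two inputs gives the matching subset of the second file, and the result can then be loaded into PLINK if you want to continue the QC there.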