I've used GATK to call variants from RNA-sequencing data (following the guidelines here), which gives me one VCF file per sample. I then converted each VCF to PLINK binary format:
./plink2 --vcf file.vcf --make-bed --out plinkfile --allow-extra-chr --chr 1-22 XY --snps-only --max-alleles 2
I'd now like to merge the per-sample PLINK files so I can carry out an association analysis. The problem is that each .bim file contains a different set of SNPs, and the overlap in SNPs between files is only 40-50%. As I have many samples to merge, I don't want to exclude non-overlapping SNPs, because the number of SNPs present in every single sample will be tiny. Is there any way to merge the files while keeping all SNPs, coding the genotype as missing for any sample in which a given SNP was not called?
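To make the problem concrete, here is a toy sketch with three made-up .bim files (the file names and variant IDs are hypothetical, not my real data). It counts the SNPs present in at least one file (the set I'd like to keep) against the SNPs present in every file (all I'd have left if I kept only the overlap). It assumes each variant ID appears at most once per file.

```shell
# Toy .bim files with hypothetical variant IDs.
# Columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2
tmpdir=$(mktemp -d)
printf '1\trs1\t0\t100\tA\tG\n1\trs2\t0\t200\tC\tT\n1\trs3\t0\t300\tG\tA\n' > "$tmpdir/s1.bim"
printf '1\trs2\t0\t200\tC\tT\n1\trs3\t0\t300\tG\tA\n1\trs4\t0\t400\tT\tC\n' > "$tmpdir/s2.bim"
printf '1\trs1\t0\t100\tA\tG\n1\trs3\t0\t300\tG\tA\n1\trs5\t0\t500\tA\tC\n' > "$tmpdir/s3.bim"

n_files=3

# Union: variant IDs seen in at least one file
union_count=$(cut -f2 "$tmpdir"/s*.bim | sort -u | awk 'END {print NR}')

# Intersection: variant IDs seen in all files
# (relies on each ID occurring at most once per file)
shared_count=$(cut -f2 "$tmpdir"/s*.bim | sort | uniq -c \
  | awk -v n="$n_files" '$1 == n' | awk 'END {print NR}')

echo "union: $union_count, shared by all: $shared_count"
# prints: union: 5, shared by all: 1

rm -rf "$tmpdir"
```

Even with three toy files, only one of five SNPs survives an intersection, and with many real samples the shared set shrinks much further, which is why I want the union with missing genotypes instead.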