Hello,
I'm fairly new to PLINK and am trying to run some analyses on data from The Cancer Genome Atlas (TCGA). I have one VCF file with sample IDs, genotypes, filter information, etc., and a separate file from a different source with the phenotypes I am investigating.
However, the phenotype file (.txt) has shortened IDs of the form TCGA-xxxx-xx, while the VCF file has IDs of the form TCGA-xxxx-xxx-xxxx. The first three fields are the same, but I'm not sure how to get PLINK to match on non-exact IDs, or how to edit the VCF so that it has the shortened IDs and is still valid (the file has missing fields, so it fails in VCFtools).
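To make the mapping I'm after concrete, here is a minimal Python sketch of the truncation. It is only an illustration under assumptions: it assumes the sample IDs are the tab-separated columns after the nine fixed columns of the `#CHROM` header line, that keeping the first three hyphen-separated fields is the right rule, and the function names and the example ID are made up by me. I don't know whether rewriting the header like this is the recommended approach.

```python
def shorten_id(sample_id, n_fields=3):
    """Keep only the first n_fields hyphen-separated fields of an ID."""
    return "-".join(sample_id.split("-")[:n_fields])


def shorten_header_line(line, n_fields=3):
    """Shorten the sample IDs on a VCF '#CHROM' header line.

    Any other line (meta lines, variant records) is returned unchanged.
    Assumes the first 9 tab-separated columns are the fixed VCF columns
    (CHROM .. FORMAT) and everything after them is a sample ID.
    """
    if not line.startswith("#CHROM"):
        return line
    cols = line.rstrip("\n").split("\t")
    fixed, samples = cols[:9], cols[9:]
    short = [shorten_id(s, n_fields) for s in samples]
    # Caveat: two long IDs sharing the same first three fields would
    # collapse to the same short ID, so duplicates need checking.
    return "\t".join(fixed + short) + "\n"


if __name__ == "__main__":
    # Placeholder ID, not a real sample.
    print(shorten_id("TCGA-AB-1234-01A-11D"))
```

Applied line by line to the VCF, this would change only the header line and leave the genotype records untouched.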
Any help with this would be greatly appreciated!