Hello, does anyone know of a fast, computationally efficient way to select the lines of a .dosage file whose first-column SNP ID also appears in a .txt file of SNP IDs?
The .dosage file is in the following format:
SNPID Position REF ALT Sample1Dosage Sample2Dosage Sample3Dosage . . .
1:100:A:C A C 0 2 1 . . .
1:101:C:T C T 1 2 1 . . .
. . .
The list of SNP IDs in a .txt document is in the following format:
1:100:A:C
1:101:C:T
1:103:G:A
1:105:C:T
. . .
I have tried using grep -f snp_IDs.txt example.dosage > filtered_example.dosage, but unfortunately the command is too slow: the job hits my server's maximum wall time before it finishes.
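For anyone landing here, one common pattern for this kind of lookup is to load the ID list into an awk associative array and test each row's first field against it. That needs only a single pass over each file, rather than treating every ID as a pattern to try on every line. A minimal sketch, assuming whitespace-separated columns, one SNP ID per line in a non-empty snp_IDs.txt, and that the first line of the .dosage file is a header worth keeping; filter_dosage is a hypothetical helper name, not an existing tool:

```shell
# Sketch only: prints the header row plus every row whose first column
# appears in the ID list. Arguments: <ID list file> <dosage file>.
# Assumes the ID list is non-empty (the NR == FNR idiom relies on that).
filter_dosage() {
    awk '
        NR == FNR { keep[$1]; next }   # first file: remember each SNP ID
        FNR == 1 || ($1 in keep)       # second file: header row or an ID match
    ' "$1" "$2"
}

# Possible usage with the file names from the question:
# filter_dosage snp_IDs.txt example.dosage > filtered_example.dosage
```

Unlike the grep -f attempt, this compares the whole first field exactly, so 1:100:A:C cannot match 11:100:A:C or a string that happens to occur elsewhere on a line. Memory use grows with the size of the ID list, not with the size of the .dosage file.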