I am preprocessing a set of .vcf.gz files and want to subset each of them to a list of SNPs. I stored the SNP IDs of interest in a text file, and I want to extract all rows from these .vcf.gz files whose SNP ID appears in that SNP_ID file:
rs61733845 rs1320571 rs9729550 rs1815606 rs7515488 rs11260562 rs6697886 rs6603785 rs11804831
In Python I would imagine processing each line with a conditional statement or an inner join, but Python may not be the best choice because these .vcf.gz files are huge. Is there a way to subset .vcf.gz files based on a text file using bash commands such as awk, sed, or cat? Thanks!
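File size alone does not rule out Python. If you stream each file line by line and test the ID column against a set, memory use depends only on the SNP list, not on the VCF size. Below is a minimal sketch of that "inner join as hash lookup". It assumes standard VCF layout (tab-separated, ID in the third column, header lines starting with `#`) and an ID file separated by spaces or newlines, as in the list above. The names `load_snp_ids`, `filter_vcf_lines`, `subset_file` and the `.subset.vcf.gz` output suffix are hypothetical choices for illustration. The same idea carries over to awk: load the IDs into an array, then test `$3` of each decompressed record.

```python
import gzip
import sys
from pathlib import Path


def load_snp_ids(path):
    """Read SNP IDs separated by any whitespace (spaces or newlines) into a set."""
    with open(path) as fh:
        return set(fh.read().split())


def filter_vcf_lines(lines, snp_ids):
    """Yield header lines plus records whose ID column shares an ID with snp_ids."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep ## meta lines and the #CHROM column header
            continue
        # VCF columns are tab-separated: CHROM, POS, ID, REF, ALT, ...
        fields = line.split("\t", 3)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # The ID column may hold several IDs joined by ";" (or "." if none)
        if any(i in snp_ids for i in fields[2].split(";")):
            yield line


def subset_file(vcf_gz_path, snp_ids):
    """Stream one .vcf.gz and write matching lines to <name>.subset.vcf.gz next to it.

    Hypothetical naming scheme; output is plain gzip, not BGZF.
    """
    src = Path(vcf_gz_path)
    base = src.name[: -len(".vcf.gz")] if src.name.endswith(".vcf.gz") else src.name
    out_path = src.with_name(base + ".subset.vcf.gz")
    with gzip.open(src, "rt") as fin, gzip.open(out_path, "wt") as fout:
        fout.writelines(filter_vcf_lines(fin, snp_ids))
    return out_path


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python subset_vcf.py SNP_ID_FILE in1.vcf.gz [in2.vcf.gz ...]
    ids = load_snp_ids(sys.argv[1])
    for path in sys.argv[2:]:
        print(subset_file(path, ids))
```

Because the output is written with Python's `gzip` module rather than `bgzip`, it would need recompressing before it could be indexed with tabix.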