Hi, I need to generate a personal genomics report based on a set of rsIDs (mainly SNPs), starting from single-sample VCFs. Until now, I have manually looked up the genotype (0/0, for example) of every rsID, copied the value into an Excel file, derived the corresponding genotype (AA, for example), and applied some rules to generate results (for example: if the genotype is AA, low risk; if the genotype is AC, increased risk...). Now I want to create some scripts that query the VCF for a list of rsIDs, get the corresponding genotypes, do the same calculations as in Excel, and then generate an output file that I can turn into a readable report.
I see there are solutions like scikit + NumPy or pandas, or similar, or converting the VCF to Parquet and then using cloud solutions on Google Cloud, Amazon, or Azure... but I don't know which is the best approach.
Do you have some ideas? Thanks!
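To make the workflow concrete, here is a minimal sketch in plain Python of what I have in mind, assuming an uncompressed single-sample VCF with a GT field. The rsIDs, the rule table, and all function names are made-up placeholders, not real markers or an existing library. Note that a single-sample VCF may omit homozygous-reference sites, so an rsID that is not found is not necessarily 0/0.

```python
def decode_genotype(ref, alt, gt):
    """Turn a VCF GT value such as '0/1' into allele letters such as 'AC'."""
    alleles = [ref] + alt.split(",")           # index 0 = REF, 1.. = ALT alleles
    indices = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    if any(i == "." for i in indices):
        return None                            # missing call
    # Sort so that 0/1 and 1/0 give the same letters (fine for SNPs).
    return "".join(sorted(alleles[int(i)] for i in indices))


def genotypes_for_rsids(vcf_lines, rsids):
    """Scan VCF lines and return {rsid: allele letters} for the requested IDs."""
    wanted = set(rsids)
    found = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue                           # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        vid, ref, alt = fields[2], fields[3], fields[4]
        hits = wanted.intersection(vid.split(";"))  # ID may hold several IDs
        if not hits:
            continue
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")   # first (only) sample column
        gt = sample_values[format_keys.index("GT")]
        for rsid in hits:
            found[rsid] = decode_genotype(ref, alt, gt)
    return found


def build_report(genotypes, rules, rsids):
    """Apply the rule table and return (rsid, genotype, result) rows."""
    rows = []
    for rsid in rsids:
        gt = genotypes.get(rsid)
        result = rules.get(rsid, {}).get(gt, "no rule or not found")
        rows.append((rsid, gt, result))
    return rows


# Hypothetical rule table, mirroring the Excel rules (placeholder rsIDs).
RULES = {
    "rs0000001": {"AA": "low risk", "AC": "increased risk"},
    "rs0000002": {"AA": "low risk", "AC": "increased risk"},
}

# Tiny made-up VCF used only to exercise the functions.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "SAMPLE1"]),
    "\t".join(["1", "1000", "rs0000001", "A", "C", "50", "PASS", ".", "GT", "0/0"]),
    "\t".join(["1", "2000", "rs0000002", "A", "C", "50", "PASS", ".", "GT:DP", "0/1:30"]),
    "\t".join(["1", "3000", "rs0000003", "G", "T", "50", "PASS", ".", "GT", "1/1"]),
])

query = ["rs0000001", "rs0000002"]
calls = genotypes_for_rsids(EXAMPLE_VCF.splitlines(), query)
for row in build_report(calls, RULES, query):
    print("\t".join(str(value) for value in row))
```

For a real file, the lines could come from `open("input.vcf")` instead of the example string; a compressed `.vcf.gz` would need `gzip.open(..., "rt")`.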
Thank you Pierre.
So, I have to create a text file with the list of rsIDs I want to query (filelistofrsid.txt), right? Then bcftools will search for the corresponding genotypes in my VCF (input.vcf), right? And what kind of file would the output be? Thanks!
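If the output can be a plain tab-separated table with one line per rsID (for instance the kind of text that `bcftools query -f '%ID\t%REF\t%ALT[\t%GT]\n'` prints; whether that is the command you mean is my assumption), then I imagine loading it in Python and saving a CSV that opens in Excel, roughly like this sketch. The column names and function names are my own placeholders.

```python
import csv
import io

COLUMNS = ["rsid", "ref", "alt", "gt"]


def read_genotype_table(text):
    """Parse lines of 'ID<TAB>REF<TAB>ALT<TAB>GT' into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue                      # ignore blank lines
        values = line.split("\t")
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def write_csv(rows, handle):
    """Write the parsed rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up table standing in for the real query output (placeholder rsIDs).
example_table = "rs0000001\tA\tC\t0/0\nrs0000002\tA\tC\t0/1\n"

parsed = read_genotype_table(example_table)
buffer = io.StringIO()                    # a real script would open a .csv file
write_csv(parsed, buffer)
print(buffer.getvalue())
```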