I have a list of variants called from an individual genome, and I'm trying to filter it down to the important predispositions. My approach was to download the
variant_summary.txt.gz file from the ClinVar website, in which most of the variants related to human health are recorded, so that I can intersect my variants with it.
I loaded variant_summary.txt into R, and it reports that the dataset has 154358 rows and 25 columns. But when I check with wc -l on Linux, the number of
records is 198661. I double-checked the number of rows by opening the data in Excel, which also showed 198,661. My questions are:
- Why does R not load all the records of my file?
- Given that I'm still a novice in bioinformatics, do you think my approach is feasible for finding predispositions once I fix the R issue?
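To make the row-count comparison above concrete, here is a minimal sketch of the kind of check I mean. It is written in Python rather than R purely for illustration, and the sample text is made up, not real ClinVar data. It rests on a guess I have not confirmed for my file: that a stray quote character inside a field can make a quote-aware parser merge several lines into one record. The function names `count_lines` and `count_rows` are hypothetical helpers.

```python
import csv
import io


def count_lines(text):
    """Count newline characters, which is what `wc -l` reports."""
    return text.count("\n")


def count_rows(text, honor_quotes):
    """Count tab-separated records, optionally treating " as a quote character."""
    quoting = csv.QUOTE_MINIMAL if honor_quotes else csv.QUOTE_NONE
    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter="\t", quoting=quoting
    )
    return sum(1 for _ in reader)


# Made-up sample: the second line contains an unmatched double quote.
sample = (
    "GeneSymbol\tName\n"
    "GENE1\t\"unclosed name\n"
    "GENE2\tplain name\n"
    "GENE3\tanother name\n"
)

print("lines (wc -l style):", count_lines(sample))
print("rows, quotes honored:", count_rows(sample, honor_quotes=True))
print("rows, quotes ignored:", count_rows(sample, honor_quotes=False))
```

On this sample the quote-aware count comes out lower than the line count, while the count with quoting disabled matches it. As far as I know, the analogous settings in R are the `quote` and `comment.char` arguments of `read.table`, but I have not yet verified that they explain the difference in my case.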
Thank you very much.