Hi, I'm new to bioinformatics (so sorry if this is a silly question) and I could use some help interpreting a PED file.
The columns of the file are:
Family ID, Individual ID, Paternal ID, Maternal ID, Sex, Phenotype, rs6874105, rs7191668, rs11541311, etc...
In my data, the calls in the SNP/rsID columns look like: C A, G G, A G, etc.
I don't understand why two nucleotides are listed for each SNP call (e.g. C A). How do I know what the genotype of the sample is? I.e., is the genotype C/G or A/T at this locus?
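To make the layout concrete, here is a minimal Python sketch of how I am currently reading each row. It assumes the fields are whitespace-delimited, that the first six columns are the ones listed above, and that every SNP contributes exactly two tokens. It only pairs the tokens up and does not interpret them. The function name `parse_ped_line` and the example row are made up for illustration and are not from my real data.

```python
def parse_ped_line(line, snp_ids):
    """Split one PED row into its six leading fields plus one 2-token call per SNP."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least six leading columns")
    family_id, individual_id, paternal_id, maternal_id, sex, phenotype = fields[:6]
    alleles = fields[6:]
    # Assumption: two tokens per SNP, in the same order as snp_ids.
    if len(alleles) != 2 * len(snp_ids):
        raise ValueError("expected two tokens per SNP")
    calls = {
        snp: (alleles[2 * i], alleles[2 * i + 1])
        for i, snp in enumerate(snp_ids)
    }
    return {
        "family_id": family_id,
        "individual_id": individual_id,
        "paternal_id": paternal_id,
        "maternal_id": maternal_id,
        "sex": sex,
        "phenotype": phenotype,
        "calls": calls,
    }


# Hypothetical example row (not real data).
snp_ids = ["rs6874105", "rs7191668", "rs11541311"]
row = "FAM1 IND1 0 0 1 2 C A G G A G"
record = parse_ped_line(row, snp_ids)
print(record["calls"])
```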
Ultimately, what I would like to do is a coarse gene-level analysis between diseased and non-diseased patients to see whether a gene is associated with disease. I would call a gene "mutated" if it contains any SNP genotypes that differ from the reference. I would then compare the number of healthy people with the "mutated" gene to the number of diseased people with the "mutated" gene. I would like to do it this way because my sample size is very small and I don't have enough power for an association analysis at the SNP level. Once I know which genes may be associated with disease, I could go back and look at the individual SNPs.
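Roughly, this is the comparison I have in mind, as a sketch only. It assumes I already have, for each sample, a flag for disease status and a flag for whether the gene counts as "mutated". Using a two-sided Fisher's exact test on the resulting 2x2 table is my own assumption about what might suit a small sample, not something I have settled on. The function names and the example samples are hypothetical.

```python
from math import comb


def count_table(samples):
    """Count a 2x2 table from (is_diseased, gene_is_mutated) pairs.

    Returns (a, b, c, d): diseased+mutated, diseased+not mutated,
    healthy+mutated, healthy+not mutated.
    """
    a = b = c = d = 0
    for is_diseased, is_mutated in samples:
        if is_diseased and is_mutated:
            a += 1
        elif is_diseased:
            b += 1
        elif is_mutated:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_ways = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated samples in the diseased row.
        return comb(row1, x) * comb(row2, col1 - x) / total_ways

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as likely as the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Made-up example: 4 diseased and 4 healthy samples for one gene.
samples = (
    [(True, True)] * 3 + [(True, False)]
    + [(False, True)] + [(False, False)] * 3
)
table = count_table(samples)
p_value = fisher_exact_two_sided(*table)
print(table, p_value)
```

I would run this once per gene, so I assume some kind of multiple-testing correction would also be needed.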
Thank you so much, I appreciate any suggestions!