Question

Closed: Help Interpreting PED File

7.7 years ago

Steve • 0

Hi, I'm new to bioinformatics (so sorry if this is a silly question) and I could use some help interpreting a PED file.

The columns of the file are:

Family ID, Individual ID, Paternal ID, Maternal ID, Sex, Phenotype, rs6874105, rs7191668, rs11541311, etc...

For my data the calls in the SNP/rsID columns appear like: C A, G G, A G, etc...
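To make the layout concrete, one row of such a file could be read as in the sketch below. This is only an illustration under stated assumptions: the fields are whitespace-delimited, each SNP contributes two allele tokens, and the SNP names are supplied separately in column order. The function name `parse_ped_line` and the example row are made up for illustration and are not taken from my data.

```python
def parse_ped_line(line, snp_ids):
    """Split one PED row into its six leading fields and per-SNP allele pairs.

    Assumes whitespace-delimited fields and two allele tokens per SNP.
    """
    fields = line.split()
    meta, alleles = fields[:6], fields[6:]
    if len(alleles) != 2 * len(snp_ids):
        raise ValueError("expected two allele tokens per SNP")
    # Pair consecutive allele tokens: (first allele, second allele) for each SNP.
    genotypes = {
        snp: (alleles[2 * i], alleles[2 * i + 1])
        for i, snp in enumerate(snp_ids)
    }
    return {
        "family_id": meta[0],
        "individual_id": meta[1],
        "paternal_id": meta[2],
        "maternal_id": meta[3],
        "sex": meta[4],
        "phenotype": meta[5],
        "genotypes": genotypes,
    }


# Hypothetical example row (not real data).
snp_ids = ["rs6874105", "rs7191668", "rs11541311"]
record = parse_ped_line("FAM1 IND1 0 0 1 2 C A G G A G", snp_ids)
print(record["genotypes"])
```

Each SNP then maps to a pair of alleles, so the `C A` call ends up as `("C", "A")` for `rs6874105` in this made-up row.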

I don't understand why two nucleotides are listed for each SNP call (e.g. C A). How do I know what the genotype of the sample is? For example, is the genotype C/G or A/T at this locus?

Ultimately, what I would like to do is a coarse gene-level analysis comparing diseased and non-diseased patients to see if a gene is associated with disease. I would call a gene "mutated" if it contains SNP genotypes that differ from the reference. I would then compare the number of healthy people with the "mutated" gene to the number of diseased people with the "mutated" gene. I would like to do it this way because my sample size is very small and I don't have enough power to do an association analysis at the SNP level. Once I know which gene may be associated with disease, I could go back to the individual SNPs.
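The counting step described above could be sketched roughly as follows. Everything here is an assumption for illustration: the reference alleles, the SNP-to-gene assignment, and the sample records are invented, the phenotype codes follow the common convention of `2` for affected and `1` for unaffected, and `0` is treated as a missing allele. The helper names `is_mutated` and `carrier_table` are hypothetical.

```python
def is_mutated(genotypes, gene_snps, ref_alleles):
    """Return True if any SNP in the gene carries a non-reference allele."""
    for snp in gene_snps:
        a1, a2 = genotypes[snp]
        if "0" in (a1, a2):
            continue  # skip missing calls (assumed to be coded as 0)
        if a1 != ref_alleles[snp] or a2 != ref_alleles[snp]:
            return True
    return False


def carrier_table(samples, gene_snps, ref_alleles):
    """Count mutated / not-mutated individuals among cases and controls."""
    table = {
        "case": {"mutated": 0, "not_mutated": 0},
        "control": {"mutated": 0, "not_mutated": 0},
    }
    for sample in samples:
        if sample["phenotype"] == "2":
            group = "case"
        elif sample["phenotype"] == "1":
            group = "control"
        else:
            continue  # unknown phenotype: leave out of the comparison
        mutated = is_mutated(sample["genotypes"], gene_snps, ref_alleles)
        table[group]["mutated" if mutated else "not_mutated"] += 1
    return table


# Invented reference alleles, gene membership, and samples (not real data).
ref_alleles = {"rs6874105": "C", "rs7191668": "G"}
gene_snps = ["rs6874105", "rs7191668"]
samples = [
    {"phenotype": "2", "genotypes": {"rs6874105": ("C", "A"), "rs7191668": ("G", "G")}},
    {"phenotype": "2", "genotypes": {"rs6874105": ("C", "C"), "rs7191668": ("G", "G")}},
    {"phenotype": "1", "genotypes": {"rs6874105": ("C", "C"), "rs7191668": ("G", "G")}},
    {"phenotype": "1", "genotypes": {"rs6874105": ("C", "C"), "rs7191668": ("A", "G")}},
    {"phenotype": "-9", "genotypes": {"rs6874105": ("A", "A"), "rs7191668": ("G", "G")}},
]
table = carrier_table(samples, gene_snps, ref_alleles)
print(table)
```

The resulting 2x2 counts (cases vs. controls, mutated vs. not mutated) are what I would then compare per gene.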

Thank you so much, I appreciate any suggestions!

SNP snp PED ped file biallelic • 303 views
