Identifying if an individual has a SNP in a particular allele.
6.9 years ago
Tom ▴ 40

I have a gene of interest. For 10 people, I have SNP information for two loci in that gene (SNPa and SNPb) and each person's age at diagnosis of a disease.

What I want is a table with four columns: person ID, age at disease diagnosis, whether they have SNPa, and whether they have SNPb for this gene.

(After this step, I want to look at whether the people with SNPa tend to live longer with the disease than those with SNPb, using a Cox model.)
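For the table itself, a minimal Python sketch might look like the following. It assumes you already have the ages in a dict and, for each SNP, the set of person IDs that carry it. The names build_table and to_tsv and all values shown are hypothetical placeholders, not real data.

```python
import csv
import io


def build_table(ages, carriers):
    """Build rows: person ID, age at diagnosis, then one 0/1 flag per SNP.

    ages:     dict of person ID -> age at diagnosis
    carriers: dict of SNP name -> set of person IDs carrying that SNP
    """
    snps = sorted(carriers)
    header = ["person_id", "age_at_diagnosis"] + [f"has_{s}" for s in snps]
    rows = [header]
    for pid in sorted(ages):
        flags = [int(pid in carriers[s]) for s in snps]
        rows.append([pid, ages[pid]] + flags)
    return rows


def to_tsv(rows):
    """Render the rows as tab-separated text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Placeholder values for illustration only.
ages = {"P1": 54, "P2": 61, "P3": 47}
carriers = {"SNPa": {"P1", "P3"}, "SNPb": {"P2", "P3"}}
table = build_table(ages, carriers)
print(to_tsv(table))
```

The 0/1 columns can then be used as covariates in whatever survival package you choose for the Cox model.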

I know that the .bed file is a binary file containing the genotypes of my individuals. I also know the names (ss names) of the SNPs that I want to check each individual for.

Would someone know the PLINK command that says "go to the relevant file (probably the .bed file) and return a list of individuals with SNPa" (i.e. the individual's ID, and that they have a SNP at this point)? Then I can run the same command for SNPb and build the table described above based on whether each SNP is present.

Thanks

Aoife


Thank you very much. I really appreciate it.

6.9 years ago
zx8754 10k

You will need to use --snps to extract SNPs and --keep to extract individuals. e.g.:

plink --bfile mybedfile --snps SNPa,SNPb --keep indfile.txt --recode --out mybedfilesmall


Could I ask two further questions.

I tried to alter the command so that --snps took a file containing a list of SNPs rather than a comma-delimited list on the command line, but I was unsuccessful. The SNPs fall in no particular range (the example I gave is a simplified version of something I'm doing on a much more complicated scale): I'll have too many SNPs to type manually on the command line, and they won't be confined to a particular chromosome, range, etc.

My second question is about the --keep option. I have a list of, say, 1,000 individuals in total in my population, and I have genotypic information for all of them.

I read that the indfile.txt should describe a list of the individuals that I want to keep. But I want to keep "all of whom have the SNPs as described in the --snps file". So I can't pre-define the indfile.txt with a list of individuals, because I don't know which ones have the SNPs.

So what I want is ultimately:

I read in a .bim/.bed file for the full list of participants and their genotypes.

I read in a list of SNPs separately.

The command will say:

For each SNP, go through the genotypic data of each individual.

Pull out something like this:

SNP1: Person 1,2,3,4,5 have this SNP. -> print out a file called SNP1. In this file, list 1,2,3,4,5.

SNP2: Person 1,3,5,6,7 have this SNP. -> print out a file called SNP2. In this file, list 1,3,5,6,7.

etc. So it gives me, for each SNP, a list of individuals with the SNP.
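The per-SNP lists described above could be sketched in Python roughly as follows. This is only one possible reading of "has the SNP": it assumes you know a reference allele for each SNP and counts a person as a carrier if either of their alleles differs from it. The function names, the ref_alleles mapping, and the genotype values are all hypothetical.

```python
from pathlib import Path

MISSING = "0"  # PLINK text files code a missing allele as 0


def carriers_by_snp(genotypes, ref_alleles):
    """Group person IDs by the SNPs they carry.

    genotypes:   dict of person ID -> dict of SNP name -> (allele1, allele2)
    ref_alleles: dict of SNP name -> reference allele (an assumption here)
    """
    result = {snp: [] for snp in ref_alleles}
    for pid in sorted(genotypes):
        for snp, ref in ref_alleles.items():
            a1, a2 = genotypes[pid].get(snp, (MISSING, MISSING))
            if MISSING in (a1, a2):
                continue  # missing call: cannot tell either way
            if a1 != ref or a2 != ref:
                result[snp].append(pid)
    return result


def write_lists(result, out_dir):
    """Write one file per SNP, named <SNP>.txt, with one person ID per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for snp, ids in result.items():
        (out / f"{snp}.txt").write_text("".join(f"{i}\n" for i in ids))


# Placeholder genotypes for illustration only.
genotypes = {
    "1": {"SNP1": ("C", "T"), "SNP2": ("G", "G")},
    "2": {"SNP1": ("C", "C"), "SNP2": ("0", "0")},
    "3": {"SNP1": ("T", "T"), "SNP2": ("G", "A")},
}
ref_alleles = {"SNP1": "C", "SNP2": "G"}
lists = carriers_by_snp(genotypes, ref_alleles)
print(lists)
```

Calling write_lists(lists, "some_directory") would then produce the SNP1.txt, SNP2.txt, ... files.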

If you had any ideas, I would appreciate it.


Use --extract mysnplistfile.txt to read list of SNPs from a file, one SNP per line.

Regarding "all of whom have the SNPs as described in the --snps file": PLINK stores the genotypes for all individuals in a single .bed file. Missing genotypes are usually coded as 0 0. I suggest subsetting with --extract and --keep; after that you will need a bit of coding...


OK, thanks. I ran plink --bfile X --extract SNPList --recode --out SMALL --noweb and got a list of (for example) 5 SNPs in the .map file. The .ped file looks like this, with each individual's genotype at the 5 positions:

X   C C    C C    T T    C C    C C
Y   C C    T C    T C    C C    0 0
Z   C C    T C    T T    0 0    C C
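Reading those two files back in could be sketched like this in Python. It assumes the standard PLINK text layout: each .map line is chromosome, variant ID, genetic distance, base-pair position, and each .ped line has six leading columns (family ID, individual ID, father, mother, sex, phenotype) followed by two allele columns per SNP. The excerpt above shows only the ID and genotypes, so the leading columns and SNP names in the demo lines are placeholders.

```python
def read_map(map_lines):
    """Return variant IDs in file order (column 2 of each .map line)."""
    return [line.split()[1] for line in map_lines if line.strip()]


def read_ped(ped_lines, snp_ids):
    """Return dict of individual ID -> dict of SNP ID -> (allele1, allele2)."""
    genotypes = {}
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue
        iid = fields[1]       # column 2 is the individual ID
        alleles = fields[6:]  # genotype columns start after the 6 leading ones
        if len(alleles) != 2 * len(snp_ids):
            raise ValueError(f"unexpected number of allele columns for {iid}")
        genotypes[iid] = {
            snp: (alleles[2 * i], alleles[2 * i + 1])
            for i, snp in enumerate(snp_ids)
        }
    return genotypes


# Placeholder lines: SNP names, positions and leading .ped columns are made up;
# the allele columns follow the excerpt above.
map_lines = [f"1 snp{i} 0 {1000 * i}" for i in range(1, 6)]
ped_lines = [
    "F1 X 0 0 0 -9 C C C C T T C C C C",
    "F2 Y 0 0 0 -9 C C T C T C C C 0 0",
    "F3 Z 0 0 0 -9 C C T C T T 0 0 C C",
]
snp_ids = read_map(map_lines)
genotypes = read_ped(ped_lines, snp_ids)
print(genotypes["Y"])
```

With real files you would pass open("SMALL.map") and open("SMALL.ped") instead of the placeholder lists.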


So then, for the coding, would it make sense to compare the alleles in this genotype file against the alleles listed in a file I have with information about each SNP:

SNPA,TCCCAG[C/T]AAGATTTGAGAAA
___________^^^^^


So if the alleles for an individual at this point are C or T, they do not have a SNP. If the alleles are A and G, they do have a SNP. If the alleles are 0 and 0, the genotype is missing and I can't tell.
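The comparison described above could be sketched in Python as below. It simply mirrors the rule as stated in this post: pull the two bracketed alleles out of the annotation line, then label a genotype as "missing", "expected" (both alleles are among the bracketed ones) or "unexpected". The function names and labels are hypothetical. Bear in mind that in this bracket notation the two letters are normally the SNP's own two alleles, so which of the two a person carries may be the more informative thing to record.

```python
import re


def expected_alleles(annotation):
    """Parse e.g. 'SNPA,TCCCAG[C/T]AAGATTTGAGAAA' into ('SNPA', {'C', 'T'})."""
    name, seq = annotation.split(",", 1)
    match = re.search(r"\[([ACGT])/([ACGT])\]", seq)
    if match is None:
        raise ValueError(f"no [X/Y] allele pair found for {name}")
    return name.strip(), {match.group(1), match.group(2)}


def classify(genotype, alleles):
    """Label one genotype (allele1, allele2) against the bracketed alleles."""
    a1, a2 = genotype
    if a1 == "0" or a2 == "0":
        return "missing"      # PLINK codes a missing call as 0 0
    if a1 in alleles and a2 in alleles:
        return "expected"     # both alleles appear in the brackets
    return "unexpected"       # at least one allele is not in the brackets


name, alleles = expected_alleles("SNPA,TCCCAG[C/T]AAGATTTGAGAAA")
for gt in [("C", "C"), ("T", "C"), ("0", "0"), ("A", "G")]:
    print(name, gt, classify(gt, alleles))
```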

Many thanks.