Identifying if an individual has a SNP in a particular allele.
9.9 years ago
Tom ▴ 40

Could someone help me, please?

I have a gene of interest. For that gene, I have SNP information at two loci (SNPa and SNPb) for 10 people, along with each individual's age at diagnosis of a disease.

What I want is to have a table with four columns: a person ID, age of disease diagnosis, do they have SNPa, do they have SNPb for this gene.
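As a rough illustration of the target table, here is a minimal Python sketch. It assumes the carrier lists for SNPa and SNPb have already been worked out; the person IDs, ages, and the helper names `build_table` and `to_tsv` are hypothetical and not taken from the thread.

```python
import csv
import io


def build_table(ages, carriers_a, carriers_b):
    """Return rows of (person_id, age_at_diagnosis, has_SNPa, has_SNPb).

    ages: dict mapping person ID -> age at diagnosis
    carriers_a, carriers_b: sets of person IDs carrying SNPa / SNPb
    """
    rows = []
    for person_id in sorted(ages):
        rows.append((person_id,
                     ages[person_id],
                     int(person_id in carriers_a),   # 1 = has SNPa, 0 = does not
                     int(person_id in carriers_b)))  # 1 = has SNPb, 0 = does not
    return rows


def to_tsv(rows):
    """Render the rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["person_id", "age_at_diagnosis", "has_SNPa", "has_SNPb"])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical example data, for illustration only.
ages = {"P1": 54, "P2": 61, "P3": 47}
table = build_table(ages, carriers_a={"P1", "P3"}, carriers_b={"P2", "P3"})
print(to_tsv(table), end="")
```

A 0/1 coding keeps the two SNP columns easy to use as covariates later on.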

(after this step, I want to look at whether the people with SNPa tend to live longer with the disease than those with SNPb, using the Cox model).

I'm new to PLINK.

I know that the .bed file is a binary file with the genotypes of my individuals. I know the names (ss names) of the SNPs whose presence I want to check in each individual.

Would someone know the PLINK command that says "go to the relevant file (probably the .bed file) and return a list of individuals with SNPa (i.e. the individual's ID, and that they have a SNP at this point)"? Then I can run the same command for SNPb and make the table described above based on whether the SNP is present.

Thanks

Aoife

plink SNP • 3.8k views

Thank you very much. I really appreciate it.


This is not an answer, please use ADD COMMENT.

9.9 years ago
zx8754 11k

You will need to use --snps to extract SNPs and --keep to extract individuals. e.g.:

plink --bfile mybedfile --snps SNPa,SNPb --keep indfile.txt --recode --out mybedfilesmall

Could I ask two further questions?

I was trying to alter the command so that --snps took a file containing a list of SNPs rather than a comma-delimited list on the command line, but I was unsuccessful. The SNPs have no particular range (the example I gave is a simplified version of something I'm doing on a much larger scale): I'll have too many SNPs to type manually on the command line, and they won't be confined to a particular chromosome, range, etc.

My second question is about the --keep command. So I have a list of say 1,000 individuals in total in my population, and I have genotypic information for all of these individuals.

I read that the indfile.txt should describe a list of the individuals that I want to keep. But I want to keep "all of whom have the SNPs as described in the --snps file". So I can't pre-define the indfile.txt with a list of individuals, because I don't know which ones have the SNPs.

So what I want is ultimately:

I read in a .bim/.bed file for the full list of participants and their genotypes.

I read in a list of SNPs separately.

The command will say:

For each SNP, go through the genotypic data of each individual.

Pull out something like this:

SNP1: Persons 1, 2, 3, 4, 5 have this SNP -> print out a file called SNP1. In this file, list 1, 2, 3, 4, 5.

SNP2: Persons 1, 3, 5, 6, 7 have this SNP -> print out a file called SNP2. In this file, list 1, 3, 5, 6, 7.

etc. So it gives me, for each SNP, a list of individuals with the SNP.
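The per-SNP lists described here could be produced along the following lines in Python. This is only a sketch: it assumes the genotypes are already in memory as a dict, and that for each SNP you have decided which allele counts as "having the SNP". The names `carriers_by_snp`, `write_lists` and the example data are hypothetical.

```python
import tempfile
from pathlib import Path

MISSING = "0"  # missing allele code, as in the 0 0 genotypes PLINK writes


def carriers_by_snp(genotypes, variant_alleles):
    """Map each SNP to the list of individuals carrying its variant allele.

    genotypes: dict person -> dict snp -> (allele1, allele2)
    variant_alleles: dict snp -> allele treated as "having the SNP" (an assumption)
    """
    result = {snp: [] for snp in variant_alleles}
    for person, calls in genotypes.items():
        for snp, variant in variant_alleles.items():
            a1, a2 = calls.get(snp, (MISSING, MISSING))
            if MISSING in (a1, a2):
                continue  # missing genotype: cannot tell, so skip
            if variant in (a1, a2):
                result[snp].append(person)
    return result


def write_lists(result, out_dir):
    """Write one file per SNP, one individual ID per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for snp, people in result.items():
        (out_dir / f"{snp}.txt").write_text("".join(f"{p}\n" for p in people))


# Hypothetical genotypes and variant alleles, for illustration only.
genotypes = {
    "X": {"SNP1": ("C", "C"), "SNP2": ("C", "C")},
    "Y": {"SNP1": ("C", "C"), "SNP2": ("T", "C")},
    "Z": {"SNP1": ("0", "0"), "SNP2": ("T", "C")},
}
variant_alleles = {"SNP1": "C", "SNP2": "T"}
result = carriers_by_snp(genotypes, variant_alleles)

with tempfile.TemporaryDirectory() as tmp:
    write_lists(result, tmp)
    files = sorted(p.name for p in Path(tmp).iterdir())
    snp2_text = (Path(tmp) / "SNP2.txt").read_text()

print(result)
print(files)
```

Individuals with a missing genotype are left out of the list rather than counted as non-carriers, so you may want to track them separately.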

If you had any ideas, I would appreciate it.


Use --extract mysnplistfile.txt to read the list of SNPs from a file, one SNP per line.
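If the SNP list is generated programmatically, a small Python helper can write the one-per-line file and assemble the command. This is a sketch under assumptions: the SNP IDs and file prefixes are placeholders, the helper names are hypothetical, and the command is only built here, not executed.

```python
import tempfile
from pathlib import Path


def write_snp_list(snp_ids, path):
    """Write one SNP ID per line, the layout described for --extract."""
    Path(path).write_text("".join(f"{snp}\n" for snp in snp_ids))


def plink_extract_args(bfile, snp_list_path, out_prefix):
    """Build (but do not run) the PLINK argument list used in this thread."""
    return ["plink", "--bfile", bfile, "--extract", str(snp_list_path),
            "--recode", "--out", out_prefix]


# Placeholder SNP IDs and prefixes, for illustration only.
snp_ids = ["SNPa", "SNPb", "SNPc"]
with tempfile.TemporaryDirectory() as tmp:
    list_path = Path(tmp) / "mysnplistfile.txt"
    write_snp_list(snp_ids, list_path)
    written = list_path.read_text()
    args = plink_extract_args("mybedfile", list_path, "mybedfilesmall")

print(written, end="")
print(" ".join(args))
# In real use, write the list to a permanent path and then run the command,
# e.g. with subprocess.run(args, check=True), which requires PLINK on PATH.
```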

"all of whom have the SNPs as described in the --snps file"

Regarding this part: PLINK stores the genotypes for all individuals in one .bed file. Missing genotypes are usually coded as 0 0. I suggest subsetting with --extract and --keep; after that you will need a bit of coding...


OK, thanks. I ran plink --bfile X --extract SNPList --recode --out SMALL --noweb, and I now get a list of (for example) 5 SNPs in the .map file. The .ped file looks like this, with each individual's genotype at the 5 positions:

X   C C    C C    T T    C C    C C
Y   C C    T C    T C    C C    0 0
Z   C C    T C    T T    0 0    C C
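For the coding step, a .ped/.map pair like this can be loaded with a few lines of Python. The sketch below assumes the standard PLINK text layout: six leading columns in the .ped file (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two alleles per SNP, and the SNP ID in the second column of the .map file. The rows above appear to be abbreviated, so the leading columns, SNP names and positions in the example are made up.

```python
def read_map(map_text):
    """Return SNP IDs in file order (column 2 of a .map file)."""
    return [line.split()[1] for line in map_text.splitlines() if line.strip()]


def read_ped(ped_text, snp_ids, n_leading=6):
    """Return dict individual ID -> dict SNP ID -> (allele1, allele2)."""
    genotypes = {}
    for line in ped_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        person = fields[1]            # individual ID is the second column
        alleles = fields[n_leading:]  # everything after the six leading columns
        if len(alleles) != 2 * len(snp_ids):
            raise ValueError(f"{person}: expected {2 * len(snp_ids)} alleles, "
                             f"found {len(alleles)}")
        genotypes[person] = {snp: (alleles[2 * i], alleles[2 * i + 1])
                             for i, snp in enumerate(snp_ids)}
    return genotypes


# Made-up .map content; only the genotype columns mirror the rows above.
map_text = """\
1 snp1 0 1000
1 snp2 0 2000
2 snp3 0 3000
2 snp4 0 4000
3 snp5 0 5000
"""
ped_text = """\
F1 X 0 0 1 -9 C C C C T T C C C C
F2 Y 0 0 2 -9 C C T C T C C C 0 0
F3 Z 0 0 1 -9 C C T C T T 0 0 C C
"""
snp_ids = read_map(map_text)
genotypes = read_ped(ped_text, snp_ids)
print(genotypes["Y"])
```

In practice the two strings would come from reading SMALL.map and SMALL.ped.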

So then, for the coding, would it make sense to match the alleles in this genotype file against the alleles in a file I have with information about each SNP, like this:

SNPA,TCCCAG[C/T]AAGATTTGAGAAA
___________^^^^^

So if the alleles for an individual at this point are C or T, they are not a SNP. If the alleles are A and G, they are a SNP. If the alleles are 0 and 0, the genotype is missing and I can't tell.
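A matching step of this kind might look like the Python sketch below. It rests on an assumption the thread does not confirm: that the bracketed [C/T] lists the two alleles observed at the site, with the second one treated as the variant. Under that reading, C or T genotypes are the expected ones, and an allele outside the pair (such as A or G) is flagged as unexpected, which could point to a strand or annotation mismatch. The function names are hypothetical.

```python
import re


def alleles_from_flank(flank):
    """Extract the two alleles from a flanking sequence such as 'TCCCAG[C/T]AAG'."""
    match = re.search(r"\[([ACGT])/([ACGT])\]", flank)
    if match is None:
        raise ValueError(f"no [X/Y] allele pair found in {flank!r}")
    return match.group(1), match.group(2)


def classify(genotype, flank, variant=None):
    """Classify one genotype as 'missing', 'unexpected', 'carrier' or 'non-carrier'."""
    first, second = alleles_from_flank(flank)
    # ASSUMPTION: the second bracketed allele is the variant unless stated otherwise.
    variant = second if variant is None else variant
    a1, a2 = genotype
    if "0" in (a1, a2):
        return "missing"        # 0 0: genotype not called
    if {a1, a2} - {first, second}:
        return "unexpected"     # allele not listed for this SNP
    return "carrier" if variant in (a1, a2) else "non-carrier"


flank = "TCCCAG[C/T]AAGATTTGAGAAA"
for genotype in [("C", "C"), ("T", "C"), ("0", "0"), ("A", "G")]:
    print(genotype, classify(genotype, flank))
```

Which allele counts as the variant depends on the annotation source, so the `variant` argument lets you override the default per SNP.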

Many thanks.
