Question: Plink genotype data (based on SNP) format!

kyc92dream wrote (21 months ago):

Hi there,

My genotype data has only two genotype classes, coded as 0 and 1 (each SNP is a single number rather than an allele pair, and a third code, ?, marks missing data). I have no idea how to arrange my PED file, since it requires two columns for each SNP. When I tried one column per SNP, PLINK reported that it found fewer columns than expected. Can anyone help me arrange my genotype data into a format PLINK can run?


Could you provide an example of your data?

(reply by GabrielMontenegro, 21 months ago)

    0 H1 0 0 0 0.9 1 0 0 0 1 1 1 1 0
    0 H2 0 0 0 2.3 0 0 0 0 1 1 1 0 0
    0 H3 0 0 0 1.1 0 0 1 0 1 1 1 0 0
    0 H4 0 0 0 1.1 0 0 0 0 1 1 1 0 0
    0 H5 0 0 0 0.4 0 0 0 0 1 1 1 0 0
    0 H6 0 0 0 1.1 0 0 0 0 1 0 1 0 0

I am doing a quantitative trait study in a crop, so I have no family ID, parental IDs or sex; I typed 0 in the corresponding columns (column 6 holds the phenotype value). The SNP genotypes start from the 7th column, one column per SNP (I only pasted 6 individuals and 9 SNPs here).
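The "fewer columns than expected" message follows from the PED layout: six leading columns, then two allele columns per SNP. So 9 SNPs need 6 + 2 × 9 = 24 columns, while the rows above have 6 + 9 = 15. A minimal Python sketch of that check (the function names are hypothetical, not part of PLINK):

```python
# Minimal sketch: compare the number of whitespace-separated columns in a PED
# line with what PLINK expects (6 leading columns + 2 allele columns per SNP).

N_LEADING = 6  # FID, IID, paternal ID, maternal ID, sex, phenotype


def expected_ped_columns(n_snps: int) -> int:
    """Number of columns PLINK expects on each PED line."""
    return N_LEADING + 2 * n_snps


def check_ped_line(line: str, n_snps: int) -> tuple:
    """Return (found, expected) column counts for one PED line."""
    found = len(line.split())
    return found, expected_ped_columns(n_snps)


if __name__ == "__main__":
    # First row of the example data: 6 leading columns + 9 single-code SNPs.
    row = "0 H1 0 0 0 0.9 1 0 0 0 1 1 1 1 0"
    found, expected = check_ped_line(row, n_snps=9)
    print(f"found {found} columns, expected {expected}")  # found 15 columns, expected 24
```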

(reply by kyc92dream, 21 months ago)

I also tried adding the compound-genotypes flag (--compound-genotypes), but it did not work, because each genotype is supposed to be 2 characters long.
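Because PLINK wants two allele characters per genotype, one workaround is to expand each single code into an allele pair before running PLINK. A minimal Python sketch, assuming 0 and 1 stand for the two homozygous genotypes (plausible for inbred crop lines, but check this for your data). The allele codes 1 and 2, the function names and the file paths are hypothetical choices; "?" is written as 0 0, PLINK's default missing-genotype code. Recoding genotype 0 away from "0" also keeps it from being read as missing.

```python
# Minimal sketch (assumptions labelled): expand single-code genotypes into the
# two allele columns a PED file needs.
# ASSUMPTION: 0 and 1 are the two homozygous genotypes, written here as the
# hypothetical allele pairs "1 1" and "2 2"; "?" becomes "0 0", PLINK's
# default missing-genotype code.

CODE_TO_ALLELES = {"0": ("1", "1"), "1": ("2", "2"), "?": ("0", "0")}


def convert_line(line: str, n_leading: int = 6) -> str:
    """Keep the first n_leading columns and expand every SNP code to two alleles."""
    fields = line.split()
    out = fields[:n_leading]
    for code in fields[n_leading:]:
        if code not in CODE_TO_ALLELES:
            raise ValueError(f"unexpected genotype code {code!r}")
        out.extend(CODE_TO_ALLELES[code])
    return " ".join(out)


def convert_file(src: str, dst: str) -> None:
    """Convert a whole single-code genotype file (hypothetical paths) to PED."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():  # skip blank lines
                fout.write(convert_line(line) + "\n")


if __name__ == "__main__":
    print(convert_line("0 H1 0 0 0 0.9 1 0 ?"))  # 0 H1 0 0 0 0.9 2 2 1 1 0 0
```

Calling convert_file("genotypes.txt", "INPUT.ped") on a hypothetical input file would write a PED file with 6 + 2 × (number of SNPs) columns per line.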

(reply by kyc92dream, 21 months ago)

MoppelKopp wrote (21 months ago):

You need an additional Map file. With that, plink should be able to convert your files automatically.

Example from Plink's own "toy" data:

(no header line in the file) CHR SNP-ID cM_position (can be set to 0) BP_position
1 rs0 0 1000
1 rs10 0 1001

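Just create this file with one line for each of your markers, in the same order as the SNP columns in your PED file and without the header, and give it the same base name as the PED file (e.g. INPUT.ped and INPUT.map) so that --file finds both. Then it should work.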
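If you have no marker positions at hand, a MAP file of this shape can be generated. A minimal Python sketch, assuming a single chromosome ("1"), placeholder base-pair positions 1, 2, 3, ... and hypothetical marker names snp1, snp2, ...; use your real chromosomes, names and positions if you have them:

```python
# Minimal sketch: build PLINK MAP lines (no header), one per marker:
# chromosome, marker ID, genetic position (0 here), base-pair position.
# ASSUMPTION: real positions are unknown, so every marker goes on chromosome
# "1" at placeholder positions 1, 2, 3, ...; marker names are hypothetical.
from typing import Iterable, Iterator


def map_lines(marker_ids: Iterable[str], chrom: str = "1") -> Iterator[str]:
    """Yield one tab-separated MAP line per marker, in PED column order."""
    for bp, marker in enumerate(marker_ids, start=1):
        yield f"{chrom}\t{marker}\t0\t{bp}"


def write_map(path: str, marker_ids: Iterable[str], chrom: str = "1") -> None:
    """Write the MAP lines to a file."""
    with open(path, "w") as out:
        for line in map_lines(marker_ids, chrom):
            out.write(line + "\n")


if __name__ == "__main__":
    markers = [f"snp{i}" for i in range(1, 10)]  # 9 markers, as in the example data
    for line in map_lines(markers):
        print(line)
```

write_map("INPUT.map", markers) would save the same lines next to INPUT.ped.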

plink command example:
"plink --file INPUT --make-bed --out OUTFILE"

Plink will create a "binary" set of three files:

  • .fam - phenotype information for each sample
  • .bed - binary genotype file
  • .bim - map file

The .bim map file will contain two additional columns with the allele codes of each marker, taken from your PED genotypes (e.g. 1 or 2, with 0 for an allele that was not observed). You may change that later if you like, see "https://www.cog-genomics.org/plink/1.9/data#update_map"
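For reference, each .bim line has six whitespace-separated columns: chromosome, marker ID, genetic position, base-pair position, and the two allele codes. A minimal Python sketch that reads them (the record and function names are hypothetical, and the sample line is made up for illustration):

```python
# Minimal sketch: read a PLINK .bim file into records with six fields:
# chromosome, variant ID, genetic position, base-pair position, allele 1, allele 2.
from typing import List, NamedTuple


class BimRecord(NamedTuple):
    chrom: str
    variant_id: str
    cm: float
    bp: int
    allele1: str
    allele2: str


def parse_bim_line(line: str) -> BimRecord:
    """Split one .bim line into its six columns."""
    chrom, vid, cm, bp, a1, a2 = line.split()
    return BimRecord(chrom, vid, float(cm), int(bp), a1, a2)


def read_bim(path: str) -> List[BimRecord]:
    """Read every non-blank line of a .bim file."""
    with open(path) as fh:
        return [parse_bim_line(line) for line in fh if line.strip()]


if __name__ == "__main__":
    rec = parse_bim_line("1\tsnp1\t0\t1\t2\t1")  # made-up example line
    print(rec.variant_id, rec.allele1, rec.allele2)  # snp1 2 1
```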

Some additional tips:

  • If you don't have family IDs, use the sample ID as the family ID as well. At the moment, all your samples belong to family "0".
  • Use Plink 2.0; it is much faster. Keep a backup copy of Plink 1.07 somewhere, since some Plink 1.07 features are not implemented in Plink 2 yet.
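The family-ID tip amounts to one small edit per sample. A minimal Python sketch (the function name is hypothetical) that copies the sample ID in column 2 into the family ID in column 1 of a PED line:

```python
# Minimal sketch: give each sample its own family by copying the sample ID
# (column 2) into the family ID column (column 1) of a PED line.


def fid_from_iid(line: str) -> str:
    """Return the PED line with FID replaced by IID."""
    fields = line.split()
    fields[0] = fields[1]
    return " ".join(fields)


if __name__ == "__main__":
    print(fid_from_iid("0 H1 0 0 0 0.9 2 2 1 1"))  # H1 H1 0 0 0 0.9 2 2 1 1
```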

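I forgot something: you also have to deal with the missing code. PLINK's default missing-genotype code is "0" ("-9" is the default for a missing phenotype), so either recode "?" to 0 (after recoding your genotype 0 to another allele code, since 0 would otherwise be read as missing) or try the option "--missing-genotype ?"; I am not sure if plink is able to process the "?".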

(reply by MoppelKopp, 21 months ago)
Powered by Biostar version 2.3.0