Question: Help with GWAS analysis
16 months ago by
leslie.ecker0 wrote:


I'm starting a GWAS analysis and am having a hard time with some points. The genotyping was conducted on an Affymetrix array, and I have a case-control study. I have the .bed, .fam, .bim, .ped, and .map files. However, the files do not state which individuals are cases and which are controls, and sex is not included either. I also need to remove some samples from the analysis. How can I edit these files, and which ones should I edit (PED and FAM)? I've tried Excel, but the file is too large. Another question: how can I change the Affymetrix codes to rs names? Can the .bed file be read in any way? Sorry if my questions are very basic, but I really am a beginner in this area. Thank you!

gwas • 362 views
modified 16 months ago • written 16 months ago by leslie.ecker0

Maybe slightly off topic, but if you are new to GWAS and Affy, you may wish to take note of this

written 16 months ago by Kevin640

Many thanks all, your tips were useful, wonderful!! Now I have another question, about this error:

A problem with line 124 in [ C:\Users\leka_\Documents\DOUTORADO\gwas\gwas\Dataset\gwas_26032019\plink_text_format\JSB_AX001toAX011_final.ped ] Expecting 6 + 2 * 920636 = 1841278 columns, but found 1455707

When I try to remove this individual (line 124), I cannot! How can I solve this problem? No analysis will run until it is fixed.

Many thanks!!

written 16 months ago by leslie.ecker0

Your PED file is essentially not in agreement with [most likely] your MAP file.

The PED file has 6 columns for:

  1. FID
  2. IID
  3. PID
  4. MID
  5. SEX
  6. PHENO

Then, the remaining columns are genotype values, with 2 columns per genotype (so, 2 * the number of genotypes)
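The column arithmetic in the error message (6 + 2 * 920636 = 1841278) can be checked outside PLINK. Below is a minimal Python sketch, assuming whitespace-delimited PED and MAP files with one variant per MAP line; the function names are made up for illustration and are not part of PLINK.

```python
def count_variants(map_path):
    """Count non-empty lines in a MAP file (one variant per line)."""
    with open(map_path) as handle:
        return sum(1 for line in handle if line.strip())


def find_bad_ped_lines(ped_path, n_variants):
    """Return (line_number, n_columns) for every PED line with the wrong width."""
    expected = 6 + 2 * n_variants  # 6 leading columns + 2 alleles per variant
    bad = []
    with open(ped_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            n_columns = len(line.split())
            if n_columns != expected:
                bad.append((line_number, n_columns))
    return bad
```

Calling `find_bad_ped_lines("mydata.ped", count_variants("mydata.map"))` (file names hypothetical) lists every sample line that is too short or too long. A line that is far too short, as in the error above, usually points to a truncated or corrupted export rather than something that can be repaired by editing the file.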

modified 16 months ago • written 16 months ago by Kevin Blighe65k
16 months ago by
Inquisitive8995170 wrote:

Hi, you can exclude samples directly in PLINK by listing their IDs. Refer to this link []. This way the samples are removed consistently from the sample-level files (.ped, or .fam and .bed for the binary fileset). The .bed file cannot be opened as text because it is a binary file, but PLINK can read it.

written 16 months ago by Inquisitive8995170

Moved this to an answer.

As per Inquisitive8995, you can remove samples within PLINK, and this is the safest place to do this unless you really have a firm understanding of how PLINK arranges data. Please spend some time looking through the documentation.
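As a rough illustration of removing samples within PLINK: the `--remove` option takes a text file with one family ID and individual ID per line. The sketch below builds such a file from a FAM file and a list of individual IDs; the helper name and file names are hypothetical.

```python
def write_remove_list(fam_path, iids_to_remove, out_path):
    """Write 'FID IID' lines for the given individual IDs; return how many matched."""
    wanted = set(iids_to_remove)
    n_written = 0
    with open(fam_path) as fam, open(out_path, "w") as out:
        for line in fam:
            fields = line.split()
            # FAM columns: FID, IID, PID, MID, SEX, PHENO
            if len(fields) >= 2 and fields[1] in wanted:
                out.write(f"{fields[0]} {fields[1]}\n")
                n_written += 1
    return n_written

# The resulting file can then be passed to PLINK, for example:
#   plink --bfile mydata --remove remove.txt --make-bed --out mydata_clean
```

Comparing the returned count with the number of IDs requested is a quick way to catch misspelled sample names before running PLINK.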

Phenotype (case/control) and gender (sex) information, if it exists, would be stored in columns 5 (sex) and 6 (phenotype) of the FAM file and/or of the PED file. Again, check the documentation to ensure that you know to which column each relates.
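To see what is currently stored, the sex and phenotype columns can simply be tallied. This is a small sketch assuming the standard six-column FAM layout, where PLINK codes sex as 1 = male, 2 = female, 0 = unknown, and case/control phenotype as 1 = control, 2 = case, 0 or -9 = missing; the function name is illustrative only.

```python
from collections import Counter


def summarise_fam(fam_path):
    """Tally the codes found in column 5 (sex) and column 6 (phenotype)."""
    sex_counts = Counter()
    pheno_counts = Counter()
    with open(fam_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            sex_counts[fields[4]] += 1
            pheno_counts[fields[5]] += 1
    return sex_counts, pheno_counts
```

If every sample comes back as sex `0` and phenotype `-9` (or `0`), the information was never written to the files and has to be supplied from the study records.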

For mapping your Affymetrix IDs to rs IDs, you will have a difficult task. Probably the easiest way is to use the chr and base position information from your MAP file to map each to an rs ID. You can then either leave the Affymetrix IDs as they currently are and map to rs IDs when you generate your final results, or attempt to update the IDs within the PLINK objects (check the documentation).

You can download the annotation file for your array from the Affymetrix / ThermoFisher website and see if that contains rs IDs, which can then be mapped back to probe IDs.
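If the annotation file does contain rs IDs, one possible route is to turn it into a two-column file (old ID, new ID) for PLINK's `--update-name` option. The sketch below assumes a CSV annotation file with `#`-prefixed comment lines and columns named `Probe Set ID` and `dbSNP RS ID`; these column names are assumptions and should be checked against the actual file for your array.

```python
import csv


def write_update_name_file(annotation_csv, out_path,
                           probe_col="Probe Set ID", rs_col="dbSNP RS ID"):
    """Write 'probe_id<TAB>rs_id' lines for probes that have an rs ID."""
    n_written = 0
    with open(annotation_csv, newline="") as src, open(out_path, "w") as out:
        # Skip comment lines; the first remaining line is taken as the header.
        rows = csv.DictReader(line for line in src if not line.startswith("#"))
        for row in rows:
            probe_id = (row.get(probe_col) or "").strip()
            rs_id = (row.get(rs_col) or "").strip()
            if probe_id and rs_id.startswith("rs"):  # ignore missing rs IDs
                out.write(f"{probe_id}\t{rs_id}\n")
                n_written += 1
    return n_written

# Possible PLINK call with the resulting file:
#   plink --bfile mydata --update-name update_name.txt --make-bed --out mydata_rs
```

Note that several probes can point to the same rs ID, which would create duplicate variant names; such cases are not handled here and would need to be resolved before the results are used.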

written 16 months ago by Kevin Blighe65k

Many thanks for the help. The phenotype and gender information exists in the FAM file, but I need to update it. How can I update these columns in the FAM or PED files? Thanks.

written 16 months ago by leslie.ecker0

Hey, remotely, it is just very difficult to help you with this without seeing virtually all of the commands that you have used. Evidently, from your other comment (at top), your PED and MAP files are not in agreement. If you want to take me through each step that you have performed (and paste code, if possible), it may help.

written 16 months ago by Kevin Blighe65k

I really need help here. The initially generated files do not contain correct phenotype and gender data (both were marked with 0 or -9), and now I need to insert or modify this data correctly: the sex and the phenotype of each sample. I do not understand how to do this. Would you help me? Thank you.

written 16 months ago by leslie.ecker0

If there are issues with the source data, then, may I suggest that you go back to whoever produced it and let them know. Helping you with this issue is simply not easy over a discussion forum like Biostars.

The best that I can feasibly do is to advise you to familiarise yourself with the structure of PLINK objects.
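For reference, once the structure is understood, PLINK itself can overwrite the sex and phenotype columns from small text files, via `--update-sex` and `--pheno` (each expects FID, IID, and the value). The following is only a sketch, assuming a hypothetical sample sheet in CSV form with columns `FID`, `IID`, `sex` (M/F) and `status` (case/control); the sheet layout and function name are invented for illustration.

```python
import csv

SEX_CODES = {"M": "1", "F": "2"}               # PLINK: 1 = male, 2 = female
PHENO_CODES = {"control": "1", "case": "2"}    # PLINK: 1 = control, 2 = case


def write_plink_update_files(sheet_csv, sex_out, pheno_out):
    """Write 'FID IID value' files for --update-sex and --pheno; return row count."""
    n_rows = 0
    with open(sheet_csv, newline="") as src, \
         open(sex_out, "w") as sex_file, open(pheno_out, "w") as pheno_file:
        for row in csv.DictReader(src):
            fid, iid = row["FID"].strip(), row["IID"].strip()
            sex_key = (row.get("sex") or "").strip().upper()
            status_key = (row.get("status") or "").strip().lower()
            sex = SEX_CODES.get(sex_key, "0")            # 0 = unknown sex
            pheno = PHENO_CODES.get(status_key, "-9")    # -9 = missing phenotype
            sex_file.write(f"{fid} {iid} {sex}\n")
            pheno_file.write(f"{fid} {iid} {pheno}\n")
            n_rows += 1
    return n_rows

# Possible PLINK call with the two files:
#   plink --bfile mydata --update-sex sex.txt --pheno pheno.txt \
#         --make-bed --out mydata_updated
```

The FID and IID values in the sample sheet must match those in the FAM file exactly; samples that do not match are left unchanged by PLINK.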

written 16 months ago by Kevin Blighe65k
