Error attaching phenotype in PLINK (fewer tokens than expected)
3.2 years ago
absoldini ▴ 10

I'm 100% new to bioinformatics and terrible with computers (medical doctor). I'm currently working on a GWAS with data from the Human Connectome Project (HCP) and running into some issues, so please bear with me if my description of the problem isn't optimal.

I'm using PLINK.

I already have the phenotypes from the HCP webpage in .csv format.

This is what my .fam file looks like:

52259_82122 100004 52259 82122 1 -9
56037_85858 100206 56037 85858 1 -9
51488_81352 100307 51488 81352 2 -9
51730_81594 100408 51730 81594 1 -9
52813_82634 100610 52813 82634 1 -9
51283_52850_81149 101006 51283 81149 2 -9
51969_81833 101107 51969 81833 1 -9
51330_81195 101208 51330 81195 2 -9
52385_82248 101309 52385 82248 1 -9
52198_82061 101410 52198 82061 1 -9

This is what my phenotype .csv file looks like:

Subject,Age_in_Yrs,HasGT,ZygositySR,ZygosityGT,Family_ID,Mother_ID,Father_ID,TestRetestInterval,Race,Ethnicity,Handedness,SSAGA_Employ,SSAGA_Income,SSAGA_Educ,SSAGA_InSchool,SSAGA_Rlshp,SSAGA_MOBorn,Height,Weight,BMI,SSAGA_BMICat,SSAGA_BMICatHeaviest,Blood_Drawn,Hematocrit_1,Hematocrit_2,BPSystolic,BPDiastolic,ThyroidHormone,HbA1C,Hypothyroidism,Hypothyroidism_Onset,Hyperthyroidism,Hyperthyroidism_Onset,OtherEndocrn_Prob,OtherEndocrine_ProbOnset,Menstrual_RegCycles,Menstrual_Explain,Menstrual_AgeBegan,Menstrual_CycleLength,Menstrual_DaysSinceLast,Menstrual_AgeIrreg,Menstrual_AgeStop,Menstrual_MonthsSinceStop,Menstrual_UsingBirthControl,Menstrual_BirthControlCode,FamHist_Moth_Scz,FamHist_Fath_Scz,FamHist_Moth_Dep,FamHist_Fath_Dep,FamHist_Moth_BP,FamHist_Fath_BP,FamHist_Moth_Anx,FamHist_Fath_Anx,FamHist_Moth_DrgAlc,FamHist_Fath_DrgAlc,FamHist_Moth_Alz,FamHist_Fath_Alz,FamHist_Moth_PD,FamHist_Fath_PD,FamHist_Moth_TS,FamHist_Fath_TS,FamHist_Moth_None,FamHist_Fath_None,ASR_Anxd_Raw,ASR_Anxd_Pct,ASR_Witd_Raw,ASR_Witd_T,ASR_Soma_Raw,ASR_Soma_T,ASR_Thot_Raw,ASR_Thot_T,ASR_Attn_Raw,ASR_Attn_T,ASR_Aggr_Raw,ASR_Aggr_T,ASR_Rule_Raw,ASR_Rule_T,ASR_Intr_Raw,ASR_Intr_T,ASR_Oth_Raw,ASR_Crit_Raw,ASR_Intn_Raw,ASR_Intn_T,ASR_Extn_Raw,ASR_Extn_T,ASR_TAO_Sum,ASR_Totp_Raw,ASR_Totp_T,DSM_Depr_Raw,DSM_Depr_T,DSM_Anxi_Raw,DSM_Anxi_T,DSM_Somp_Raw,DSM_Somp_T,DSM_Avoid_Raw,DSM_Avoid_T,DSM_Adh_Raw,DSM_Adh_T,DSM_Inat_Raw,DSM_Hype_Raw,DSM_Antis_Raw,DSM_Antis_T,SSAGA_ChildhoodConduct,SSAGA_PanicDisorder,SSAGA_Agoraphobia,SSAGA_Depressive_Ep,SSAGA_Depressive_Sx,Color_Vision,Eye,EVA_Num,EVA_Denom,Correction,Breathalyzer_Over_05,Breathalyzer_Over_08,Cocaine,THC,Opiates,Amphetamines,MethAmphetamine,Oxycontin,Total_Drinks_7days,Num_Days_Drank_7days,Avg_Weekday_Drinks_7days,Avg_Weekend_Drinks_7days,Total_Beer_Wine_Cooler_7days,Avg_Weekday_Beer_Wine_Cooler_7days,Avg_Weekend_Beer_Wine_Cooler_7days,Total_Malt_Liquor_7days,Avg_Weekday_Malt_Liquor_7days,Avg_Weekend_Malt_Liquor_7days,Total_Wine_7days,Avg_Week
day_Wine_7days,Avg_Weekend_Wine_7days,Total_Hard_Liquor_7days,Avg_Weekday_Hard_Liquor_7days,Avg_Weekend_Hard_Liquor_7days,Total_Other_Alc_7days,Avg_Weekday_Other_Alc_7days,Avg_Weekend_Other_Alc_7days,SSAGA_Alc_D4_Dp_Sx,SSAGA_Alc_D4_Ab_Dx,SSAGA_Alc_D4_Ab_Sx,SSAGA_Alc_D4_Dp_Dx,SSAGA_Alc_12_Drinks_Per_Day,SSAGA_Alc_12_Frq,SSAGA_Alc_12_Frq_5plus,SSAGA_Alc_12_Frq_Drk,SSAGA_Alc_12_Max_Drinks,SSAGA_Alc_Age_1st_Use,SSAGA_Alc_Hvy_Drinks_Per_Day,SSAGA_Alc_Hvy_Frq,SSAGA_Alc_Hvy_Frq_5plus,SSAGA_Alc_Hvy_Frq_Drk,SSAGA_Alc_Hvy_Max_Drinks,Total_Any_Tobacco_7days,Times_Used_Any_Tobacco_Today,Num_Days_Used_Any_Tobacco_7days,Avg_Weekday_Any_Tobacco_7days,Avg_Weekend_Any_Tobacco_7days,Total_Cigarettes_7days,Avg_Weekday_Cigarettes_7days,Avg_Weekend_Cigarettes_7days,Total_Cigars_7days,Avg_Weekday_Cigars_7days,Avg_Weekend_Cigars_7days,Total_Pipes_7days,Avg_Weekday_Pipes_7days,Avg_Weekend_Pipes_7days,Total_Chew_7days,Avg_Weekday_Chew_7days,Avg_Weekend_Chew_7days,Total_Snuff_7days,Avg_Weekday_Snuff_7days,Avg_Weekend_Snuff_7days,Total_Other_Tobacco_7days,Avg_Weekday_Other_Tobacco_7days,Avg_Weekend_Other_Tobacco_7days,SSAGA_FTND_Score,SSAGA_HSI_Score,SSAGA_TB_Age_1st_Cig,SSAGA_TB_DSM_Difficulty_Quitting,SSAGA_TB_DSM_Tolerance,SSAGA_TB_DSM_Withdrawal,SSAGA_TB_Hvy_CPD,SSAGA_TB_Max_Cigs,SSAGA_TB_Reg_CPD,SSAGA_TB_Smoking_History,SSAGA_TB_Still_Smoking,SSAGA_TB_Yrs_Since_Quit,SSAGA_TB_Yrs_Smoked,SSAGA_Times_Used_Illicits,SSAGA_Times_Used_Cocaine,SSAGA_Times_Used_Hallucinogens,SSAGA_Times_Used_Opiates,SSAGA_Times_Used_Sedatives,SSAGA_Times_Used_Stimulants,SSAGA_Mj_Use,SSAGA_Mj_Ab_Dep,SSAGA_Mj_Age_1st_Use,SSAGA_Mj_Times_Used
101208,35,true,NotMZ,DZ,51330_81195,51330,81195,,Black or African Am.,Hispanic/Latino,100,2,8,17,0,1,1,63,133,23.56,1,1,1,37,39,115,76,0.85,5.5,0,,0,,0,,1,,15,2,27,,,,0,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,50,2,51,4,56,2,51,2,50,1,50,1,51,0,50,6,3,8,46,2,38,10,20,41,6,56,4,51,1,51,1,50,1,50,1,0,1,50,0,0,1,1,0,NORMAL,B,20,16,-2.5,false,false,false,false,false,false,false,false,0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,1,0,1,,,,,,,,,,,,0,0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,0,0.0,0.0,,,,,,,,,,0,0,,,0,0,0,0,0,0,0,0,,0

From what I understand so far, I have to attach a phenotype to the .fam file, so I tried the following, using age as an example phenotype:

./plink --bfile genotypefile --pheno phenotype.csv --pheno-name Age_in_Yrs --make-bed --out filename

and this happens:

aldo@dell1:~/Desktop/PLINK$ ./plink --bfile MEGA_Chip --pheno rest.csv --pheno-name Age_In_Years --make-bed --out mergedage
PLINK v1.90b6.21 64-bit (19 Oct 2020)          www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to mergedage.log.
Options in effect:
  --bfile MEGA_Chip
  --make-bed
  --out mergedage
  --pheno rest.csv
  --pheno-name Age_In_Years

32117 MB RAM detected; reserving 16058 MB for main workspace.
2119803 variants loaded from .bim file.
1141 people (523 males, 618 females) loaded from .fam.
Error: Line 1 of --pheno file has fewer tokens than expected.

So I'm stuck at this error (Line 1 of --pheno file has fewer tokens than expected.). Modifying phenotype.csv is a non-issue, since the file is small. However, I can't open the .ped file because it is too big (9.7GB) and my computer just dies trying to do so.
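A quick way to see why a comma-separated line could trigger a "fewer tokens than expected" message is to count tokens the way a whitespace-delimited parser would. This is only a sketch: it assumes PLINK splits each line on spaces and tabs and never on commas, and `count_tokens` is a hypothetical helper, not part of PLINK.

```python
def count_tokens(line: str) -> int:
    """Count tokens when only spaces and tabs act as separators."""
    return len(line.split())


# A comma-separated row with no spaces is a single token.
csv_line = "101208,35,true,NotMZ,DZ,51330_81195"
# The same kind of information, tab-delimited, is three tokens.
tab_line = "51330_81195\t101208\t35"
# A text field with spaces (such as the Race column) is split in unexpected places.
mixed_line = "101208,Black or African Am.,100"

print(count_tokens(csv_line), count_tokens(tab_line), count_tokens(mixed_line))
```

Under that assumption, a header line with hundreds of comma-joined names still counts as one token, which is fewer than the FID, IID and phenotype columns a --pheno file needs.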

Somehow, yesterday I managed to modify phenotype.csv in a way that changed the error to "Line 1 of --(.ped, I think) file has fewer tokens than expected". I seem to have deleted or shifted columns so that they matched (FID IID).

Any help would be appreciated

Thanks! :)

GWAS PLINK • 5.6k views

I think the pheno file needs to be space- or tab-separated, not comma-separated. See the --pheno section of the manual:

--pheno causes phenotype values to be read from the 3rd column of the specified space- or tab-delimited file,
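One way to produce such a file from the HCP .csv is sketched below. It assumes that the `Subject` column of the CSV corresponds to the second column (IID) of the .fam file, which is what the example rows in the question suggest, and that -9 is an acceptable missing-value code. `build_pheno` and the file names are hypothetical; only one numeric column is exported, so text fields containing spaces never reach the output.

```python
import csv


def build_pheno(fam_path, csv_path, out_path, column, id_column="Subject"):
    """Write a tab-delimited FID/IID/<column> file for subjects found in the .fam."""
    # Map IID -> FID using the first two whitespace-delimited .fam columns.
    iid_to_fid = {}
    with open(fam_path) as fam:
        for line in fam:
            fields = line.split()
            if len(fields) >= 2:
                iid_to_fid[fields[1]] = fields[0]

    written = 0
    with open(csv_path, newline="") as src, open(out_path, "w") as out:
        out.write(f"FID\tIID\t{column}\n")
        for row in csv.DictReader(src):
            iid = row[id_column]
            fid = iid_to_fid.get(iid)
            if fid is None:
                continue  # subject has no genotype record in the .fam
            # Assumption: empty cells are written as -9 (missing).
            value = row[column].strip() or "-9"
            out.write(f"{fid}\t{iid}\t{value}\n")
            written += 1
    return written
```

For example, `build_pheno("MEGA_Chip.fam", "rest.csv", "age.pheno", "Age_in_Yrs")` would write a three-column file that could then be passed as `--pheno age.pheno --pheno-name Age_in_Yrs`. The name given to --pheno-name has to match the header exactly, so check the spelling against the CSV header.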


Thanks for the reply!

OK, I did what you suggested and changed the pheno file to be tab-delimited. After that, the error changed to "Line 1 of --fam file has fewer tokens than expected."

So I decided to also change the .fam file to be tab-delimited.

Then this happened:

Error: --pheno-name requires the --pheno file to have a header line with first two columns 'FID' and 'IID'

So I edited the .fam and pheno files so that they both have FID/IID as the first two columns:

.fam

52259_82122 100004  52259   82122   1   -9
56037_85858 100206  56037   85858   1   -9
51488_81352 100307  51488   81352   2   -9
51730_81594 100408  51730   81594   1   -9
52813_82634 100610  52813   82634   1   -9
51283_52850_81149   101006  51283   81149   2   -9

pheno:

FID IID FS_IntraCranial_Vol FS_BrainSeg_Vol FS_BrainSeg_Vol_No_VentFS_BrainSeg_Vol_No_Vent_Surf FS_LCort_GM_Vol FS_RCort_GM_Vol FS_TotCort_GM_Vol   FS_SubCort_GM_Vol   FS_Total_GM_Vol FS_SupraTentorial_Vol   FS_L_WM_Vol FS_R_WM_Vol FS_Tot_WM_Vol   FS_Mask_Vol a   
0   100004

and now I get this:

Options in effect:
  --bed MEGA_Chip.bed
  --bim MEGA_Chip.bim
  --fam MEGA_Chip.csv
  --make-bed
  --out mergedtry
  --pheno unrestricted.csv
  --pheno-name FS_BrainSeg_Vol_No_Vent

32117 MB RAM detected; reserving 16058 MB for main workspace.
2119803 variants loaded from .bim file.
1141 people (523 males, 618 females) loaded from .fam.
0 phenotype values present after --pheno.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 0 founders and 1141 nonfounders present.
Calculating allele frequencies... done.
Warning: 295883 het. haploid genotypes present (see mergedtry.hh ); many
commands treat these as missing.
Warning: Nonmissing nonmale Y chromosome genotype(s) present; many commands
treat these as missing.
Total genotyping rate is 0.995282.
2119803 variants and 1141 people pass filters and QC.
Note: No phenotypes present.
--make-bed to mergedtry.bed + mergedtry.bim + mergedtry.fam ... done.

The .fam file now has a .csv extension because I re-saved it as tab-delimited, but this shouldn't be an issue because I passed it explicitly with --fam.

So the issue now is that it's not recognizing any phenotype.
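One thing worth checking here: in the snippets above, the pheno row starts with `0   100004`, while the corresponding .fam row starts with `52259_82122 100004`. If PLINK pairs rows on both FID and IID (treat that as an assumption to verify against the manual), the FID mismatch alone could leave zero phenotype values. A minimal sketch for counting matching ID pairs, with hypothetical helper names:

```python
def read_id_pairs(path, skip_header=False):
    """Return the set of (FID, IID) pairs from a whitespace-delimited file."""
    pairs = set()
    with open(path) as fh:
        if skip_header:
            next(fh, None)  # drop the "FID IID ..." header line
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:
                pairs.add((fields[0], fields[1]))
    return pairs


def count_matching_ids(fam_path, pheno_path):
    """Count samples whose (FID, IID) pair appears in both files."""
    fam_ids = read_id_pairs(fam_path)
    pheno_ids = read_id_pairs(pheno_path, skip_header=True)
    return len(fam_ids & pheno_ids)
```

If `count_matching_ids("MEGA_Chip.csv", "unrestricted.csv")` returns 0, the FID column of the pheno file probably needs to carry the same family IDs as the .fam file rather than 0.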

This is what the resulting .fam file looks like, with -9 (missing phenotype) for every sample:

aldo@dell1:~/Desktop/PLINK$ head mergedtry.fam
52259_82122 100004 52259 82122 1 -9
56037_85858 100206 56037 85858 1 -9
51488_81352 100307 51488 81352 2 -9
51730_81594 100408 51730 81594 1 -9
52813_82634 100610 52813 82634 1 -9

Thanks again for the help!!!
