Hello and welcome to the wonderful world of Plink!
I had a bit of a different issue than you, with 0 samples being read in rather than duplicates being found...
However, here is the process that worked for me in the end:
I used the original FAM file to produce the Family ID and Sample ID columns, so they were exactly the same in the Plink file and in the text file used to parse down the PED file.
In mine, the samples were called 'line_21', 'line_40', etc., and my Family ID and Sample ID were exactly the same. So my text file looked like this (first 2 lines listed):
... etc., listing more lines. It is supposed to work whether you use a space or a tab between your columns, but mine only worked using a tab.
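If it helps, here is a rough Python sketch of that step: it copies the first two columns (Family ID, Sample ID) straight out of the FAM file into a tab-delimited list with no header, so the IDs cannot drift from what Plink sees. The function name and the file name `mydata.fam` are made up for illustration, and it assumes a standard whitespace-separated FAM file with the Family ID and Sample ID in columns 1 and 2.

```python
def fam_to_id_list(fam_path, out_path):
    """Write a header-free, tab-delimited 'Family ID <TAB> Sample ID' list
    taken from the first two columns of a FAM file.

    Hypothetical helper, not part of Plink. Returns the number of samples written.
    """
    count = 0
    # newline="\n" keeps plain Unix line endings, even when run on Windows.
    with open(fam_path, encoding="utf-8") as fam, \
            open(out_path, "w", encoding="utf-8", newline="\n") as out:
        for line in fam:
            fields = line.split()  # splits on any run of spaces or tabs
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            out.write(f"{fields[0]}\t{fields[1]}\n")
            count += 1
    return count
```

Calling something like `fam_to_id_list("mydata.fam", "IDlist.txt")` gives you the full list; you can then delete the lines for samples you don't want, which avoids retyping any IDs by hand.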
There were no column names, just 2 columns (Family ID and Sample ID) listing 55 samples. Should the first line of your IDlist.txt file instead say:
? Or were you just putting in 1 and 2 for my benefit, so I would know they were separate IDs? Or are they part of the ID (in which case they should be joined by an underscore instead of a space)?
You might have some hidden characters or stray spaces in that .txt file - if you are using a Mac, I highly recommend TextWrangler, a really simple text editor that lets you see any hidden characters in your file.
If you are not using a Mac - I know that on a PC I weirdly had to first save my text file as a .txt in Word, then reopen it in Notepad and resave it as a .txt from there - and then it worked.
I had so many teething problems getting my GWAS to run. By the time I got it to work, I had run the analysis MANY times with dummy files (a much smaller subset of the data, so it ran quickly). I had ONE dummy .txt file that worked successfully, and in the end I used that specific file by duplicating it and then manually typing in my samples. Plink seems to be ultra-niggly about the .txt files used to parse down large genetic files.
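If you would rather check for hidden characters without a special editor, a small Python sketch along these lines can flag the usual suspects: a byte order mark at the start of the file, Windows-style carriage returns, non-ASCII bytes (such as a non-breaking space pasted in from Word) and trailing spaces or tabs. The function name is made up, and this list of checks is just my guess at what commonly trips things up, not anything Plink documents.

```python
def find_hidden_characters(path):
    """Return a list of (line_number, description) for characters that are
    easy to miss in an editor. Hypothetical helper for illustration only."""
    problems = []
    with open(path, "rb") as handle:
        data = handle.read()
    # Some editors prepend a UTF-8 byte order mark to the first line.
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append((1, "UTF-8 byte order mark at start of file"))
        data = data[3:]
    for number, raw in enumerate(data.split(b"\n"), start=1):
        if b"\r" in raw:
            problems.append((number, "carriage return (Windows-style line ending)"))
            raw = raw.rstrip(b"\r")
        if any(byte > 127 for byte in raw):
            problems.append((number, "non-ASCII byte (e.g. non-breaking space)"))
        if raw != raw.rstrip(b" \t"):
            problems.append((number, "trailing space or tab"))
    return problems
```

An empty list back from `find_hidden_characters("IDlist.txt")` means none of these particular problems were found; anything else tells you which line to look at.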
I hope this is some help!
modified 12 months ago
12 months ago by
cinnie83 • 0