Convert CSV to PED
3.4 years ago

Hi, I have a CSV file containing sample IDs and SNP information. I want to convert it to PED format for use with PLINK. Please help!

gwas plink • 4.3k views
3.4 years ago
brunobsouzaa ▴ 830

Taken from here

If your .csv file contains the data required for the .ped and .map formats, you can pass it to PLINK directly, provided the columns are separated by spaces or tabs rather than commas. The mandatory .ped columns are: Family ID, Individual ID, Paternal ID, Maternal ID, Sex (1=male; 2=female; other=unknown), and Phenotype. PLINK needs these columns to run. Then, instead of the command:

plink --ped mydata.ped --map autosomal.map

try:

plink --ped mydata.csv --map autosomal.csv
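PLINK reads .ped and .map files as space- or tab-delimited text, so a file that is genuinely comma-separated would need its delimiters changed first. Below is a minimal sketch of that step, assuming the CSV has no header row and already holds the six mandatory columns followed by the allele columns. The function name `csv_to_ped_lines` and the sample rows are hypothetical, not from the original post.

```python
import csv
import io


def csv_to_ped_lines(csv_text):
    """Turn comma-separated rows into space-delimited .ped lines.

    Assumption: no header row, and the columns are already in .ped order
    (FID, IID, PID, MID, Sex, Phenotype, then alleles).
    """
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        fields = [field.strip() for field in row]
        if not any(fields):
            continue  # skip blank rows
        lines.append(" ".join(fields))
    return lines


# Hypothetical two-sample, two-variant input
example = "1,1,0,0,2,1,T,C,T,T\n2,2,0,0,2,1,T,C,C,C\n"
for line in csv_to_ped_lines(example):
    print(line)
```

The resulting lines could be written to a file such as mydata.ped and passed to `plink --ped`. The same delimiter change would apply to the .map file.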


Yes, I had seen that post earlier, and I posted my query only after a lot of googling. The problem is that when I tried it, I got this error:

Error: Line 1 of .map file has fewer tokens than expected


To make this clearer, I am pasting a few columns of my ped and map files below. This is what my ped (.csv) file looks like. The columns are: IID, FID, PID, MID, Sex, Phenotype, SNP genotypes

 1  1   0   0   2   1   TC  TT  AA  AG  CA  CA  AG  GG  GG  GG  CC  GA  TC  TC  GG  
2   2   0   0   2   1   TC  TT  AA  AA  CA  CA  AA  CG  GA  GT  TT  GA  TC  CC  GG  
3   3   0   0   2   1   TC  CC  AA  AA  CC  AA  AG  CC  AA  GT  TT  AA  CC  CC  GG  
4   4   0   0   2   1   TC  TT  AA  AA  AA  CC  AG  CG  GA  GT  CT  GA  TC  CC  GG  
5   5   0   0   2   1   TC  TT  AA  AA  CA  CA  GG  CG  AA  TT  CT  AA  CC  CC  GA

My map (.csv) file looks like this. The columns are: chromosome, SNP ID, genetic position, physical position

17    rs1049620     0   49404152

 6  rs1143684     0 3010156

13  rs11571836   0  32399302

 8  rs14448       0 89933605

13  rs144848            0   32332592

Try spaces between the MAP columns. Also, be sure that there are no hidden carriage returns like ^M - try dos2unix
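For anyone without dos2unix to hand, both suggestions can be approximated in a few lines of Python. This is only a sketch: `normalize_map_text` is a hypothetical helper, and it assumes the whole .map file fits comfortably in memory.

```python
def normalize_map_text(text):
    """Return .map content with Unix line endings and single-space columns."""
    cleaned = []
    for line in text.splitlines():  # handles \n, \r\n and bare \r
        tokens = line.split()       # collapses runs of spaces and tabs
        if tokens:                  # drop empty lines
            cleaned.append(" ".join(tokens))
    return "".join(line + "\n" for line in cleaned)


# Sample text with Windows line endings, uneven spacing and a blank line
raw = "17    rs1049620     0   49404152\r\n\r\n 6  rs1143684     0 3010156\r\n"
print(normalize_map_text(raw), end="")
```

To apply it to a real file, read it with `open(path, newline="")` so that Python does not translate the line endings before the function sees them.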


Hi Kevin, what do you mean by "no hidden carriage returns like ^M - try dos2unix"?


FYI, all my files were created on Linux.


If you open your file in vi, do you see any unusual characters at the line ends?
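Editors do not always show such characters; vim, for instance, hides ^M when it detects DOS line endings throughout a file. Inspecting the raw bytes is a more direct check. The sketch below uses a hypothetical helper, `describe_line`, and a made-up sample line; in practice the bytes would come from reading the first line of the .map file in binary mode (`open(path, "rb")`).

```python
def describe_line(raw_bytes):
    """List hidden characters found in one raw line read in binary mode."""
    findings = []
    if raw_bytes.startswith(b"\xef\xbb\xbf"):
        findings.append("UTF-8 byte-order mark at start")
    if b"\r" in raw_bytes:
        findings.append("carriage return (shown as ^M in vi)")
    if any(byte > 127 for byte in raw_bytes):
        findings.append("non-ASCII byte")
    return findings


sample = b"17 rs1049620 0 49404152\r\n"
print(repr(sample))           # repr makes \r and other escapes visible
print(describe_line(sample))
```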


No, Kevin. It does not have any Unicode or unusual characters.


I also tried converting my CSV to TSV and got this error: Error: Invalid chromosome code '17press' on line 1 of .map file. (Use --allow-extra-chr to force it to be accepted.)

Then I used --allow-extra-chr and got another error: Error: Invalid bp coordinate on line 1 of .map file.

I then manually looked up the coordinates of the first variant (rs1049620) and found that they were wrong. This SNP is not listed in dbSNP, the largest database of genetic variants, so I think its position was fetched incorrectly from some other database. I wonder how such an error could occur, since I fetched all the chromosomal locations using Ensembl BioMart. To confirm, I also checked the other bp coordinates, and they were all correct.

After correcting it, I ran the command again, but it shows the same error: Error: Invalid bp coordinate on line 1 of .map file.

I have spent the whole day on this and still can't find the problem. :( It would be great if someone could help me with it or suggest an alternative way of converting CSV/TSV to MAP format!


Perhaps first try it with a minimal reproducible example of just a few variants, for example this .ped file:

1 1 0 0 2 1 T C T T A A A G
2 2 0 0 2 1 T C T T A A A A
3 3 0 0 2 1 T C C C A A A A
4 4 0 0 2 1 T C T T A A A A
5 5 0 0 2 1 T C T T A A A A

with this .map file:

17 rs1049620 0 49404152
6 rs1143684 0 3010156
13 rs11571836 0 32399302
8 rs14448 0 89933605
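One visible difference from the rows pasted earlier in the thread is that each genotype here is written as two space-separated alleles (T C) rather than a two-letter call (TC). The thread does not say whether this caused the original error, but the transformation itself can be sketched as follows. `split_compound_genotypes` is a hypothetical helper that assumes six leading columns and exactly two characters per genotype.

```python
def split_compound_genotypes(ped_line):
    """Rewrite two-letter genotype calls as space-separated alleles."""
    tokens = ped_line.split()
    header, genotypes = tokens[:6], tokens[6:]
    alleles = []
    for call in genotypes:
        if len(call) != 2:
            raise ValueError(f"expected a two-letter genotype, got {call!r}")
        alleles.extend(call)  # "TC" becomes "T", "C"
    return " ".join(header + alleles)


# First four genotypes of the first row pasted earlier in the thread
print(split_compound_genotypes(" 1  1   0   0   2   1   TC  TT  AA  AG"))
# prints: 1 1 0 0 2 1 T C T T A A A G
```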

Thank you Kevin for the valuable response but I am still getting the same error.


You should not be getting the same error - take a look:

cat test.ped 
1 1 0 0 2 1 T C T T A A A G
2 2 0 0 2 1 T C T T A A A A
3 3 0 0 2 1 T C C C A A A A
4 4 0 0 2 1 T C T T A A A A
5 5 0 0 2 1 T C T T A A A A

cat test.map 
17 rs1049620 0 49404152
6 rs1143684 0 3010156
13 rs11571836 0 32399302
8 rs14448 0 89933605

plink --ped test.ped --map test.map 
PLINK v1.90b3.38 64-bit (7 Jun 2016)       https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --map test.map
  --ped test.ped

15037 MB RAM detected; reserving 7518 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (4 variants, 5 people).
--file: plink.bed + plink.bim + plink.fam written.

Please check the formatting of your data again. Even something as small as an extra space can cause a problem.
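A quick structural check before running PLINK can narrow down where the formatting goes wrong. The sketch below is not PLINK's own validation; it assumes a four-column .map file and a .ped file with six leading columns plus two alleles per variant, and `check_ped_map` is a hypothetical name.

```python
import re


def check_ped_map(ped_lines, map_lines):
    """Return a list of formatting problems; an empty list means none found."""
    problems = []
    variants = 0
    for number, line in enumerate(map_lines, start=1):
        tokens = line.split()
        if not tokens:
            continue  # blank lines are skipped here, not judged
        variants += 1
        if len(tokens) != 4:
            problems.append(f".map line {number}: {len(tokens)} tokens, expected 4")
        elif not re.fullmatch(r"-?[0-9]+", tokens[3]):
            problems.append(f".map line {number}: bp coordinate {tokens[3]!r} is not an integer")
    expected = 6 + 2 * variants  # six leading columns plus two alleles per variant
    for number, line in enumerate(ped_lines, start=1):
        count = len(line.split())
        if count != expected:
            problems.append(f".ped line {number}: {count} tokens, expected {expected}")
    return problems


ped = ["1 1 0 0 2 1 T C T T A A A G", "2 2 0 0 2 1 T C T T A A A A"]
mapfile = ["17 rs1049620 0 49404152", "6 rs1143684 0 3010156",
           "13 rs11571836 0 32399302", "8 rs14448 0 89933605"]
print(check_ped_map(ped, mapfile))
```

Because tokens are counted after splitting on whitespace, extra spaces alone are not reported; the check targets missing or merged columns and non-numeric coordinates.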


The problem is fixed. Thank you Kevin! :)
