Convert CSV to PED
22 months ago

Hi, I have a CSV file containing sample IDs and SNP information. I want to convert this CSV file to PED format using PLINK. Please help!

22 months ago
brunobsouzaa ▴ 780

Taken from here

If your .csv file contains the data required for the .ped and .map formats, you can use it directly. For the .ped file, the mandatory columns are: Family ID, Individual ID, Paternal ID, Maternal ID, Sex (1=male; 2=female; other=unknown), Phenotype. You need these data to run PLINK. Then, instead of a command like:

plink --ped mydata.ped --map autosomal.map


try:

Yeah, I had seen that post earlier, and I posted my query only after a lot of googling. The problem is that when I tried it, I got this error:

Error: Line 1 of .map file has fewer tokens than expected

To make this clearer, I am pasting a few lines of my ped and map files below. Here is what my ped (.csv) file looks like. The columns are: IID, FID, PID, MID, Sex, Phenotype, SNPs

 1  1   0   0   2   1   TC  TT  AA  AG  CA  CA  AG  GG  GG  GG  CC  GA  TC  TC  GG
2   2   0   0   2   1   TC  TT  AA  AA  CA  CA  AA  CG  GA  GT  TT  GA  TC  CC  GG
3   3   0   0   2   1   TC  CC  AA  AA  CC  AA  AG  CC  AA  GT  TT  AA  CC  CC  GG
4   4   0   0   2   1   TC  TT  AA  AA  AA  CC  AG  CG  GA  GT  CT  GA  TC  CC  GG
5   5   0   0   2   1   TC  TT  AA  AA  CA  CA  GG  CG  AA  TT  CT  AA  CC  CC  GA


And my map (.csv) file looks like this. The columns are: chromosome, SNP ID, genetic position, physical position

17    rs1049620     0   49404152

6  rs1143684     0 3010156

13  rs11571836   0  32399302

8  rs14448       0 89933605

13  rs144848            0   32332592

Try spaces between the MAP columns. Also, be sure that there are no hidden carriage returns like ^M - try dos2unix
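For anyone who wants to check this without dos2unix, here is a minimal Python sketch of both suggestions: it reports whether the raw bytes contain carriage returns (what vi shows as ^M) and rewrites the columns with single spaces. The function names and the sample bytes are made up for illustration; they are not part of PLINK.

```python
def has_carriage_returns(raw_bytes):
    """Return True if the raw file content contains a carriage return (^M)."""
    return b"\r" in raw_bytes


def normalise_map_text(text):
    """Return the text with Unix line endings and single-space columns."""
    lines = []
    for line in text.splitlines():      # splitlines() handles \r\n, \r and \n
        fields = line.split()           # split on any run of spaces or tabs
        if fields:                      # drop blank lines
            lines.append(" ".join(fields))
    return "".join(line + "\n" for line in lines)


# Hypothetical content: tab-separated, DOS line endings, a blank line, ragged spacing.
raw = b"17\trs1049620\t0\t49404152\r\n\r\n6  rs1143684     0 3010156\r\n"

print(has_carriage_returns(raw))
print(normalise_map_text(raw.decode("ascii")), end="")
```

For a real file, the bytes could be read with open(path, "rb").read() and the cleaned text written to a new .map file, leaving the original untouched.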

Hi Kevin, I did not understand what you meant by "no hidden carriage returns like ^M - try dos2unix".

FYI, all my files were created on Linux.

If you open your file in vi, do you see any unusual characters at the line ends?

No, Kevin. It does not have any Unicode or unusual characters.

I also tried converting my CSV into TSV and got this error: Error: Invalid chromosome code '17press' on line 1 of .map file. (Use --allow-extra-chr to force it to be accepted.)

Then I used --allow-extra-chr and got another error: Error: Invalid bp coordinate on line 1 of .map file.

I then manually looked up the coordinates of the first variant (rs1049620) on Google and found that they were wrong. This SNP is not listed in dbSNP, which is the largest repository of genetic variants, so I think its position was fetched incorrectly from some other database. I wonder how such an error could occur, since I fetched all the chromosomal locations using Ensembl BioMart. To confirm, I checked the other bp coordinates as well, and they were all correct.

I ran the above command again after correcting the coordinate, but it shows the same error: Error: Invalid bp coordinate on line 1 of .map file.

I have spent the whole day on this and still cannot find the problem. :( It would be great if someone could help me with it or suggest an alternative way of converting CSV/TSV into MAP format!

Perhaps first try it with a minimal reproducible example of just a few variants:

1 1 0 0 2 1 T C T T A A A G
2 2 0 0 2 1 T C T T A A A A
3 3 0 0 2 1 T C C C A A A A
4 4 0 0 2 1 T C T T A A A A
5 5 0 0 2 1 T C T T A A A A


.

17 rs1049620 0 49404152
6 rs1143684 0 3010156
13 rs11571836 0 32399302
8 rs14448 0 89933605
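One visible difference between this minimal example and the ped lines posted earlier is that each genotype here is written as two space-separated alleles (T C) rather than as a two-letter token (TC). The thread does not say whether that was the cause of the error, but if you want to produce the same layout from the two-letter form, a rough Python sketch could look like the following. It assumes six leading columns and exactly two single-character alleles per genotype; the function name is made up for illustration.

```python
def split_compound_genotypes(line, n_meta=6):
    """Turn 'TC TT' style genotypes into 'T C T T', keeping the first n_meta columns."""
    fields = line.split()
    meta, genotypes = fields[:n_meta], fields[n_meta:]
    alleles = []
    for genotype in genotypes:
        if len(genotype) != 2:
            # Assumption: every genotype is exactly two single-character alleles.
            raise ValueError(f"expected a two-letter genotype, got {genotype!r}")
        alleles.extend(genotype)        # "TC" -> "T", "C"
    return " ".join(meta + alleles)


print(split_compound_genotypes("1  1   0   0   2   1   TC  TT  AA  AG"))
# prints: 1 1 0 0 2 1 T C T T A A A G
```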

Thank you, Kevin, for the valuable response, but I am still getting the same error.

You should not be getting the same error - take a look:

cat test.ped
1 1 0 0 2 1 T C T T A A A G
2 2 0 0 2 1 T C T T A A A A
3 3 0 0 2 1 T C C C A A A A
4 4 0 0 2 1 T C T T A A A A
5 5 0 0 2 1 T C T T A A A A

cat test.map
17 rs1049620 0 49404152
6 rs1143684 0 3010156
13 rs11571836 0 32399302
8 rs14448 0 89933605

(C) 2005-2016 Shaun Purcell, Christopher Chang   GNU General Public License v3
Options in effect:
--map test.map
--ped test.ped

15037 MB RAM detected; reserving 7518 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (4 variants, 5 people).


Please check the formatting of your data again. Even something as small as an extra space can cause an issue.
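A quick way to look for this kind of formatting problem is to count the whitespace-separated tokens on every line. The sketch below assumes a four-column .map file (as in the example above) and a .ped file with six leading columns followed by two alleles per variant. The function name is made up for illustration, and this is only a sanity check, not a replacement for PLINK's own validation.

```python
def check_token_counts(map_text, ped_text):
    """Return a list of messages for lines whose token count looks wrong."""
    problems = []
    n_variants = 0
    for lineno, line in enumerate(map_text.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue                    # ignore blank lines
        n_variants += 1
        if len(tokens) != 4:            # assumption: chr, SNP ID, cM, bp
            problems.append(f".map line {lineno}: {len(tokens)} tokens, expected 4")

    expected = 6 + 2 * n_variants       # six leading columns + two alleles per variant
    for lineno, line in enumerate(ped_text.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue
        if len(tokens) != expected:
            problems.append(f".ped line {lineno}: {len(tokens)} tokens, expected {expected}")
    return problems


map_text = (
    "17 rs1049620 0 49404152\n"
    "6 rs1143684 0 3010156\n"
    "13 rs11571836 0 32399302\n"
    "8 rs14448 0 89933605\n"
)
good_ped = "1 1 0 0 2 1 T C T T A A A G\n"
bad_ped = "1 1 0 0 2 1 TC TT AA AG\n"

print(check_token_counts(map_text, good_ped))   # prints: []
print(check_token_counts(map_text, bad_ped))    # prints: ['.ped line 1: 10 tokens, expected 14']
```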

The problem is fixed. Thank you Kevin! :)