Question: Convert CSV to PED
Dhara Awasthi (Indian Institute of Technology, Jodhpur) wrote, 11 weeks ago:

Hi, I have a CSV file containing IDs and SNP information. I want to convert it to PED format using PLINK. Please help!

Tags: plink, gwas
brunobsouzaa (Brazil) wrote, 10 weeks ago:

Taken from here

If your .csv file contains the data required for the .ped and .map formats, you can use it directly, provided the columns are separated by spaces or tabs rather than commas (PLINK splits these files on whitespace). The mandatory .ped columns are: Family ID, Individual ID, Paternal ID, Maternal ID, Sex (1=male; 2=female; other=unknown) and Phenotype. You need these data to run PLINK. Then, instead of the command:

plink --ped mydata.ped --map autosomal.map

try:

plink --ped mydata.csv --map autosomal.csv

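PLINK splits .ped and .map lines on whitespace, so a comma-separated export generally has to be converted before it can be passed in this way. A minimal sketch, assuming the CSVs have no header row and no quoted fields; it writes throwaway demo copies of mydata.csv and autosomal.csv (rows taken from this thread) into a scratch directory, so point the two tr commands at your real files instead:

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"   # scratch directory, so no real files are overwritten

# Demo inputs built from rows in this thread (comma-separated, no header row).
printf '1,1,0,0,2,1,T,C,T,T,A,A,A,G\n' > mydata.csv
printf '%s\n' '17,rs1049620,0,49404152' '6,rs1143684,0,3010156' \
  '13,rs11571836,0,32399302' '8,rs14448,0,89933605' > autosomal.csv

# Turn every comma into a space (assumes no quoted fields that contain commas).
tr ',' ' ' < mydata.csv > mydata.ped
tr ',' ' ' < autosomal.csv > autosomal.map

# Run the conversion only if PLINK is on the PATH.
if command -v plink >/dev/null 2>&1; then
  plink --ped mydata.ped --map autosomal.map
fi
```

The PLINK call is wrapped in a check so the script still runs on a machine without PLINK installed.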

Yes, I had seen that post earlier; I posted my query after a lot of googling. The problem is that when I tried it, I got this error:

Error: Line 1 of .map file has fewer tokens than expected

(Reply by Dhara Awasthi, 10 weeks ago)
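That error is consistent with PLINK finding only one token on the line, which is what happens when the columns are separated by commas: a .map line needs 3 or 4 whitespace-separated fields. A quick way to see this, sketched with awk, whose default field splitting on spaces and tabs approximates what PLINK counts; test.map is a hypothetical demo file:

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"   # scratch directory for the demo file

# Demo: one MAP line saved with commas (values from this thread).
printf '17,rs1049620,0,49404152\n' > test.map

# Default awk splitting (runs of spaces/tabs): tokens as a whitespace-splitting reader sees them.
ws_tokens=$(awk '{ print NF }' test.map)
# Splitting on commas instead: the four columns that were intended.
csv_fields=$(awk -F',' '{ print NF }' test.map)

echo "whitespace tokens: $ws_tokens, comma fields: $csv_fields"
# -> whitespace tokens: 1, comma fields: 4
```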

To help you understand better, I am pasting a few columns of my PED and MAP files below. Here is what my PED (.csv) file looks like. The columns are: IID, FID, PID, MID, Sex, Phenotype, then the SNP genotypes:

1 1 0 0 2 1 TC TT AA AG CA CA AG GG GG GG CC GA TC TC GG
2 2 0 0 2 1 TC TT AA AA CA CA AA CG GA GT TT GA TC CC GG
3 3 0 0 2 1 TC CC AA AA CC AA AG CC AA GT TT AA CC CC GG
4 4 0 0 2 1 TC TT AA AA AA CC AG CG GA GT CT GA TC CC GG
5 5 0 0 2 1 TC TT AA AA CA CA GG CG AA TT CT AA CC CC GA

And my MAP (.csv) file looks like this. The columns are chromosome, SNP ID, genetic position and physical (bp) position:

17 rs1049620 0 49404152
6 rs1143684 0 3010156
13 rs11571836 0 32399302
8 rs14448 0 89933605
13 rs144848 0 32332592
(Reply by Dhara Awasthi, 10 weeks ago; edited by Kevin Blighe)
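This PED excerpt stores each genotype as a single two-letter token (e.g. TC), whereas the minimal example later in the thread puts each allele in its own field. A sketch for converting to that one-allele-per-field layout with awk, assuming single-character alleles and whitespace-separated columns; compound.ped and split.ped are hypothetical file names, and the demo row is individual 1's first four genotypes from the excerpt:

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"   # scratch directory for the demo files

# Demo: six leading columns plus four compound genotypes of individual 1.
printf '1 1 0 0 2 1 TC TT AA AG\n' > compound.ped

# Keep columns 1-6 as they are, then split each two-character genotype
# into two space-separated alleles (assumes single-character alleles).
awk '{
  line = $1
  for (i = 2; i <= 6; i++) line = line " " $i
  for (i = 7; i <= NF; i++) line = line " " substr($i, 1, 1) " " substr($i, 2, 1)
  print line
}' compound.ped > split.ped

cat split.ped
# -> 1 1 0 0 2 1 T C T T A A A G
```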

Try separating the MAP columns with spaces. Also, make sure there are no hidden carriage returns (shown as ^M); try dos2unix.

(Reply by Kevin Blighe, 10 weeks ago)
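A carriage return (\r) left over from Windows-style line endings is invisible in most editors; vi and `cat -v` show it as ^M. Since files that pass through a spreadsheet can pick up CRLF endings even when you work on Linux, here is a sketch that counts and strips them without needing dos2unix (test.map and test.unix.map are hypothetical demo names):

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"   # scratch directory for the demo file

# Demo: a MAP file with Windows (CRLF) line endings.
printf '17 rs1049620 0 49404152\r\n6 rs1143684 0 3010156\r\n' > test.map

# Count carriage returns: tr deletes everything except \r, wc counts what is left.
cr_count=$(tr -cd '\r' < test.map | wc -c)
echo "carriage returns found: $cr_count"

# Delete every \r (similar in effect to dos2unix on a CRLF file).
tr -d '\r' < test.map > test.unix.map
```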

Hi Kevin, I did not understand what you mean by "no hidden carriage returns like ^M - try dos2unix"?

(Reply by Dhara Awasthi, 10 weeks ago)

FYI, all my files were created on Linux.

(Reply by Dhara Awasthi, 10 weeks ago)

If you open your file in vi, do you see any unusual characters at the line ends?

(Reply by Kevin Blighe, 10 weeks ago)

No, Kevin. It does not contain any Unicode or other unusual characters.

(Reply by Dhara Awasthi, 10 weeks ago)

I also tried converting my CSV into TSV and got this error: Error: Invalid chromosome code '17press' on line 1 of .map file. (Use --allow-extra-chr to force it to be accepted.)

Then I used --allow-extra-chr and got another error: Error: Invalid bp coordinate on line 1 of .map file.

Then I manually checked the coordinates of the first variant (rs1049620) on Google and found that they were actually wrong. For what it's worth, this SNP is not mentioned in dbSNP, the largest repository of genetic variants, so I think it was fetched incorrectly from some other database. I wonder how such an error could occur, since I fetched all of these chromosomal locations using Ensembl BioMart. To confirm, I also checked the other bp coordinates, and they were all correct.

After correcting it, I ran the command again, but I still get the same error: Error: Invalid bp coordinate on line 1 of .map file.

I have spent all day on this and still can't find the problem. :( It would be great if someone could help me with it or suggest an alternative way of converting CSV/TSV into MAP format!

(Reply by Dhara Awasthi, 10 weeks ago)
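Both errors concern line 1 of the .map file: '17press' suggests extra text attached to the chromosome code, and the bp error would follow if the position column is not a plain integer. A sketch that flags such lines with awk before running PLINK. Assumptions: human chromosome codes 0-26, X, Y, XY or MT written without a "chr" prefix, the bp position in the last column, and no thousands separators; the hypothetical demo test.map contains a junk chromosome code, a hidden \r and a comma-formatted position:

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"   # scratch directory for the demo file

# Demo MAP: junk after "17", a trailing \r on line 2, a comma-formatted bp on line 3.
printf '17press rs1049620 0 49404152\n6 rs1143684 0 3010156\r\n13 rs11571836 0 32,399,302\n8 rs14448 0 89933605\n' > test.map

# Flag lines whose line ending, column count, chromosome code or bp position looks invalid.
# Assumptions: human codes 0-26, X, Y, XY, MT (no "chr" prefix); bp is the last column.
awk '{
  bad = ""
  if (sub(/\r$/, "")) bad = bad " carriage-return"   # strip a trailing CR and report it
  if (NF != 3 && NF != 4) bad = bad " column-count"
  if ($1 !~ /^([0-9]|1[0-9]|2[0-6]|X|Y|XY|MT)$/) bad = bad " chromosome"
  if ($NF !~ /^[0-9]+$/) bad = bad " bp-position"
  if (bad != "") { printf "line %d:%s\n", NR, bad; n++ }
}
END { print (n + 0) " problem line(s)" }' test.map > map_report.txt

cat map_report.txt
# -> line 1: chromosome
# -> line 2: carriage-return
# -> line 3: bp-position
# -> 3 problem line(s)
```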

Perhaps first try it with a minimal reproducible example of just a few variants:

1 1 0 0 2 1 T C T T A A A G
2 2 0 0 2 1 T C T T A A A A
3 3 0 0 2 1 T C C C A A A A
4 4 0 0 2 1 T C T T A A A A
5 5 0 0 2 1 T C T T A A A A

And the matching MAP file:

17 rs1049620 0 49404152
6 rs1143684 0 3010156
13 rs11571836 0 32399302
8 rs14448 0 89933605
(Reply by Kevin Blighe, 10 weeks ago)
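Besides formatting, it is worth checking that the two files agree: with one allele per field, as in this example, every PED line should contain 6 + 2N fields, where N is the number of lines in the MAP file (here 6 + 2*4 = 14). A sketch of that check in awk, recreating the example above as demo files in a scratch directory:

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"   # scratch directory for the demo files

# Demo files: the minimal example from this thread (4 variants, 5 people).
cat > test.ped <<'EOF'
1 1 0 0 2 1 T C T T A A A G
2 2 0 0 2 1 T C T T A A A A
3 3 0 0 2 1 T C C C A A A A
4 4 0 0 2 1 T C T T A A A A
5 5 0 0 2 1 T C T T A A A A
EOF
cat > test.map <<'EOF'
17 rs1049620 0 49404152
6 rs1143684 0 3010156
13 rs11571836 0 32399302
8 rs14448 0 89933605
EOF

# Number of variants = number of MAP lines (awk also counts a final line without a newline).
n_variants=$(awk 'END { print NR }' test.map)
expected=$((6 + 2 * n_variants))

# Report PED lines whose field count differs from 6 + 2 * n_variants.
result=$(awk -v want="$expected" '
  NF != want { printf "test.ped line %d: %d fields, expected %d\n", NR, NF, want; bad++ }
  END { print (bad + 0) " mismatched PED line(s)" }' test.ped)
echo "$result"
# -> 0 mismatched PED line(s)
```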

Thank you, Kevin, for the helpful response, but I am still getting the same error.

(Reply by Dhara Awasthi, 10 weeks ago)

You should not be getting the same error; take a look:

cat test.ped 
1 1 0 0 2 1 T C T T A A A G
2 2 0 0 2 1 T C T T A A A A
3 3 0 0 2 1 T C C C A A A A
4 4 0 0 2 1 T C T T A A A A
5 5 0 0 2 1 T C T T A A A A

cat test.map 
17 rs1049620 0 49404152
6 rs1143684 0 3010156
13 rs11571836 0 32399302
8 rs14448 0 89933605

plink --ped test.ped --map test.map 
PLINK v1.90b3.38 64-bit (7 Jun 2016)       https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --map test.map
  --ped test.ped

15037 MB RAM detected; reserving 7518 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (4 variants, 5 people).
--file: plink.bed + plink.bim + plink.fam written.

Please check the formatting of your data again. Even something as small as an extra space can cause an issue.

(Reply by Kevin Blighe, 10 weeks ago)
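One way to rule out spacing problems is to rewrite every line with exactly one space between fields. A sketch, assuming whitespace-separated input; it deletes \r characters, trims leading and trailing blanks, collapses runs of spaces and tabs, and skips empty lines (messy.map and clean.map are hypothetical names, and the demo rows reuse the spacing of the MAP excerpt pasted earlier):

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"   # scratch directory for the demo files

# Demo: a leading space, uneven gaps, a trailing \r and an empty line.
printf ' 6  rs1143684     0 3010156\r\n\n13  rs11571836   0  32399302\n' > messy.map

# Delete carriage returns, then let awk rebuild each non-empty record with single spaces.
tr -d '\r' < messy.map | awk 'NF { $1 = $1; print }' > clean.map

cat clean.map
# -> 6 rs1143684 0 3010156
# -> 13 rs11571836 0 32399302
```

The `$1 = $1` assignment is a standard awk idiom: modifying a field makes awk rebuild the record using OFS, which defaults to a single space.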

The problem is fixed. Thank you Kevin! :)

(Reply by Dhara Awasthi, 9 weeks ago)