Question: Convert Genotype Matrix Into Plink Format
4.4 years ago, shamrocktuesday wrote:

I have a genotype matrix (700 000 rows of SNPs and 2000 columns of samples). It's coded as 0/1/2 or NA. I want to convert this into PLINK-format PED and MAP files. What's the best way to do this?

Thanks for the help!

It looks like:

         Sample1   Sample2   Sample3   Sample N
  SNP1   0         1         0         2
  SNP2   0         NA        0         0
  SNP3   0         0         0         0
  SNP4   0         NA        0         0
  SNP5   0         1         0         2
  SNP6   0         NA        0         0
  SNP7   2         1         0         2
  SNP8   NA        NA        NA        NA

Can you show us a sample of the file?

written 4.4 years ago by zx8754

zx8754 is right; an example is a must.

written 4.4 years ago by Zev.Kronenberg

Can you tell me how I can do this in R? I have a genotype matrix (nearly 3000 animals, with 50 000 SNPs in columns). It's coded as 0/1/2 or NA. I want to convert it into PLINK's allelic format, for example 0 to 0 0, 1 to 1 1 and 2 to 2 2, so that I can run quality control on my data in PLINK. What's the best way to do this in R?

written 3 months ago by Kian (modified 3 months ago)
4.4 years ago, zx8754 (London) wrote:

Assuming your example data is saved as a tab-delimited file named raw.txt, you can build transposed TPED/TFAM files from it and then use PLINK to convert them to PED/MAP format:

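# Make the TPED file (one row per SNP): chromosome, SNP ID, genetic distance, base-pair position.
# Chromosome 1, distance 0 and the line number are placeholders; use real map data if you have it.
awk 'NR != 1 {print 1,$1,0,NR}' raw.txt > temp_snp.txt
# Drop the SNP ID column and the header line, then recode 0 -> A A, 1 -> A B, 2 -> B B, NA -> 0 0 (missing).
cut -f2- raw.txt | sed '1d' | sed -e 's/0/A A/g' -e 's/1/A B/g' -e 's/2/B B/g' -e 's/NA/0 0/g' > temp_geno.txt
paste temp_snp.txt temp_geno.txt > plink.tped

# Make the TFAM file (one row per sample): family ID, individual ID, father, mother, sex, phenotype.
# The first header field is dropped, so the header must start with a label or an empty field above the SNP column.
head -n1 raw.txt | tr '\t' '\n' | sed '1d' | awk '{print $1,$1,0,0,1,1}' > plink.tfam

# Convert to PED/MAP
plink --noweb \
--tfile plink \
--recode \
--out plink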
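
If the matrix is space-aligned (as pasted in the question) rather than tab-delimited, the cut/tr steps above will not split it correctly. A minimal sketch of the same recoding done with awk alone is below. It assumes the header line holds only sample IDs (one field fewer than the data rows) and that sample IDs contain no spaces; the file names demo_raw.txt, demo.tped and demo.tfam are made up for illustration.

```shell
#!/bin/sh
# Tiny stand-in for raw.txt (hypothetical data): the header lists sample IDs only,
# each following row is a SNP ID followed by 0/1/2/NA calls.
printf '%s\n' \
  '       Sample1 Sample2 Sample3' \
  'SNP1   0       1       2' \
  'SNP2   0       NA      0' > demo_raw.txt

# TFAM: one line per sample (family ID, individual ID, father, mother, sex, phenotype).
# awk splits on any run of whitespace, so leading blanks in the header are ignored.
head -n 1 demo_raw.txt |
  awk '{ for (i = 1; i <= NF; i++) print $i, $i, 0, 0, 1, 1 }' > demo.tfam

# TPED: chromosome, SNP ID, genetic distance, position, then two alleles per sample.
# Chromosome 1, distance 0 and the line number are placeholders, as in the commands above.
# Whole fields are looked up in a table, so anything other than 0/1/2/NA stops the run.
awk 'BEGIN { code["0"] = "A A"; code["1"] = "A B"; code["2"] = "B B"; code["NA"] = "0 0" }
NR > 1 {
  line = "1 " $1 " 0 " NR
  for (i = 2; i <= NF; i++) {
    if (!($i in code)) {
      print "unexpected genotype \"" $i "\" on line " NR > "/dev/stderr"
      exit 1
    }
    line = line " " code[$i]
  }
  print line
}' demo_raw.txt > demo.tped
```

The resulting pair of files can then be passed to plink with --tfile demo in place of --tfile plink. Matching whole fields instead of substituting characters also means an unexpected value is reported rather than silently rewritten.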

Or I would just use R for analysis.
