Question

Make A Special File By Plink

10.5 years ago

mary ▴ 210

Dear All

I am working on SNP genotype data from the Bovine 50K BeadChip. As you all know, the PED file format looks like this:

FAM001 1 0 0 1 2 A A G G A C

I want to have this file:

FAM001 1 0 0 1 2 A G A
FAM001 1 0 0 1 2 A G C

I mean I want to have each SNP genotype in one column. Is there a command in plink for making this file? I would appreciate it if someone could help me.

plink ped • 4.5k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 10.5 years ago by mary ▴ 210

It would help if you could tell us why you need it in this format.

ADD REPLY • link 10.5 years ago by zx8754 11k

Hi

I want to make an input file for the Sweep v1.1 software. Sweep accepts a standard format of genotype data, fully phased with missing data filled in. It needs two files: 1. a genotype data file and 2. a SNP data file. The genotype data file contains:

- Column 1: the individual identifier.
- Column 2: the chromosome identifier. For autosomes you should have two chromosomes per individual. We can label the two chromosomes T for transmitted and U for untransmitted (but it can be anything, e.g. A and B).
- Columns 3 – N: each column gives the allele for one SNP, in the order of its position on the chromosome. The alleles are represented as A=1, C=2, G=3, T=4.

For example:

1331-1331FF12 T 1 3 3 2

1331-1331FF12 U 1 1 1 2

1331-1331FM13 T 1 3 3 2

1331-1331FM13 U 1 3 3 4
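The allele coding described above can be sketched in R. This is only an illustration of the A=1, C=2, G=3, T=4 mapping; the helper name `encode_alleles` is made up for this example and is not part of Sweep or plink.

```r
# Numeric codes as described above: A=1, C=2, G=3, T=4.
allele_codes <- c(A = 1L, C = 2L, G = 3L, T = 4L)

# Hypothetical helper: turn a character vector of alleles into integer codes.
# Anything that is not A/C/G/T (for example "0" for a missing call) becomes NA.
encode_alleles <- function(alleles) {
  unname(allele_codes[toupper(alleles)])
}

encode_alleles(c("A", "G", "G", "C"))  # returns 1 3 3 2
```

Since Sweep expects missing data to be filled in, any NA produced here would point to calls that still need imputing before the file is usable.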

The SNP data file has 3 tab-delimited columns, which give information about the markers you genotyped. The file contains:

- Column 1: the SNP identifier. This can be an rs number or any other name you choose to give.
- Column 2: the chromosome.
- Column 3: the SNP position based on the build identified (UMD3.1 in my case).

The file looks like this:

snpid chr HG16

rs267265 3 45548733

rs267262 3 45567119

rs267241 3 45578901
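If the marker information is already in a PLINK .map file (four columns: chromosome, SNP ID, genetic distance, base-pair position), a three-column file like the one above could be produced roughly as follows. This is a sketch with made-up marker names; the output file name and the `pos` column header are placeholders, and the header Sweep expects for the position column should be checked against its documentation.

```r
# Two-marker stand-in for a PLINK .map file; IDs here are invented.
map <- read.table(text = "
3 snpA 0 45548733
3 snpB 0 45567119
", col.names = c("chr", "snpid", "cM", "pos"), stringsAsFactors = FALSE)

# Keep the three columns described above, in that order.
snp_info <- map[, c("snpid", "chr", "pos")]

# Write a tab-delimited file; the path is a placeholder in the temp directory.
out_file <- file.path(tempdir(), "snp_info.txt")
write.table(snp_info, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```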

thanks for your attention

ADD REPLY • link 10.5 years ago by mary ▴ 210

Thanks for explaining what you need the data for, but I would caution you that your data are most likely not fully or even partially phased. Array data are not phased by haplotype, and if you require phasing you need to first apply a method that attempts to phase your genotypes by haplotype.
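To make the ambiguity concrete, here is a toy R example with invented genotypes (not taken from the 50K data). An individual that is heterozygous at two SNPs is compatible with two different haplotype pairs, and the array genotypes alone cannot tell them apart.

```r
# Unphased genotypes for one individual at two heterozygous SNPs (made up).
snp1 <- c("A", "G")
snp2 <- c("C", "T")

# Both haplotype pairs below are consistent with those genotypes.
config1 <- c(paste0(snp1[1], snp2[1]), paste0(snp1[2], snp2[2]))  # AC and GT
config2 <- c(paste0(snp1[1], snp2[2]), paste0(snp1[2], snp2[1]))  # AT and GC
```

With k heterozygous sites there are 2^(k-1) such pairs, which is why a statistical phasing step is needed before writing one haplotype per row.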

ADD REPLY • link 10.5 years ago by Matt Shirley 10k

Hi Matt, thanks for your guidance. Actually I am new to haplotype phasing. I know I can use fastPHASE or PHASE for haplotype phasing (Stephens, Smith et al. 2001). I did it on Linux, but I don't have enough memory for haplotype reconstruction of a whole chromosome. Also, when I reconstructed a partial segment, it didn't give me the Sweep input format, so I thought maybe I could use plink. I would appreciate it if you could help me with haplotype reconstruction.

ADD REPLY • link 10.5 years ago by mary ▴ 210

I don't think I can help you much with the actual work, but just wanted to make sure you weren't expecting that your genotypes were already phased.

ADD REPLY • link 10.5 years ago by Matt Shirley 10k

Have a look at SHAPEIT, it is "multi-threaded to tailor computational times to your resources."

ADD REPLY • link 10.5 years ago by zx8754 11k

score 0 · Answer 1 · 2013-10-20

10.5 years ago

Matt Shirley 10k

You might want to try plink --file data --recodeAB --out dataAB (--file takes the file prefix, so this reads data.ped and data.map), which will recode ACGT as A/B depending on the major/minor allele. Then you can do sed -e 's/A A/0/g' -e 's/A B/1/g' -e 's/B B/2/g' dataAB.ped > data012.ped. This last command just collapses A A to 0, A B to 1, and B B to 2.

ADD COMMENT • link 10.5 years ago by Matt Shirley 10k

Hi Matt, thanks for your reply. Actually I want to have a PED file recoded as 1234, with each sample repeated on two lines, where each column gives the allele for one SNP in the order of its position on the chromosome. The alleles are represented as A=1, C=2, G=3, T=4.

1331-1331FF12 1 0 0 2 1 3 3 2

1331-1331FF12 1 0 0 2 1 1 1 2

The first row therefore represents one chromosome for individual 1331-1331FF12 with the haplotype AGGC. The second row represents the other chromosome for individual 1331-1331FF12 with the haplotype AAAC.

ADD REPLY • link 10.5 years ago by mary ▴ 210

I see now. I'm not sure there's a simple plink command for this. The PED file does not contain phased haplotypes at all, so you'll have to impute haplotypes somehow first: http://pngu.mgh.harvard.edu/~purcell/plink/haplo.shtml

ADD REPLY • link 10.5 years ago by Matt Shirley 10k

score 0 · Answer 2 · 2013-10-21

If genotypes are needed in 1234 format, then use plink --recode1234.

Then, here is some quick R code:

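# Dummy data: PED-style lines, some ACGT-coded and some 1234-coded.
# colClasses = "character" stops a column of T alleles being read as logical TRUE.
x <- read.table(text = "
FAM001 1 0 0 1 2 A A G G A C
FAM002 1 0 0 1 2 A T G C C C
FAM003 1 0 0 1 2 1 1 3 3 1 2
FAM004 1 0 0 1 2 1 4 3 2 2 2
", colClasses = "character")

# Subset: first allele of every SNP (columns 7, 9, ...) labelled "T",
# second allele of every SNP (columns 8, 10, ...) labelled "U".
# The labels are arbitrary: PED allele order carries no phase information.
x1 <- cbind(x[, 1:6], "T", x[, seq(7, ncol(x), by = 2)])
x2 <- cbind(x[, 1:6], "U", x[, seq(8, ncol(x), by = 2)])

# Make the column names the same so that rbind() works
colnames(x2) <- colnames(x1)

# Join
x3 <- rbind(x1, x2)

# Sort by family ID (column 1), then by the T/U label (column 7)
result <- x3[order(x3[, 1], x3[, 7]), ]

# Output
result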

      V1 V2 V3 V4 V5 V6 "T" V7 V9 V11
1 FAM001  1  0  0  1  2   T  A  G   A
5 FAM001  1  0  0  1  2   U  A  G   C
2 FAM002  1  0  0  1  2   T  A  G   C
6 FAM002  1  0  0  1  2   U  T  C   C
3 FAM003  1  0  0  1  2   T  1  3   1
7 FAM003  1  0  0  1  2   U  1  3   2
4 FAM004  1  0  0  1  2   T  1  3   2
8 FAM004  1  0  0  1  2   U  4  2   2
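To get closer to the layout Sweep was described as wanting (identifier, chromosome label, then one allele per SNP), the same split can be done while dropping PED columns 2 to 6. The sketch below uses made-up 1234-coded lines; `make_rows` and the `snp1`, `snp2`, ... column names are invented for the example, and column 1 is used as the identifier only because the code above sorts on it. As noted earlier in the thread, the T/U rows are not real haplotypes unless the genotypes were phased first.

```r
# Numeric-allele PED lines (made-up): 6 leading columns, then allele pairs.
ped <- read.table(text = "
FAM003 1 0 0 1 2 1 1 3 3 1 2
FAM004 1 0 0 1 2 1 4 3 2 2 2
", stringsAsFactors = FALSE)

first  <- seq(7, ncol(ped), by = 2)  # first allele of each SNP
second <- seq(8, ncol(ped), by = 2)  # second allele of each SNP

# Hypothetical helper: one block of rows with ID, label, and one allele per SNP.
make_rows <- function(cols, label) {
  out <- data.frame(id = ped[[1]], chrom = label, ped[, cols, drop = FALSE],
                    stringsAsFactors = FALSE)
  names(out) <- c("id", "chrom", paste0("snp", seq_along(cols)))
  out
}

# Stack both blocks and put each individual's two rows next to each other.
sweep_like <- rbind(make_rows(first, "T"), make_rows(second, "U"))
sweep_like <- sweep_like[order(sweep_like$id, sweep_like$chrom), ]
rownames(sweep_like) <- NULL
```

If the individual IDs are in PED column 2 rather than column 1, `ped[[2]]` would be the one to use for `id`.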