Question: Make A Special File By Plink
mary (Bologna university) wrote 6.8 years ago:

Dear all, I am working on SNP genotype data from the Bovine 50K BeadChip. As you all know, the PED file format is as below:

FAM001 1 0 0 1 2 A A G G A C

I want to have this file:

FAM001 1 0 0 1 2 A G A

FAM001 1 0 0 1 2 A G C

I mean I want to have each SNP genotype in one column. Is there a command in plink for making this file? I would appreciate it if someone could help me.


It would help if you could tell us why you need it in this format.

written 6.8 years ago by zx8754

Hi,

I want to make an input file for the Sweep v1.1 software. Sweep accepts a standard format of genotype data, fully phased with missing data filled in. It needs two files: (1) a genotype data file and (2) a SNP data file. The genotype data file contains:

- Column 1: the individual identifier.
- Column 2: the chromosome identifier. For autosomes you should have two chromosomes per individual. We can label the two chromosomes T for transmitted and U for untransmitted (but it can be anything, e.g. A and B).
- Columns 3 to N: each column gives the allele for one SNP in the order of its position on the chromosome. The alleles are represented as A=1, C=2, G=3, T=4, as below:

1331-1331FF12 T 1 3 3 2

1331-1331FF12 U 1 1 1 2

1331-1331FM13 T 1 3 3 2

1331-1331FM13 U 1 3 3 4

The SNP data file has 3 tab-delimited columns, which give information about the markers you genotyped. The file contains:

- Column 1: the SNP identifier. This can be an rs number or any other name you choose to give.
- Column 2: the chromosome.
- Column 3: the SNP position based on the build identified (UMD3.1 for these cattle data), as below:

snpid chr HG16

rs267265 3 45548733

rs267262 3 45567119

rs267241 3 45578901

Thanks for your attention.

modified 6.8 years ago • written 6.8 years ago by mary
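
As a rough sketch of how the SNP data file described above could be produced: a PLINK .map file has four columns (chromosome, SNP identifier, genetic distance, base-pair position), so printing columns 2, 1 and 4 gives the three tab-delimited columns. The function name map_to_snpfile, the marker names and positions in the example, and the use of UMD3.1 as the header label are assumptions for illustration; check the Sweep documentation for the header it actually expects.

```shell
# map_to_snpfile BUILD < data.map > snps.txt
# Hypothetical helper: reorder PLINK .map columns (chr, SNP id, cM, bp)
# into "snpid<TAB>chr<TAB>position", preceded by a header line.
map_to_snpfile() {
  awk -v build="$1" '
    BEGIN { OFS = "\t"; print "snpid", "chr", build }
    NF >= 4 { print $2, $1, $4 }
  '
}

# Example with two made-up markers (not real coordinates):
printf '3 snpA 0 1000\n3 snpB 0 2500\n' | map_to_snpfile UMD3.1
```

The build label is passed as an argument rather than hard-coded, so the same helper works whatever header the downstream tool requires.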

Thanks for explaining what you need the data for, but I would caution you that your data are most likely not fully or even partially phased. Array data are not phased by haplotype, and if you require phasing you need to first apply a method that attempts to phase your genotypes by haplotype.

written 6.8 years ago by Matt Shirley

Hi Matt, thanks for your guidance. I am new to haplotype phasing. I know I can use fastPHASE or PHASE for haplotype phasing (Stephens, Smith et al. 2001). I ran it on Linux, but I don't have enough memory to reconstruct haplotypes for a whole chromosome. Also, when I reconstructed a partial segment, the output was not in Sweep input format, so I thought maybe I could use plink. I would appreciate your help with haplotype reconstruction.

written 6.8 years ago by mary

I don't think I can help you much with the actual work, but just wanted to make sure you weren't expecting that your genotypes were already phased.

written 6.8 years ago by Matt Shirley

Have a look at SHAPEIT, it is "multi-threaded to tailor computational times to your resources."

modified 6.8 years ago • written 6.8 years ago by zx8754

Matt Shirley (Cambridge, MA) wrote 6.8 years ago:

You might want to try plink --file data --recodeAB --out dataAB (--file takes the file root, so plink reads data.ped and data.map), which will recode the A/C/G/T alleles as A or B depending on the major/minor allele. Then you can run sed -e 's/A A/0/g' -e 's/A B/1/g' -e 's/B B/2/g' dataAB.ped > data012.ped. This last command just collapses A A to 0, A B to 1 and B B to 2.
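
One caveat with a global sed substitution is that it can also match an "A A" that straddles two neighbouring genotypes (for example "B A" followed by "A B"), and it leaves "B A" untouched. A hedged alternative sketch walks the allele columns in pairs with awk and counts the B alleles in each genotype. The function name ab_to_012 is made up for this example, and writing anything other than A or B (such as a missing 0 0 genotype) as NA is an assumption, not something plink prescribes.

```shell
# ab_to_012 < dataAB.ped > data012.ped
# Hypothetical helper: keep the six leading PED columns, then replace each
# pair of A/B allele columns with the number of B alleles (0, 1 or 2).
ab_to_012() {
  awk '{
    line = $1
    for (i = 2; i <= 6; i++) line = line " " $i
    for (i = 7; i < NF; i += 2) {
      a = $i; b = $(i + 1)
      if ((a != "A" && a != "B") || (b != "A" && b != "B")) {
        g = "NA"                      # assumption: anything else is missing
      } else {
        g = (a == "B") + (b == "B")   # count of B alleles
      }
      line = line " " g
    }
    print line
  }'
}

# Example: genotypes B A, A B, A A, B B, 0 0
echo "FAM001 1 0 0 1 2 B A A B A A B B 0 0" | ab_to_012
# → FAM001 1 0 0 1 2 1 1 0 2 NA
```

Working by column pairs keeps genotype boundaries intact, at the cost of being slower than sed on very wide files.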


Hi Matt, thanks for your reply. Actually I want a PED file recoded to 1234, with each sample repeated on two lines and each column giving the allele for one SNP in the order of its position on the chromosome. The alleles are represented as A=1, C=2, G=3, T=4.

1331-1331FF12 1 0 0 2 1 3 3 2

1331-1331FF12 1 0 0 2 1 1 1 2

The first row therefore represents one chromosome for individual 1331-1331FF12 with the haplotype AGGC. The second row represents the other chromosome for individual 1331-1331FF12 with the haplotype AAAC.

modified 6.8 years ago • written 6.8 years ago by mary

I see now. I'm not sure there's a simple plink command for this. The PED file does not contain phased haplotypes at all, so you'll have to impute haplotypes somehow first: http://pngu.mgh.harvard.edu/~purcell/plink/haplo.shtml

written 6.8 years ago by Matt Shirley

zx8754 (London) wrote 6.8 years ago:

If the genotypes are needed in 1234 format, recode them with plink first: plink --file data --allele1234 --recode --out data1234 (here data is the file root).

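Then, here is some quick R code:

# dummy data: a PED file, one row per individual, two allele columns per SNP
# colClasses = "character" stops R from reading a column of "T" alleles as TRUE
x <- read.table(text = "
FAM001 1 0 0 1 2 A A G G A C
FAM002 1 0 0 1 2 A T G C C C
FAM003 1 0 0 1 2 1 1 3 3 1 2
FAM004 1 0 0 1 2 1 4 3 2 2 2
", colClasses = "character")

# subset: first allele of each SNP (columns 7, 9, ...) labelled T,
# second allele (columns 8, 10, ...) labelled U
# note: PED alleles are unphased, so these rows are only true haplotypes
# if the data were phased beforehand
x1 <- cbind(x[, 1:6], chr = "T", x[, seq(7, ncol(x), 2)])
x2 <- cbind(x[, 1:6], chr = "U", x[, seq(8, ncol(x), 2)])

# make same colnames for "rbind"
colnames(x2) <- colnames(x1)

# join
x3 <- rbind(x1, x2)

# sort by family ID and T/U label
result <- x3[order(x3[, 1], x3[, 7]), ]

# output
result

      V1 V2 V3 V4 V5 V6 chr V7 V9 V11
1 FAM001  1  0  0  1  2   T  A  G   A
5 FAM001  1  0  0  1  2   U  A  G   C
2 FAM002  1  0  0  1  2   T  A  G   C
6 FAM002  1  0  0  1  2   U  T  C   C
3 FAM003  1  0  0  1  2   T  1  3   1
7 FAM003  1  0  0  1  2   U  1  3   2
4 FAM004  1  0  0  1  2   T  1  3   2
8 FAM004  1  0  0  1  2   U  4  2   2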
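
For large files, the same split can be sketched as a streaming awk filter that never loads the whole table into memory and writes the Sweep genotype layout described earlier in the thread (individual identifier, T or U, then one allele per SNP coded A=1, C=2, G=3, T=4). The function name ped_to_sweep is hypothetical, building the identifier from the family ID and individual ID is an assumption, and the output is only meaningful if the allele order in the PED file already reflects phase.

```shell
# ped_to_sweep < phased.ped > sweep_genotypes.txt
# Hypothetical helper: emit two rows per individual, T = first allele of each
# SNP, U = second allele, with ACGT mapped to 1234 (other codes pass through).
ped_to_sweep() {
  awk '
    BEGIN { code["A"] = 1; code["C"] = 2; code["G"] = 3; code["T"] = 4 }
    function num(x) { return (x in code) ? code[x] : x }
    {
      id = $1 "_" $2          # assumption: family ID + individual ID
      t = id " T"; u = id " U"
      for (i = 7; i < NF; i += 2) {
        t = t " " num($i)
        u = u " " num($(i + 1))
      }
      print t
      print u
    }'
}

# Example with the first row of the dummy data above:
echo "FAM001 1 0 0 1 2 A A G G A C" | ped_to_sweep
# → FAM001_1 T 1 3 1
# → FAM001_1 U 1 3 2
```

Because each input line is handled independently, memory use stays flat regardless of the number of individuals.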