Transposed file to ped map file
8.4 years ago
dirranrak ▴ 20

Hi everyone,

Can anyone see what is wrong with these tped and tfam files? When I recoded them, the ped/map output contained only one individual, although it is supposed to contain many.

tped file:

1    rs950122    0       836727    A B    0 0    A B    A B    A B    A B    A B   
1    rs10492936    0      2926730    B B    B B    B B    B B    B B    B B    0 0   
1    rs10489589    0      2941104    B B    B B    B B    B B    B B    B B    B B

tfam file:

Yoruba    YRI-NA18504    0    0    0    9
Yoruba    YRI-NA18505    0    0    0    9
Yoruba    YRI-NA18507    0    0    0    9

Thank you in advance

SNP software-error • 3.2k views

Can you post the .log file for your run?

PLINK v1.90b3w 64-bit (3 Sep 2015)
Options in effect:
  --out Genotypes_All
  --recode
  --tfile Genotypes_All

Hostname: ANTHCARMC002.cla.psu.edu
Working directory: /Users/rur27/Documents/plink_mac
Start time: Fri Dec 11 13:34:51 2015

Random number seed: 1449858891
8192 MB RAM detected; reserving 4096 MB for main workspace.
Processing .tped file.
Warning: Extra columns in .tped file.  Ignoring.
Genotypes_All-temporary.bed + Genotypes_All-temporary.bim +
Genotypes_All-temporary.fam written.
54794 variants loaded from .bim file.
1 person (0 males, 0 females, 1 ambiguous) loaded from .fam.
Ambiguous sex ID written to Genotypes_All.nosex .
1 phenotype value loaded from .fam.
Warning: Ignoring phenotypes of missing-sex samples.  If you don't want those
phenotypes to be ignored, use the --allow-no-sex flag.
Using 1 thread (no multithreaded calculations invoked.
Before main variant filters, 1 founder and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.970343.
54794 variants and 1 person pass filters and QC.
Phenotype data is quantitative.
--recode to Genotypes_All.ped + Genotypes_All.map ... done.

End time: Fri Dec 11 13:34:51 2015

My guess is that the .tfam file contains nonstandard linebreaks, and plink sees only a single line as a consequence. (If you type xxd Genotypes_All.tfam | head, what do you get?)

If this is the case, your problem should go away after you standardize the linebreaks in the .tfam file.
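
If the hex dump is hard to read, counting the two candidate linebreak bytes gives the same answer. The sketch below builds a small demo file with carriage-return-only line endings (the name count_demo.tfam and its contents are made up for illustration) and counts carriage returns (0d) and line feeds (0a); on real data, point the two tr commands at Genotypes_All.tfam instead.

```shell
# Build a hypothetical three-sample .tfam whose only line terminator is CR ("\r", byte 0d).
printf 'FAM\tID1\t0\t0\t0\t-9\rFAM\tID2\t0\t0\t0\t-9\rFAM\tID3\t0\t0\t0\t-9\r' > count_demo.tfam

# Keep only CR bytes (or only LF bytes) and count them.
# "tr -d ' '" strips the padding that some wc implementations add.
cr_count=$(tr -cd '\r' < count_demo.tfam | wc -c | tr -d ' ')
lf_count=$(tr -cd '\n' < count_demo.tfam | wc -c | tr -d ' ')
echo "CR bytes: $cr_count, LF bytes: $lf_count"
```

A file with a nonzero CR count and an LF count of zero has no Unix linebreaks at all, so a tool that splits on line feeds will treat the whole file as one line.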


I ran the xxd command you suggested and got the output below. I don't really know what it means.

xxd Genotype_All.tfam|head
0000000: 4179 7461 0950 492d 4145 2d30 3030 3034  Ayta.PI-AE-00004
0000010: 322d 312d 3031 0909 3009 3009 3009 2d39  2-1-01..0.0.0.-9
0000020: 0d41 7974 6109 5049 2d41 452d 3030 3030  .Ayta.PI-AE-0000
0000030: 3433 2d31 2d30 3109 0930 0930 0930 092d  43-1-01..0.0.0.-
0000040: 390d 4179 7461 0950 492d 4145 2d30 3030  9.Ayta.PI-AE-000
0000050: 3034 362d 312d 3031 0909 3009 3009 3009  046-1-01..0.0.0.
0000060: 2d39 0d41 7974 6109 5049 2d41 452d 3030  -9.Ayta.PI-AE-00
0000070: 3030 3534 2d31 2d30 3109 0930 0930 0930  0054-1-01..0.0.0
0000080: 092d 390d 4179 7461 0950 492d 4145 2d30  .-9.Ayta.PI-AE-0
0000090: 3030 3036 372d 312d 3031 0909 3009 3009  00067-1-01..0.0.

That confirms that the linebreak bytes are the problem. They're supposed to be "0a" on OS X, not "0d".

You can fix this with:

mv Genotypes_All.tfam Genotypes_All_Old.tfam
cat Genotypes_All_Old.tfam | tr '\r' '\n' > Genotypes_All.tfam

Your conversion command should then work (and once you've verified that it works, you can delete Genotypes_All_Old.tfam).
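
One way to verify the repaired file before rerunning plink is to compare the number of samples in the .tfam (one per line) with the number implied by the .tped, which has four leading columns followed by two allele columns per sample. This is only a sketch on made-up demo files (verify_demo.tfam and verify_demo.tped are hypothetical names); substitute the real file names to check actual data.

```shell
# Hypothetical demo inputs: 3 samples, 2 variants, Unix (LF) line endings.
printf 'FAM\tID1\t0\t0\t0\t-9\nFAM\tID2\t0\t0\t0\t-9\nFAM\tID3\t0\t0\t0\t-9\n' > verify_demo.tfam
printf '1 rs1 0 1000 A B 0 0 A B\n1 rs2 0 2000 B B B B B B\n' > verify_demo.tped

# Samples according to the .tfam: one per line.
tfam_n=$(wc -l < verify_demo.tfam | tr -d ' ')
# Samples implied by the first .tped row: (fields - 4) / 2.
tped_n=$(awk 'NR == 1 { print (NF - 4) / 2; exit }' verify_demo.tped)
echo "tfam samples: $tfam_n, tped samples: $tped_n"
```

If the two numbers differ, plink will either ignore the surplus genotype columns (as in the "Extra columns in .tped file" warning in the log above) or complain about missing ones.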


Great, thank you very much, it worked. But I would like to know: 1. Where was my mistake in the steps below? 2. How does this script really work (the tr, \r and \n)?

  1. Take columns from the text file to form the tfam file and save it as .txt using Excel.
  2. Open the file with TextWrangler and save it as .tfam.

The original data is from the Pan-Asian SNP Consortium website and looks like this:

affy-snp-id        dbsnp_126-rs-id    chromosome    position    alleles PI-AE-000042-1-01   PI-AE-000043-1-01   PI-AE-000046-1-01   PI-AE-000054-1-01
SNP_A-1677174      rs950122           1            836727       C/G     1                   9                   1                    1
SNP_A-1676440      rs10492936         1            2926730      A/G     2                   2                   2                    2
SNP_A-1662392      rs10489589         1            2941104      C/T     2                   2                   2                    2

In the TextWrangler document options, you need to specify "Unix" instead of "Mac" line endings. (OS X actually does not use "Mac" line endings; that option only applies to much older Mac operating systems.)
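
As for how the fix works: tr reads bytes from standard input and replaces every occurrence of the first character with the second. Here '\r' is the carriage return (byte 0d, the old "Mac" linebreak) and '\n' is the line feed (byte 0a, the Unix linebreak), so every 0d in the file becomes 0a. A small demonstration on made-up files (tr_demo_cr.txt and tr_demo_lf.txt are hypothetical names):

```shell
# Two records terminated by carriage returns only (old "Mac" style).
printf 'line1\rline2\r' > tr_demo_cr.txt

# Before: there are no line feeds, so line-oriented tools count zero lines.
before=$(wc -l < tr_demo_cr.txt | tr -d ' ')

# Translate every CR byte into an LF byte.
tr '\r' '\n' < tr_demo_cr.txt > tr_demo_lf.txt

# After: each record ends in a line feed and is counted as a line.
after=$(wc -l < tr_demo_lf.txt | tr -d ' ')
echo "lines before: $before, lines after: $after"
```

Note that this one-for-one replacement suits CR-only files like yours. For a file with Windows-style CR+LF endings it would leave an empty line after every record; deleting the carriage returns with tr -d '\r' is the usual fix in that case.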


Thank you again chrchang523.

My project is to find the origin of the population of Madagascar, so I need to compare data from this country with data from Jewish, Arab, Austronesian and sub-Saharan African populations. I have HGDP data in two forms: the first is a very large text file (~2 GB) from the Stanford University website, which is not in PLINK format, and the other, from the Rosenberg lab website, is a STRUCTURE file.

I have been looking for a way to convert these kinds of files to PLINK format, but the only method I found did not work. Any help would be appreciated.
