GWAS data in .txt
Sam, 3.6 years ago

I received some GWAS data as a .txt file, but it is in a layout I am unfamiliar with.

SNP Sample_ID Chr Position X Y B_Allele_Freq Log_R_Ratio
200003 05T80 9 139026180 1.047 0.049 0.0009 0.1329
200006 05T80 9 139046223 1.058 0.918 0.5016 0.0155
200047 05T80 2 219793146 0.577 0.074 0.0132 0.2509
200050 05T80 2 219797929 0.009 1.414 1.0000 0.2460
200052 05T80 2 219783037 0.000 0.980 1.0000 0.0843

Can anyone tell me what this format is, and how I should go about converting it to .map and .ped format?

Thanks
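The .map half of the question only needs the SNP, Chr and Position columns, so it can be produced without touching the intensities. Below is a minimal Python sketch. It assumes the file is whitespace- or tab-delimited with the header shown above, and that every sample repeats the same SNP rows. The numeric SNP values may be internal indices rather than marker names, so mapping them to real IDs (for example via the array manifest) is left open. The function names `read_report` and `map_lines` are hypothetical.

```python
import io

# The rows shown in the question; a real file would be opened with open(path).
REPORT = """SNP Sample_ID Chr Position X Y B_Allele_Freq Log_R_Ratio
200003 05T80 9 139026180 1.047 0.049 0.0009 0.1329
200006 05T80 9 139046223 1.058 0.918 0.5016 0.0155
200047 05T80 2 219793146 0.577 0.074 0.0132 0.2509
200050 05T80 2 219797929 0.009 1.414 1.0000 0.2460
200052 05T80 2 219783037 0.000 0.980 1.0000 0.0843
"""


def read_report(handle):
    """Yield one dict per data row, keyed by the header line.

    Fields are split on any whitespace, so tabs and spaces both work,
    assuming no field contains an internal space.
    """
    header = handle.readline().split()
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            yield dict(zip(header, fields))


def map_lines(rows):
    """Build PLINK .map lines: chromosome, SNP ID, genetic distance, bp position.

    Genetic distance is written as 0 (unknown). Each SNP is emitted once, in
    first-seen order, because the report repeats every SNP for every sample.
    The .ped genotype columns must later follow this same order.
    """
    seen = set()
    lines = []
    for row in rows:
        snp = row["SNP"]
        if snp not in seen:
            seen.add(snp)
            lines.append(f'{row["Chr"]}\t{snp}\t0\t{row["Position"]}')
    return lines


if __name__ == "__main__":
    for line in map_lines(read_report(io.StringIO(REPORT))):
        print(line)
```

The .ped file is the hard part, because it needs genotype calls, which this file does not contain directly.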

3.6 years ago

This format is, to me, 'no format', and there is no way to convert it to MAP or PED format in its current form. Why not ask the person who gave this to you where it came from and what it contains? The days when a bioinformatician is handed some data and told to make magic from it should be long in the past.

Some things to ask about:

  • program / script used to produce this data
  • information on the sample cohort
  • genome build used
  • desired analysis to perform

Kevin

That's what I was afraid of. I was expecting to receive the IDAT files, but I got this instead. Apparently the IDAT files are long gone.

I see; they just deleted the files? To even begin doing anything here, you would need to know:

  • the reference base at each position
  • the Illumina array type and version used
  • what X and Y represent

Even with all of this, some reverse engineering would be needed. The data that you have presented is basically signal data prepared for downstream analyses such as copy number profiling, so it does not directly report the underlying genotypes.
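For context on the "not directly" point: in copy-neutral regions of a normal diploid sample, B Allele Frequency tends to sit near 0, 0.5 and 1 for AA, AB and BB genotypes, while Log R Ratio stays near 0. The Python sketch below thresholds BAF naively and writes a PLINK .ped row. The 0.2/0.8 cut-offs are hypothetical. The calls use Illumina-style A/B codes that would still need translating to bases via the array manifest. This is no substitute for proper cluster-based calling and breaks down wherever copy number changes. `call_from_baf` and `ped_line` are hypothetical names.

```python
import math


def call_from_baf(baf, low=0.2, high=0.8):
    """Naive genotype call from B Allele Frequency.

    The low/high thresholds are hypothetical. Returns a pair of A/B allele
    codes, or ("0", "0") (PLINK's missing code) when BAF is NaN.
    """
    value = float(baf)  # float("NaN") parses, so missing values reach the check below
    if math.isnan(value):
        return ("0", "0")
    if value < low:
        return ("A", "A")
    if value > high:
        return ("B", "B")
    return ("A", "B")


def ped_line(sample_id, calls):
    """One PLINK .ped row: family ID, individual ID, father, mother,
    sex (0 = unknown), phenotype (-9 = missing), then two allele columns
    per SNP. `calls` must be in the same SNP order as the .map file."""
    alleles = [allele for pair in calls for allele in pair]
    return " ".join([sample_id, sample_id, "0", "0", "0", "-9"] + alleles)


if __name__ == "__main__":
    # B_Allele_Freq values for sample 05T80 from the question.
    bafs = ["0.0009", "0.5016", "0.0132", "1.0000", "1.0000"]
    print(ped_line("05T80", [call_from_baf(b) for b in bafs]))
```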

This data was generated a long time ago, and the IDAT files were probably deleted to make space for something else. I cannot ask anyone because the people who handled this data have long since left the lab.

I know which chip was used and its version number. If I were to hazard a guess, X and Y refer to the intensities for allele X and allele Y. I was hoping to use this data to call genotypes for each sample. It seems to be a job for GenomeStudio, but GenomeStudio does not take .txt files.
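If X and Y are indeed normalized intensities for the two alleles (an assumption), a first step toward re-calling genotypes is the polar transform used for Illumina cluster plots: theta = (2/π)·arctan(Y/X) and R = X + Y. Each SNP's theta values are then clustered across all samples. The Python sketch below uses a crude 1-D k-means seeded at 0, 0.5 and 1. This is a swapped-in stand-in for GenomeStudio's own clustering, not a reimplementation of it, and it needs many samples per SNP to mean anything. `theta_r` and `kmeans_1d` are hypothetical names.

```python
import math

GENOTYPES = ("AA", "AB", "BB")  # label 0, 1, 2 from kmeans_1d below


def theta_r(x, y):
    """Polar transform of X/Y intensities.

    theta = (2/pi) * atan(Y/X), scaled to [0, 1] (0 = all X signal,
    1 = all Y signal); atan2 avoids dividing by zero when X == 0.
    R = X + Y is the total signal.
    """
    return (2.0 / math.pi) * math.atan2(y, x), x + y


def kmeans_1d(values, centers=(0.0, 0.5, 1.0), iterations=20):
    """Crude 1-D k-means on one SNP's theta values across samples.

    Seeds at 0, 0.5 and 1 roughly correspond to AA, AB and BB clusters.
    Returns (labels, centers); an empty cluster keeps its previous center.
    """
    centers = list(centers)
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers


if __name__ == "__main__":
    # Hypothetical theta values for one SNP across five samples.
    labels, centers = kmeans_1d([0.02, 0.03, 0.48, 0.52, 0.97])
    print([GENOTYPES[lab] for lab in labels], centers)
```

Seeding the centers at fixed positions keeps the cluster labels in AA/AB/BB order, but a SNP with no samples in one genotype class leaves that center unmoved, and real calling would also filter on R.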
