SNPTEST (GEN/SAMPLE) Files to R
12.6 years ago
Joey ▴ 430

Does anyone know how to convert SNPTEST dosage data (GEN/SAMPLE files) so that they can be used in R or SAS, or can anyone point me to a resource on this? PLINK can read the SNPTEST dosage format, but it seems it can only run association tests, whereas I would like to fit a multinomial logistic regression.

Thanks,

-Joey


EDIT: here is some example data. I was given two sets of data:

a) HapMap2-imputed data: a series of files for each chromosome in standard SNPTEST format (GEN/SAMPLE), plus a *.mlinfo file, i.e. 66 files in total.

The *.mlinfo files look like this:

SNP POS A1 A2 REF_FREQ RSQ
rs10047182 4434181 A G 0.117476853526221 0.98222786900009
rs1009345 3576288 A G 0.395093490054250 0.389054499338887
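A minimal Python sketch of loading a *.mlinfo file, assuming whitespace-separated columns with exactly the header shown above (SNP POS A1 A2 REF_FREQ RSQ). The function name read_mlinfo is hypothetical. Keying the records by SNP name makes it easy to look up allele labels or the imputation quality (RSQ) for a given SNP when building the analysis file.

```python
def read_mlinfo(lines):
    """Parse .mlinfo lines (header first) into a dict keyed by SNP name.

    Assumes the columns SNP POS A1 A2 REF_FREQ RSQ, as in the example above."""
    header = lines[0].split()
    records = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        records[row["SNP"]] = {
            "POS": int(row["POS"]),
            "A1": row["A1"],
            "A2": row["A2"],
            "REF_FREQ": float(row["REF_FREQ"]),
            "RSQ": float(row["RSQ"]),
        }
    return records
```

With the two rows above, rs1009345 (RSQ about 0.39) is the SNP that an illustrative filter such as `RSQ < 0.5` would flag; any actual cutoff is an analysis choice not given in this thread.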

b) 1000 Genomes-imputed data: IMPUTE v2 was used to generate the files. For each chromosome I have around 40-50 chunks, depending on the number of SNPs in each.

I have a chunk1_info file which has the following:

snp_id rs_id position exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0
--- rs58108140 10583 0.125 0.025 0.765 0 -1 -1 -1
--- rs3877545 11508 1.000 0.000 1.000 0 -1 -1 -1

An info_by_sample file:

concord_type0 r2_type0
0.949 0.915
0.949 0.936

and the genotype data contained in each of the chunks (one line per SNP):

--- rs4912140 20001071 T G 0 0 1 0 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1 0 0 1 0.004 0.595 0.401 0 ........
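Each line of a GEN chunk has five identifier columns (SNP ID, rsID, position, allele A, allele B) followed by three genotype probabilities per sample, P(AA), P(AB) and P(BB), in the sample order of the .sample file. The expected count of allele B (the dosage) is therefore P(AB) + 2 * P(BB). A minimal Python sketch with the hypothetical helper gen_line_to_dosages; treating an all-zero triplet as a missing genotype is an assumed convention worth checking against your own files.

```python
def gen_line_to_dosages(line):
    """Parse one GEN line into (rsid, position, allele_a, allele_b, dosages).

    Columns 1-5 identify the SNP; every later group of three numbers is
    P(AA), P(AB), P(BB) for one sample. The dosage is the expected count
    of allele B: P(AB) + 2 * P(BB)."""
    fields = line.split()
    _snp_id, rsid, pos, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError(f"{rsid}: {len(probs)} probabilities is not a multiple of 3")
    dosages = []
    for i in range(0, len(probs), 3):
        p_aa, p_ab, p_bb = probs[i:i + 3]
        if p_aa + p_ab + p_bb == 0:
            dosages.append(None)  # assumed convention: "0 0 0" = missing genotype
        else:
            dosages.append(p_ab + 2 * p_bb)
    return rsid, int(pos), allele_a, allele_b, dosages
```

On the first eleven samples of the rs4912140 line above, this gives G-allele dosages of 2, 1, 2, 2, 2, 2, 1, 2, 2, 2 and about 1.397.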

I guess what I want is a file similar to the one produced by PLINK's --recodeA option. I could then use that *.raw file along with covariates to run other models (multinomial logit or Cox proportional hazards models).

Thanks,

Joey
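The --recodeA-style file asked for above can be approximated by transposing per-SNP dosages into one row per sample. A minimal Python sketch, assuming the usual SNPTEST .sample layout (two header lines, then one line per sample with ID_1 and ID_2 first, in GEN column order); both function names are hypothetical. It loosely mimics PLINK's .raw layout (FID, IID, then one column per SNP named rsID_allele) but omits the PAT, MAT, SEX and PHENOTYPE columns, and it counts IMPUTE's allele B, which need not be the allele PLINK would count.

```python
def read_sample_ids(sample_lines):
    """Return (ID_1, ID_2) pairs from the lines of a SNPTEST .sample file.

    Lines 1 and 2 are the column-name and column-type headers; each later
    line describes one sample, in the same order as the GEN columns."""
    return [tuple(line.split()[:2]) for line in sample_lines[2:] if line.strip()]


def gen_to_recodea_like(gen_lines, sample_ids):
    """Build a sample-by-SNP table of allele-B dosages, header row first."""
    header = ["FID", "IID"]
    columns = []
    for line in gen_lines:
        fields = line.split()
        rsid, allele_b = fields[1], fields[4]
        probs = [float(x) for x in fields[5:]]
        if len(probs) != 3 * len(sample_ids):
            raise ValueError(f"{rsid}: expected {3 * len(sample_ids)} probabilities, got {len(probs)}")
        dosages = []
        for p_aa, p_ab, p_bb in zip(probs[0::3], probs[1::3], probs[2::3]):
            # Assumed convention: an all-zero triplet marks a missing genotype.
            dosages.append("NA" if p_aa + p_ab + p_bb == 0 else f"{p_ab + 2 * p_bb:.3f}")
        header.append(f"{rsid}_{allele_b}")
        columns.append(dosages)
    rows = [header]
    for j, (fid, iid) in enumerate(sample_ids):
        rows.append([fid, iid] + [col[j] for col in columns])
    return rows
```

This keeps every dosage of the input in memory, so it is better run per chunk than per chromosome. Writing each row with `" ".join(row)` gives a whitespace-separated file that R can read with `read.table(..., header = TRUE)` and merge with the covariates by IID.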

gwas r plink • 10k views

It could be helpful if you posted a small example of what this file format looks like, and perhaps also what you want to convert it to ("so that they work in R" is not very specific; R is pretty flexible and does not require strictly specified file formats).


Just added a link to our GWASToolKit on GitHub: https://github.com/swvanderlaan/GWASToolKit.

12.1 years ago
zx8754 11k

The GenABEL package has an impute2databel() function.

For the analysis, try ProbABEL.

This post might be helpful.

8.3 years ago

Is there any interest in this issue? We wrote a bash script and a Perl script to convert IMPUTE2 data to PLINK-style dosage data. I can post a link to the scripts if needed.

By popular demand, here is the link to the beta version of our 'GWASToolKit': https://github.com/swvanderlaan/GWASToolKit. You can use either of the two files named 'convert_impute2dosage.pl' or 'convert_impute2dosage.sh' to convert IMPUTE2 data to dosages.


Please do; if someone's future Google adventure leads here, they might order a hit on you if this is the last post.


I would be very interested in seeing your bash and perl scripts for this data conversion.

I had been looking into using Gtool for this, but a simple bash script would be preferable in my eyes.

12.4 years ago
Michael 54k

It seems you can use read.table or scan, as with any tabular text format; see ?read.table and ?scan. Use scan if read.table takes too long.
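In the same spirit, a reader that processes one SNP line at a time keeps memory use flat even for large chunks. A minimal Python sketch, assuming plain-text or gzip-compressed (.gz) GEN files; iter_gen_dosages is a hypothetical name, and treating an all-zero triplet as missing is again an assumed convention.

```python
import gzip


def iter_gen_dosages(path):
    """Yield (rsid, dosages) for each SNP line of a GEN file.

    Dosage = expected count of allele B = P(AB) + 2 * P(BB).
    Paths ending in .gz are opened with gzip (assumed compression scheme)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:  # only one SNP line is held in memory at a time
            fields = line.split()
            if not fields:
                continue
            probs = [float(x) for x in fields[5:]]
            dosages = [
                # Assumed convention: an all-zero triplet marks a missing genotype.
                None if p_aa + p_ab + p_bb == 0 else p_ab + 2 * p_bb
                for p_aa, p_ab, p_bb in zip(probs[0::3], probs[1::3], probs[2::3])
            ]
            yield fields[1], dosages
```

Because it is a generator, per-SNP results (for example a summary per SNP, or rows appended to an output file) can be produced without ever loading a whole chunk.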
