Question: Converting between Impute2 and Ped/Map after imputation - 1st column dashes (--) problem
4
6.7 years ago by
Philip Robinson40 wrote:

Dear All,

I have taken a PLINK PED/MAP file and converted it to .gen/.sample format with gtool. The result looks like this:

pkd@bioinform:~/strand_correct_script/Files_during_updating$ head controls.gen | cut -d " " -f 1-20

5 chr5:96000607 96000607 A G 1 0 0 1 0 0 1 0 0 0 1 0 1 0 0
5 rs1421911 96000947 C T 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0
5 rs6860934 96001842 C T 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1

I understand from the IMPUTE website that the first column should contain SNP IDs (SNP1, SNP2, SNP3, ...), but I pushed on, thinking things might sort themselves out. I then imputed with IMPUTE2 against 1000 Genomes, which produced a .gen file in this format:

pkd@bioinform:~/Impute2/converting_back_to_plink$ head European_imputed_controls.gen | cut -d " " -f 1-20

--- 5-96000097 96000097 A G 1 0 0 1 0 0 0.976 0.024 0 1 0 0 1 0 0
--- 5-96000203 96000203 C T 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0
--- 5-96000264 96000264 C T 1 0 0 1 0 0 0.998 0.002 0 1 0 0 1 0 0
--- rs7733671 96000269 G A 0 1 0 0 1 0 0 0.947 0.052 1 0 0 0 0 1
--- 5-96000338 96000338 C A 1 0 0 1 0 0 0.997 0.003 0 1 0 0 1 0 0
--- rs73774358 96000463 A G 1 0 0 1 0 0 0.985 0.015 0 1 0 0 1 0 0
--- 5-96000525 96000525 G A 1 0 0 1 0 0 0.997 0.003 0 1 0 0 1 0 0
5 chr5:96000607 96000607 A G 1 0 0 1 0 0 1 0 0 0 1 0 1 0 0
--- rs73774359 96000658 A C 1 0 0 1 0 0 0.985 0.015 0 1 0 0 1 0 0

When I tried to convert it back to PLINK PED/MAP, PLINK reported that all the SNPs had the same name, "---", and crashed in flames. I have read the gtool site and cannot find any reference to what it puts in the first column when converting from PLINK to GEN/SAMPLE, or to what IMPUTE2 should put there when it imputes new SNPs. I can load the file into R and insert an arbitrary first column, but I was wondering whether this is necessary or whether I have made an error somewhere.

Thank you in advance.

Philip

plink imputation • 12k views
ADD COMMENTlink modified 4 months ago by zx87545.6k • written 6.7 years ago by Philip Robinson40
5
4.6 years ago by
chrchang5234.2k
United States
chrchang5234.2k wrote:

PLINK 1.9 has "--recode oxford" for direct export, and --data/--gen/--bgen/--sample for import.  --hard-call-threshold can be used to set a genotype likelihood cutoff, or randomize genotypes based on the likelihoods, during import.

ADD COMMENTlink written 4.6 years ago by chrchang5234.2k
4
6.7 years ago by
Caddymob930
United States
Caddymob930 wrote:

I have had this problem, and I wrote a quick-and-dirty Perl script to get past it. Nothing fancy, but it works. It is meant to be run per chromosome (I split these up on a cluster computer), but hopefully this gets you going.

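#!/usr/bin/perl -w
# Arguments: <map file> <chromosome>
# Rewrites a PLINK .map file so that every SNP gets a unique ID.
use strict;

my ($file, $chr) = @ARGV;
my %snp_hash;    # how many times each SNP ID has been seen so far

open(my $fh, '<', $file) || die "Cannot open $file: $!";

while (<$fh>) {
    chomp;
    # .map columns: chromosome, SNP ID, genetic distance, position
    my (undef, $SNP, undef, $POS) = split;
    my $ZERO = "0";    # genetic distance is always written as 0
    # SNPs without an ID are named "---"; rename them to chr:pos
    $SNP = "$chr:$POS" if $SNP =~ /---/;
    # if this ID was seen before, append a numeric suffix
    if (exists $snp_hash{$SNP}) {
        $snp_hash{$SNP}++;
        $SNP = $SNP . '.' . $snp_hash{$SNP};
    }
    $snp_hash{$SNP}++;
    print "$chr\t$SNP\t$ZERO\t$POS\n";
}
close($fh);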
ADD COMMENTlink written 6.7 years ago by Caddymob930
4
6.7 years ago by
Québec
Maxime Lamontagne2.1k wrote:

Did you try gtool? It can also convert IMPUTE2 output to PED/MAP format.

gtool -G --g file1 --s file2.sample --ped file3.ped --map file4.map --phenotype phenotype_1 --threshold 0.95 > output.gtool
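To make the role of the threshold concrete, here is a minimal Python sketch of hard-calling a single genotype from its three probabilities. It is only an illustration, not gtool's actual code: the function name `hard_call` is made up, and I am assuming a call is made only when the largest probability reaches the threshold (whether the real comparison is strict or not is something I have not checked).

```python
def hard_call(probs, a1, a2, threshold=0.95, missing="0 0"):
    """Turn one sample's (P(AA), P(AB), P(BB)) triple into a PED-style genotype.

    Assumption: a call is made only when the largest probability
    reaches the threshold; otherwise the genotype is set to missing.
    """
    genotypes = (f"{a1} {a1}", f"{a1} {a2}", f"{a2} {a2}")
    # index of the most likely genotype
    best = max(range(3), key=lambda i: probs[i])
    return genotypes[best] if probs[best] >= threshold else missing


# Third sample of the rs7733671 line in the question (alleles G and A):
# the best probability is 0.947, so the call is set to missing.
print(hard_call((0, 0.947, 0.052), "G", "A"))
# Third sample of the 5-96000097 line (alleles A and G): confident homozygote.
print(hard_call((0.976, 0.024, 0), "A", "G"))
```

With a threshold of 0.95 the first genotype is dropped even though it is nearly certain, so it is worth checking how many calls are lost before settling on a cutoff.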

###---EDIT---###

chr=1

awk -v var1=$chr '{
ORS = ""
print var1"\t"
if ($2 == "---") print "SNP."var1"."$4"\t"
else print $2"\t"
print $3"\t"
print $4"\n"
}' Chr${chr}.IMPUTE2.map > Chr${chr}.IMPUTE2.V2.map

I think this script is less complex. You can run it in a for loop, with each chromosome in a separate file.

Thanks for the comment.

ADD COMMENTlink modified 6.7 years ago • written 6.7 years ago by Maxime Lamontagne2.1k
1

He did use gtool; the problem is that SNPs get called "---" in the map file if there is no rsID. My script above converts these SNPs to chr:pos format to give them a unique ID and get you through PLINK.
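For anyone who prefers Python, the renaming idea can be sketched roughly as follows. This is an illustrative re-implementation, not the script itself: the function name `unique_snp_ids` and the (snp_id, pos) input shape are my own choices, and it assumes the SNP IDs and positions have already been read from the .map file.

```python
from collections import Counter


def unique_snp_ids(rows, chrom):
    """Yield (chrom, snp_id, pos) with "---" renamed and duplicates suffixed.

    rows: iterable of (snp_id, pos) pairs taken from a .map file.
    """
    seen = Counter()
    for snp_id, pos in rows:
        if snp_id == "---":
            # no usable ID: fall back to chr:pos
            snp_id = f"{chrom}:{pos}"
        seen[snp_id] += 1
        if seen[snp_id] > 1:
            # repeated ID: append the occurrence number, e.g. rs123.2
            snp_id = f"{snp_id}.{seen[snp_id]}"
            seen[snp_id] += 1  # remember the suffixed name as well
        yield chrom, snp_id, pos


rows = [("---", 96000097), ("rs7733671", 96000269), ("---", 96000097)]
for record in unique_snp_ids(rows, "5"):
    print("\t".join(str(field) for field in record))
```

The Counter plays the same role as the hash in the Perl version: the first occurrence of an ID is kept as-is and later ones get a numeric suffix.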

ADD REPLYlink written 6.7 years ago by Caddymob930
0
4.6 years ago by
Kantale70
Groningen, Netherlands
Kantale70 wrote:

Also take a look at this Python implementation:

http://www.pypedia.com/index.php/Convert_impute2_gprobs_to_PEDMAP_beagle_user_Kantale

Of the long parameter list, you only need to define the following parameters:

chromosome, input_impute2_gprobs_filename, input_impute2_info_filename, output_TPED_filename, output_TFAM_filename

Note: The output is in transposed PED/MAP format (TPED/TFAM). You can use these files directly in PLINK with the --tfile parameter: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#tr

 

ADD COMMENTlink modified 4.6 years ago • written 4.6 years ago by Kantale70