Imputed dosage values for VCF files
2.3 years ago
rheab1230 ▴ 140

Hello,

I have one more question, this time about the genotype file. I submitted a job to the Michigan Imputation Server, and a record from the imputed chr22 output looks like this:

22      16050435        22:16050435:T:C T       C       .       PASS  AF=0.00098;MAF=0.00098;R2=0.23769;IMPUTED       GT:DS:HDS:GP    0|0:0.002:0.001,0.000:0.998,0.002,0.000 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0.001:0.000,0.001:0.999,0.001,0.000 0|0:0:0,0:1,0,0 0|0:0.001:0.001,0.000:0.999,0.001,0.000 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0.001:0.001,0.000:0.999,0.001,0.000 0|0:0.002:0.000,0.002:0.998,0.002,0.000 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0.001:0.000,0.001:0.999,0.001,0.000 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0.002:0.000,0.002:0.998,0.002,0.000 0|0:0:0,0:1,0,0 0|0:0.001:0.001,0.000:0.999,0.001,0.000 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0

whereas the test data provided in the PrediXcan tutorial looks like this:

22_16494187_C_A_b37     0       0       0       0       0       0       0       1       0       0       0       0       0       0       0       0       2
   0       0       0       0       0       0       0       0       0       0       0       0       1       0       0       0       0       0       0
   0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
   0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
   0       0       0       1       0       0       0       0       1       0       0       0       0       0       0       0       0       0       0
   0       1       0       0       0       0       0       0       0   

The tutorial also states that the imputed dosage should be encoded on a 0-2 scale, representing the number of effect alleles the sample carries. Does anyone know how I should convert my VCF files to the required input genotype format to train the model? I don't understand which field in my VCF file holds the dosage values.

Thank you

vcf genotype predixcan dosage • 1.6k views

This is what I have tried so far.

I removed the bracketed column indices from the header with this command, and it worked:

sed -e 's/\[[^][]*\]//g'

To add an ID such as 22_1605149_T_C, I did this:

awk '{ID=$1"_"$2"_"$3"_"$4"_b37"; print ID}' genotype.chr22_1.txt > genotype.chr22_2.txt

But this only writes the IDs to a separate file, which is not what I want. I also tried to join the columns in R, but it doesn't produce the separate ID column I need. The command I used is:

genotype_chr22_1$ID <- paste(genotype_chr22_1$CHROM, genotype_chr22_1$POS,genotype_chr22_1$REF, genotype_chr22_1$ALT, sep = "_")
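For the awk attempt, one way to keep the dosage columns next to the new ID is to build the output line inside awk instead of printing only the ID. This is a minimal sketch, not a tested pipeline: the function name add_variant_id is hypothetical, and it assumes a tab-separated file with no header row whose first four columns are CHROM, POS, REF and ALT.

```shell
# Hypothetical helper: collapse CHROM, POS, REF, ALT (columns 1-4) into one
# CHROM_POS_REF_ALT_b37 ID and keep every remaining column unchanged.
# Assumes tab-separated input without a header line.
add_variant_id() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    line = $1 "_" $2 "_" $3 "_" $4 "_b37"   # variant ID
    for (i = 5; i <= NF; i++) {
      line = line OFS $i                    # append each dosage column
    }
    print line
  }' "$1"
}
```

It could then be called as: add_variant_id genotype.chr22_1.txt > genotype.chr22_2.txt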
cat <( \
      paste -d '\t' \
         <(echo "Id") \
         <(head -1 test.tsv \
      | cut -f5- \
      | sed -e '$s/\[[[:digit:]]\+\]//g; s/_HG[[:digit:]]\+//g') ) \
    <( \
      paste -d '\t' \
        <(awk 'NR>1 {print $1"_"$2"_"$3"_"$4"_b37"}' test.tsv) \
        <(awk 'NR>1 {print $0}' test.tsv | cut -f5- | sed 's/\.[[:digit:]]\+//g')) \
> output.tsv

I used this command to get the desired output.


Also, to extract the dosage values from the VCF file returned by the Michigan Imputation Server, I used the bcftools +dosage plugin.
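For anyone unsure which part of the VCF holds the dosage: the FORMAT column of the record above is GT:DS:HDS:GP, so the second colon-separated value in each sample column is DS, the estimated alternate-allele dosage on a 0-2 scale. The sketch below pulls DS out with plain awk, in case bcftools is not available. The function name vcf_to_dosage is hypothetical, it assumes an uncompressed, tab-separated VCF, and whether ALT is the effect allele your model expects is something to check separately.

```shell
# Hypothetical helper: print CHROM_POS_REF_ALT_b37 followed by the DS value
# of every sample. Assumes an uncompressed, tab-separated VCF file.
vcf_to_dosage() {
  awk 'BEGIN { FS = OFS = "\t" }
  /^#/ { next }                        # skip meta and header lines
  {
    # Find the position of DS inside the FORMAT column (column 9).
    n = split($9, fmt, ":")
    ds = 0
    for (i = 1; i <= n; i++) if (fmt[i] == "DS") ds = i
    if (ds == 0) next                  # record carries no DS field

    # VCF columns: 1=CHROM 2=POS 3=ID 4=REF 5=ALT
    line = $1 "_" $2 "_" $4 "_" $5 "_b37"
    for (s = 10; s <= NF; s++) {       # samples start at column 10
      split($s, val, ":")
      line = line OFS val[ds]
    }
    print line
  }' "$1"
}
```

Unlike the sed step above, this keeps the fractional dosages rather than stripping the decimals. A compressed file would need to be decompressed first, for example with gzip -dc.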
