imputed dosage values for vcf files
4 months ago
rheab1230 ▴ 30

Hello, I have one more question about the genotype file. I submitted a job to the Michigan Imputation Server, and the imputed dosage file for chr22 looks like this:

22 16050435 22:16050435:T:C T C . PASS
AF=0.00098;MAF=0.00098;R2=0.23769;IMPUTED GT:DS:HDS:GP
0|0:0.002:0.001,0.000:0.998,0.002,0.000 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0.001:0.000,0.001:0.999,0.001,0.000 0|0:0:0,0:1,0,0 0|0:0.001:0.001,0.000:0.999,0.001,0.000 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0.001:0.001,0.000:0.999,0.001,0.000 0|0:0.002:0.000,0.002:0.998,0.002,0.000 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0.001:0.000,0.001:0.999,0.001,0.000 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0.002:0.000,0.002:0.998,0.002,0.000 0|0:0:0,0:1,0,0 0|0:0.001:0.001,0.000:0.999,0.001,0.000 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0 0|0:0:0,0:1,0,0

It comes out like this, whereas the test data provided in the PrediXcan tutorial looks like this:

22_16494187_C_A_b37 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

The tutorial also states that the imputed dosage should be encoded on a 0-2 scale, representing the (imputed) number of effect alleles the sample possesses. Does anyone know how I should convert my VCF files to the desired input genotype format to train the model? I don't understand which field in my VCF files represents the dosage values. Thank you.
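For reference, the ninth VCF column (FORMAT) names the colon-separated subfields of every sample column. In the record above it is GT:DS:HDS:GP, so the second subfield of each sample (DS, the estimated alternate-allele dosage on a 0-2 scale) is the dosage value. A minimal awk sketch of pulling it out is below. The function name `vcf_ds_to_dosage` is hypothetical, the `_b37` suffix and space-separated layout are copied from the tutorial line shown above, and treating ALT as the effect allele is an assumption that should be checked against the model being trained.

```shell
# Hypothetical helper: reads uncompressed VCF text on stdin and writes one
# line per variant: CHROM_POS_REF_ALT_b37 followed by the DS value of each sample.
vcf_ds_to_dosage() {
  awk -F'\t' '
    /^#/ { next }                      # skip meta and header lines
    {
      n = split($9, fmt, ":")          # FORMAT column, e.g. GT:DS:HDS:GP
      ds = 0
      for (i = 1; i <= n; i++) if (fmt[i] == "DS") ds = i
      if (ds == 0) next                # record carries no DS subfield
      line = $1 "_" $2 "_" $4 "_" $5 "_b37"
      for (s = 10; s <= NF; s++) {     # sample columns start at column 10
        m = split($s, f, ":")
        line = line " " (m >= ds ? f[ds] : "NA")
      }
      print line
    }'
}
```

It could be used as `gzip -dc chr22.dose.vcf.gz | vcf_ds_to_dosage > chr22.dosage.txt`, where both file names are placeholders.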

files vcf dosage genotype predixcan imputed • 413 views

This is what I have tried so far. I removed the bracketed text with this command, and it worked:

sed -e 's/\[[^][]*\]//g'

And to add an ID like 22_1605149_T_C, I did this:

awk '{ID=$1""$2" "$3""$4" " +b37; print ID}' genotype.chr22_1.txt > genotype.chr22_2.txt

But it just gives me a separate file containing only these values, which is not what I want. I also tried to join the columns in R, but it doesn't generate a separate column. The command I used is:

genotype_chr22_1$ID <- paste(genotype_chr22_1$CHROM, genotype_chr22_1$POS,genotype_chr22_1$REF, genotype_chr22_1$ALT, sep = "_")


cat <( \
  paste -d '\t' \
    <(echo "Id") \
    <(head -1 test.tsv \
      | cut -f5- \
      | sed -e '$s/[[[:digit:]]+]//g; s/_HG[[:digit:]]+//g') ) \
  <( \
  paste -d '\t' \
    <(awk 'NR>1 {print $1"_"$2"_"$3"_"$4"_b37"}' test.tsv) \
    <(awk 'NR>1 {print $0}' test.tsv | cut -f5- | sed 's/.[[:digit:]]+//g')) \
  > output.tsv

I used this command to get the desired output.
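The same reshaping can probably be done in a single awk pass, which may be easier to read than nested process substitutions. This is only a sketch under assumptions: the function name `tsv_to_predixcan` is hypothetical, the input is assumed to be tab-separated with CHROM, POS, REF, ALT in the first four columns and one dosage column per sample after that, and header names are assumed to carry a leading `[N]` index that should be stripped. It does not reproduce the other sample-name clean-up done by the sed expressions above.

```shell
# Hypothetical helper: reads a tab-separated dosage table on stdin and
# writes an "Id" column (CHROM_POS_REF_ALT_b37) followed by the sample columns.
tsv_to_predixcan() {
  awk -F'\t' '
    NR == 1 {                          # header row
      line = "Id"
      for (i = 5; i <= NF; i++) {
        name = $i
        sub(/^\[[0-9]+\]/, "", name)   # drop a leading [N] index if present
        line = line "\t" name
      }
      print line
      next
    }
    {                                  # data rows
      line = $1 "_" $2 "_" $3 "_" $4 "_b37"
      for (i = 5; i <= NF; i++) line = line "\t" $i
      print line
    }'
}
```

For example: `tsv_to_predixcan < test.tsv > output.tsv`.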


Also, to extract the dosage values from the VCF file produced by the Michigan Imputation Server, I used the bcftools +dosage plugin.
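As an alternative to the plugin, `bcftools query` can print the DS field that the server already wrote into the VCF. A minimal sketch follows; the wrapper name `extract_ds_table` is hypothetical, and it assumes every record has a DS subfield in its FORMAT column.

```shell
# Hypothetical wrapper: prints CHROM, POS, REF, ALT and one DS value per
# sample, tab-separated, for the VCF/BCF file given as the first argument.
extract_ds_table() {
  bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%DS]\n' "$1"
}
```

For example: `extract_ds_table chr22.dose.vcf.gz > test.tsv`, where the input file name is a placeholder. Unlike the plugin output, this table has no header row unless `-H` is added to the `bcftools query` call.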
