Recoding HLA Alleles by 01
5.9 years ago

I have a large data frame with HLA alleles like this:

> sample.id allele1 allele2
1   44:03   45:01
2   15:16   15:16
3   07:02   15:03
4   07:02   18:01
5   35:01   44:03

I want to convert the HLA alleles into per-allele genotype counts (0, 1 or 2 copies per sample), such as:

> sample.id 44:03   45:01   15:16   07:02   15:03   18:01   35:01
1   1   1   0   0   0   0   0
2   0   0   2   0   0   0   0
3   0   0   0   1   1   0   0
4   0   0   0   1   0   1   0
5   1   0   0   0   0   0   1

The full data frame has 5000 samples and 65 alleles, so the result should have one row per sample and one column per allele, i.e. dimensions of 5000 x 65. Do you know of any R package or other tool for this conversion?

Thanks!

R hla • 922 views
5.9 years ago

With that many variables, this may be slow to do in R. You could try this awk solution outside of R instead. It should work with any number of alleles and samples. Tested in Bash on Linux (Ubuntu), but it should also work on macOS. Note that the allele columns come out in awk's internal array order, not in order of first appearance.

cat HLA.tsv
sample.id   allele1 allele2
1   44:03   45:01
2   15:16   15:16
3   07:02   15:03
4   07:02   18:01
5   35:01   44:03


awk -F"\t" '
  # Pass 1: print the ID column name and collect the distinct alleles.
  NR==FNR { if (NR==1) {printf "%s", $1} else {for (i=2; i<=NF; i++) allele[$i]}; next }
  # Pass 2, header line: one column per allele.
  FNR==1 { for (a in allele) printf "\t%s", a; next }
  # Pass 2, data lines: count the copies of each allele (0, 1 or 2).
  { printf "\n%s", $1; for (a in allele) { n=0; for (i=2; i<=NF; i++) if ($i==a) n++; printf "\t%d", n } }
  END { printf "\n" }
' HLA.tsv HLA.tsv
sample.id   45:01   44:03   18:01   07:02   15:03   35:01   15:16
1           1       1       0       0       0       0       0
2           0       0       0       0       0       0       2
3           0       0       0       1       1       0       0
4           0       0       1       1       0       0       0
5           0       1       0       0       0       1       0
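
If the column order matters, a single-pass variant may be easier to work with. The sketch below is only an illustration, not part of the tested command above: it assumes a tab-separated file with one header line and unique sample IDs, holds the counts in memory (which should be fine for roughly 5000 x 65 values), and prints alleles in the order they are first seen. The function name hla_counts is hypothetical.

```shell
# Hypothetical helper: hla_counts FILE
# Assumes FILE is tab-separated, has one header line, and sample IDs are unique.
hla_counts() {
  awk -F"\t" '
    NR == 1 { header = $1; next }            # remember the ID column name
    {
      ids[++n] = $1                          # keep samples in input order
      for (i = 2; i <= NF; i++) {
        if (!($i in seen)) {                 # first time this allele appears
          seen[$i]
          order[++m] = $i                    # record first-seen column order
        }
        count[$1, $i]++                      # 1 if heterozygous, 2 if homozygous
      }
    }
    END {
      printf "%s", header
      for (j = 1; j <= m; j++) printf "\t%s", order[j]
      printf "\n"
      for (k = 1; k <= n; k++) {
        printf "%s", ids[k]
        for (j = 1; j <= m; j++) printf "\t%d", count[ids[k], order[j]]
        printf "\n"
      }
    }
  ' "$1"
}
```

It could be called as hla_counts HLA.tsv > HLA_counts.tsv. On the five-sample example the header should come out as 44:03, 45:01, 15:16, 07:02, 15:03, 18:01, 35:01, with a 2 for sample 2 under 15:16, which matches the layout requested in the question.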

Hi Kevin,

What about converting HLA alleles to popgene format? Could you please take a look at this post? Do you have any suggestions?

Thanks
