Hi community,
I have a question: the SNP position in vcf file is from GRCh37/hg19, I need to change the position to GRCh38. So, I used UCSC liftover to replace the hg19 pos by GRCh38 pos and deleted some SNPs, then sorted the pos and saved to a new vcf file. The tool I used is R package vcfR. I am not sure what I have done is correct or not. So, I want to ask how should we modify the vcf file. Example code:
vcf <- read.vcfR("abc.vcf.gz")
vcf@fix[,2] <- snp.pos.grch38 # vcf@fix contains the annotation of SNPs, the 2nd column is the position of SNPs, I changed hg19 to GRCh38
vcf@fix[,3] <- paste0(vcf@fix[,1],":",vcf@fix[,2]) # accordingly, changed the SNPs' IDs (chr:pos)
vcf@fix <- vcf@fix[-c(1,3,7,9),] # delete the 1st, 3rd, 7th, 9th SNPs
vcf@gt <- vcf@gt[-c(1,3,7,9),] # vcf@gt contains the SNP genotypes
write.vcf(vcf, file = "new_abc.vcf") # saved to a new vcf file
Use bgzip and tabix:
bgzip new_abc.vcf && tabix -p vcf new_abc.vcf.gz
By the way, after I saved to a new vcf file, I used bgzip to get the compressed vcf.gz file. But when I used tabix (tabix -p vcf xxxx.vcf.gz) to create index file, it reported errors: I feel very confused about the errors. Can you give some guidelines? Thank you in advance!
LiftoverVcf should be the tool of choice https://gatk.broadinstitute.org/hc/en-us/articles/360037060932-LiftoverVcf-Picard- .
Thank you for your reply! I tried it yesterday, most variants are rejected. I feel very confused....
check the reference, check the chain, check the name of the contigs (chr1 vs '1')