Changing the order of ref/alt alleles in a phased vcf file
7.5 years ago


I have a vcf file with phased data ("dataset1"), which I want to analyse together with some other genotype data ("dataset2").

For some loci in dataset1, the ref/alt alleles are opposite to those in dataset2, i.e. for a given SNP I get A/G and G/A respectively.

I have two questions:

Is there any way I can either

(i) check quickly the ref/alt consistency across all my loci in the two datasets and ideally remove all inconsistent positions? would the --diff-site-discordance flag from vcftools perform something like that?


(ii) swop the ref/alt information for the SNPs of my choice directly on the vcf files? I want to avoid converting to plink because I don't want to lose the phase information

Any ideas will be very much appreciated.


strand flip DNA ref/alt plink vcftools
7.5 years ago


(i) Hi, yes: the --diff-site-discordance option, used together with --diff, calculates discordance between the two files on a site-by-site basis.

(ii) Try this piece of R code. For each locus shared between datasets 1 and 2 (same chromosome and same position), it checks whether the reference and alternative alleles are consistent. If they are reversed, it swops ref and alt in dataset 2 for that locus.


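# import datasets
vcf1 <- "path/to/dataset1.vcf" # this will be used as reference
vcf2 <- "path/to/dataset2.vcf" # this will be swopped
# read every column as character, so an allele column of "T" is not parsed as TRUE;
# read.table skips the "#" header lines, so the output file has no VCF header
dataset1 <- read.table(vcf1, colClasses = "character")
dataset2 <- read.table(vcf2, colClasses = "character")
# check ref/alt for common loci
common_loci <- merge(dataset1, dataset2, by = c("V1", "V2"), all = FALSE, suffixes = c("_dataset1", "_dataset2"))
dataset2_swop <- dataset2
for (id in seq_len(nrow(common_loci))) {
  chr <- common_loci$V1[id]
  pos <- common_loci$V2[id]
  ref <- common_loci$V4_dataset1[id]
  alt <- common_loci$V5_dataset1[id]
  # rows of dataset2 at this locus whose ref/alt are reversed relative to dataset1
  locus_index <- which(dataset2$V1 == chr & dataset2$V2 == pos &
                       dataset2$V4 == alt & dataset2$V5 == ref)
  if (length(locus_index) > 0) {
    # print loci with ref/alt swopped between datasets
    cat("inconsistent locus:", chr, pos, "\n")
    dataset2_swop[locus_index, 4] <- ref
    dataset2_swop[locus_index, 5] <- alt
    # recode the genotypes too (0 <-> 1), otherwise every call changes meaning;
    # assumes biallelic sites with GT as the first FORMAT field.
    # INFO values such as AC/AF are not updated.
    for (j in seq_len(ncol(dataset2))[-(1:9)]) {
      x <- dataset2[locus_index, j]
      gt <- sub(":.*", "", x)
      dataset2_swop[locus_index, j] <- paste0(chartr("01", "10", gt), substring(x, nchar(gt) + 1))
    }
  }
}
# print tab delimited output
write.table(dataset2_swop, file = "dataset2.swop.vcf", quote = FALSE, col.names = FALSE, row.names = FALSE, sep = "\t")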
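
For question (i), the same kind of merge can also be used just to tabulate how the shared loci compare, and to drop the ones that disagree instead of swopping them. This is only a minimal sketch, assuming biallelic SNPs and the V1..V5 column names that read.table assigns; classify_loci() and the toy data frames are made up for illustration and are not part of any package:

```r
# Toy stand-ins for VCF columns V1 = CHROM, V2 = POS, V4 = REF, V5 = ALT.
# With real files these would come from read.table(..., colClasses = "character").
dataset1 <- data.frame(V1 = c("1", "1", "2"), V2 = c("100", "200", "300"),
                       V4 = c("A", "C", "G"), V5 = c("G", "T", "A"),
                       stringsAsFactors = FALSE)
dataset2 <- data.frame(V1 = c("1", "1", "2"), V2 = c("100", "200", "300"),
                       V4 = c("A", "T", "G"), V5 = c("G", "C", "T"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: label every locus shared by d1 and d2
classify_loci <- function(d1, d2) {
  cols <- c("V1", "V2", "V4", "V5")
  m <- merge(d1[, cols], d2[, cols], by = c("V1", "V2"), suffixes = c("_d1", "_d2"))
  m$status <- "mismatch"  # alleles differ in some other way, e.g. a strand flip
  m$status[m$V4_d1 == m$V4_d2 & m$V5_d1 == m$V5_d2] <- "consistent"
  m$status[m$V4_d1 == m$V5_d2 & m$V5_d1 == m$V4_d2] <- "swopped"
  m
}

loci <- classify_loci(dataset1, dataset2)
print(table(loci$status))

# Drop every shared locus that is not consistent, keeping the original row order
bad <- loci[loci$status != "consistent", ]
keep <- !(paste(dataset2$V1, dataset2$V2) %in% paste(bad$V1, bad$V2))
dataset2_clean <- dataset2[keep, ]
```

Loci labelled "mismatch" are neither identical nor simply reversed, so they are probably worth inspecting by hand before deciding whether to remove them.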



