Changing the order of ref/alt alleles in a phased vcf file
7.5 years ago

Hi,

I have a vcf file with phased data ("dataset1"), which I want to analyse together with some other genotype data ("dataset2").

For some loci in dataset1, the ref/alt alleles are opposite to those in dataset2, i.e. for a given SNP I get A/G and G/A respectively.

I have two questions:

Is there any way I can either

(i) quickly check the ref/alt consistency across all my loci in the two datasets and ideally remove all inconsistent positions? Would the --diff-site-discordance flag from vcftools do something like that?

or

(ii) swap the ref/alt information for the SNPs of my choice directly in the vcf files? I want to avoid converting to plink because I don't want to lose the phase information.

Any ideas will be very much appreciated.

 

strand flip DNA ref/alt plink vcftools
7.5 years ago

Hi,

(i) Yes, the --diff-site-discordance option of vcftools calculates discordance on a site-by-site basis. It is used together with --diff, which names the second VCF file.
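If you would rather check the alleles directly, here is a minimal R sketch that labels each shared locus as consistent, swapped, or other, and then keeps only the consistent positions. The function name classify_loci and the toy tables are made up for illustration; the V1..V5 column names assume the VCF body was read with read.table, as in the code further down. The "other" class catches anything that is neither identical nor a plain ref/alt swap (a strand flip, for example), which you would probably want to inspect by hand.

```r
# Classify loci shared by two VCF-like tables by comparing REF (V4) and ALT (V5).
# classify_loci is a hypothetical helper; V1 = CHROM, V2 = POS.
classify_loci <- function(d1, d2) {
  cols <- c("V1", "V2", "V4", "V5")
  shared <- merge(d1[, cols], d2[, cols], by = c("V1", "V2"),
                  suffixes = c("_d1", "_d2"))
  same    <- shared$V4_d1 == shared$V4_d2 & shared$V5_d1 == shared$V5_d2
  swapped <- shared$V4_d1 == shared$V5_d2 & shared$V5_d1 == shared$V4_d2
  shared$status <- ifelse(same, "consistent", ifelse(swapped, "swapped", "other"))
  shared
}

# Toy data, made up for illustration only
d1 <- data.frame(V1 = c("1", "1", "2"), V2 = c("100", "200", "300"), V3 = ".",
                 V4 = c("A", "C", "G"), V5 = c("G", "T", "A"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(V1 = c("1", "1", "2"), V2 = c("100", "200", "300"), V3 = ".",
                 V4 = c("G", "C", "C"), V5 = c("A", "T", "T"),
                 stringsAsFactors = FALSE)

res <- classify_loci(d1, d2)
print(res[, c("V1", "V2", "status")])

# Keep only the positions of dataset 2 whose alleles agree with dataset 1
keep <- res[res$status == "consistent", c("V1", "V2")]
d2_consistent <- merge(d2, keep, by = c("V1", "V2"))
```

In the toy data, position 1:100 is a ref/alt swap, 1:200 agrees, and 2:300 falls into "other".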

(ii) Try this piece of R code. For each locus shared between datasets 1 and 2 (same chromosome and same position), it checks whether the reference and alternative alleles are consistent. If they are reversed, it swaps ref and alt in dataset 2 for that locus.

 

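# import datasets
vcf1 <- "path/to/dataset1.vcf" # this will be used as reference
vcf2 <- "path/to/dataset2.vcf" # this will be swapped
# read every column as character so that an allele "T" is not parsed as TRUE;
# header lines starting with "#" are skipped by read.table
dataset1 <- read.table(vcf1, colClasses = "character", quote = "")
dataset2 <- read.table(vcf2, colClasses = "character", quote = "")
# loci present in both datasets (same chromosome and position), with dataset1 alleles
common_loci <- merge(dataset1[, c(1, 2, 4, 5)], dataset2[, 1:2], by = c("V1", "V2"))
dataset2_swop <- dataset2
gt_cols <- 10:ncol(dataset2) # per-sample columns; assumes at least one sample
# exchange allele codes 0 and 1 in the GT subfield only (GT is the first subfield);
# the "|" separator is untouched, so the phase is kept
flip_gt <- function(x) {
  gt <- sub(":.*$", "", x)
  paste0(chartr("01", "10", gt), substring(x, nchar(gt) + 1))
}
for (id in seq_len(nrow(common_loci))) {
  chr <- common_loci$V1[id]
  pos <- common_loci$V2[id]
  ref <- common_loci$V4[id]
  alt <- common_loci$V5[id]
  # rows of dataset2 at this locus whose ref/alt are reversed relative to dataset1
  locus_index <- which(dataset2$V1 == chr & dataset2$V2 == pos &
                       dataset2$V4 == alt & dataset2$V5 == ref)
  if (length(locus_index) > 0) {
    # print loci with ref/alt swapped between datasets
    cat("inconsistent locus:", chr, pos, "\n")
    dataset2_swop[locus_index, 4] <- ref
    dataset2_swop[locus_index, 5] <- alt
    # recode the genotypes so that they still refer to the same alleles
    dataset2_swop[locus_index, gt_cols] <-
      lapply(dataset2_swop[locus_index, gt_cols, drop = FALSE], flip_gt)
  }
}
# note: allele-specific INFO values (e.g. AC, AF) are not updated here
# write the original header lines, then the tab delimited records
out <- "dataset2.swop.vcf"
writeLines(grep("^#", readLines(vcf2), value = TRUE), out)
write.table(dataset2_swop, file = out, quote = FALSE, col.names = FALSE,
            row.names = FALSE, sep = "\t", append = TRUE)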
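
One thing to keep in mind when swapping: the genotype codes 0 and 1 refer to REF and ALT, so exchanging the two columns changes what every genotype at that locus means unless the codes are exchanged too. Here is a small sketch of that on a single made-up phased genotype. The helper gt_to_alleles is hypothetical (not from any package) and only handles biallelic, phased GT strings.

```r
# Translate a phased biallelic GT string (e.g. "0|1") into allele letters.
# gt_to_alleles is a hypothetical helper written for this illustration.
gt_to_alleles <- function(ref, alt, gt) {
  codes <- strsplit(gt, "|", fixed = TRUE)[[1]]
  alleles <- ifelse(codes == "0", ref, ifelse(codes == "1", alt, "."))
  paste(alleles, collapse = "|")
}

# record as stored: REF = G, ALT = A, genotype 0|1
original <- gt_to_alleles("G", "A", "0|1")
# REF/ALT exchanged, genotype left untouched
labels_only <- gt_to_alleles("A", "G", "0|1")
# REF/ALT exchanged and genotype codes exchanged as well
labels_and_gt <- gt_to_alleles("A", "G", chartr("01", "10", "0|1"))

print(c(original = original, labels_only = labels_only,
        labels_and_gt = labels_and_gt))
```

Only the third variant carries the same two haplotypes as the original record, in the same order, so the phase information survives the swap.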

 
