non-SNP variants in a vcf merge file
1
0
Entering edit mode
4.7 years ago
evelyn ▴ 230

Hello,

I have used bwa for alignment and platypus for variant calling and filtered only SNPs using bcftools. I then merged the vcf files with only SNPs for some samples. But the merged file shows some odd positions with variants other than SNPs. Although individual vcf files does not show these calls. I used this code:

bwa mem ref.fa S_1.fastqS_2.fastq | samtools sort -o sorted.bam
samtools index sorted.bam
python Platypus.py callVariants --bamFiles=sorted.bam --refFile=ref.fa  --output=1.vcf
bcftools view --types snps 1.vcf -o 2.vcf
bgzip -c 2.vcf > 2.vcf.gz
tabix -p vcf 2.vcf.gz
bcftools merge --file-list list.txt -O v -o merge.vcf

I have pasted one weird call from the merged file:

Chr POS ID REF ALT 1 2 3 
ch1 1470    .   CGCTCC  TGCTCC,TACTCC,TGCTCA    TGCTCC  NA  TGCTCC
snp • 1.4k views
ADD COMMENT
0
Entering edit mode

Hello evelyn ,

what's the content of list.txt? Also your output example doesn't look like a valid vcf file. I'm missing the QUAL, FILTER, INFO and FORMAT columns.

fin swimmer

ADD REPLY
0
Entering edit mode

Hello @finswimmer,

list.txt contains list of vcf.gz samples to be merged. In order to show the problem only, I mistakenly broke the format. And I have more samples, I am just showing three here including the problem.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT 1 2 3 
ch1 1470    .   CGCTCC  TGCTCC,TACTCC,TGCTCA    126 badReads;MQ;HapScore    BRF=0.75;FR=0.502;HP=1;HapScore=2;MGOF=157;MMLQ=31;MQ=31.8;NF=3;NR=0;PP=55;QD=29.9237;SC=GATTGACAACCGCTCCAGTTT;SbPval=1;Source=Platypus;TC=3;TCF=3;TCR=0;TR=3;WE=1478;WS=1460   GT:GL:GOF:GQ:NR:NV TGCTCC  NA  TGCTCC
ADD REPLY
0
Entering edit mode
4.7 years ago

Some variant callers just call variants in this way. You can probably mitigate these by 'normalising' your data to a reference:

#1st pipe, splits multi-allelic calls into separate variant calls
#2nd pipe, left-aligns indels and issues warnings when the REF base in your VCF does not match the base in the supplied FASTA reference genome
bcftools norm -m-any merge.vcf | \
  bcftools norm -Ob --check-ref w -f human_g1k_v37.fasta > merge_norm.vcf ;

Here, human_g1k_v37.fasta, would be whichever reference genome FASTA against which your variants were originally called.

Kevin

ADD COMMENT
0
Entering edit mode

Thank you, @Kevin! I will try this.

ADD REPLY
0
Entering edit mode

Please let me know if it works....

ADD REPLY

Login before adding your answer.

Traffic: 2257 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6