Question: non-SNP variants in a vcf merge file
0
gravatar for evelyn
10 months ago by
evelyn100
evelyn100 wrote:

Hello,

I have used bwa for alignment and platypus for variant calling and filtered only SNPs using bcftools. I then merged the vcf files with only SNPs for some samples. But the merged file shows some odd positions with variants other than SNPs. Although individual vcf files does not show these calls. I used this code:

bwa mem ref.fa S_1.fastqS_2.fastq | samtools sort -o sorted.bam
samtools index sorted.bam
python Platypus.py callVariants --bamFiles=sorted.bam --refFile=ref.fa  --output=1.vcf
bcftools view --types snps 1.vcf -o 2.vcf
bgzip -c 2.vcf > 2.vcf.gz
tabix -p vcf 2.vcf.gz
bcftools merge --file-list list.txt -O v -o merge.vcf

I have pasted one weird call from the merged file:

Chr POS ID REF ALT 1 2 3 
ch1 1470    .   CGCTCC  TGCTCC,TACTCC,TGCTCA    TGCTCC  NA  TGCTCC
snp • 360 views
ADD COMMENTlink modified 10 months ago by Kevin Blighe63k • written 10 months ago by evelyn100

Hello evelyn ,

what's the content of list.txt? Also your output example doesn't look like a valid vcf file. I'm missing the QUAL, FILTER, INFO and FORMAT columns.

fin swimmer

ADD REPLYlink written 10 months ago by finswimmer13k

Hello @finswimmer,

list.txt contains list of vcf.gz samples to be merged. In order to show the problem only, I mistakenly broke the format. And I have more samples, I am just showing three here including the problem.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT 1 2 3 
ch1 1470    .   CGCTCC  TGCTCC,TACTCC,TGCTCA    126 badReads;MQ;HapScore    BRF=0.75;FR=0.502;HP=1;HapScore=2;MGOF=157;MMLQ=31;MQ=31.8;NF=3;NR=0;PP=55;QD=29.9237;SC=GATTGACAACCGCTCCAGTTT;SbPval=1;Source=Platypus;TC=3;TCF=3;TCR=0;TR=3;WE=1478;WS=1460   GT:GL:GOF:GQ:NR:NV TGCTCC  NA  TGCTCC
ADD REPLYlink written 10 months ago by evelyn100
0
gravatar for Kevin Blighe
10 months ago by
Kevin Blighe63k
Kevin Blighe63k wrote:

Some variant callers just call variants in this way. You can probably mitigate these by 'normalising' your data to a reference:

#1st pipe, splits multi-allelic calls into separate variant calls
#2nd pipe, left-aligns indels and issues warnings when the REF base in your VCF does not match the base in the supplied FASTA reference genome
bcftools norm -m-any merge.vcf | \
  bcftools norm -Ob --check-ref w -f human_g1k_v37.fasta > merge_norm.vcf ;

Here, human_g1k_v37.fasta, would be whichever reference genome FASTA against which your variants were originally called.

Kevin

ADD COMMENTlink modified 10 months ago • written 10 months ago by Kevin Blighe63k

Thank you, @Kevin! I will try this.

ADD REPLYlink written 10 months ago by evelyn100

Please let me know if it works....

ADD REPLYlink written 10 months ago by Kevin Blighe63k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1791 users visited in the last hour