Question: non-SNP variants in a vcf merge file
0
gravatar for evelyn
4 weeks ago by
evelyn40
evelyn40 wrote:

Hello,

I have used bwa for alignment and platypus for variant calling and filtered only SNPs using bcftools. I then merged the vcf files with only SNPs for some samples. But the merged file shows some odd positions with variants other than SNPs. Although individual vcf files does not show these calls. I used this code:

bwa mem ref.fa S_1.fastqS_2.fastq | samtools sort -o sorted.bam
samtools index sorted.bam
python Platypus.py callVariants --bamFiles=sorted.bam --refFile=ref.fa  --output=1.vcf
bcftools view --types snps 1.vcf -o 2.vcf
bgzip -c 2.vcf > 2.vcf.gz
tabix -p vcf 2.vcf.gz
bcftools merge --file-list list.txt -O v -o merge.vcf

I have pasted one weird call from the merged file:

Chr POS ID REF ALT 1 2 3 
ch1 1470    .   CGCTCC  TGCTCC,TACTCC,TGCTCA    TGCTCC  NA  TGCTCC
snp • 164 views
ADD COMMENTlink modified 28 days ago by Kevin Blighe51k • written 4 weeks ago by evelyn40

Hello evelyn ,

what's the content of list.txt? Also your output example doesn't look like a valid vcf file. I'm missing the QUAL, FILTER, INFO and FORMAT columns.

fin swimmer

ADD REPLYlink written 4 weeks ago by finswimmer12k

Hello @finswimmer,

list.txt contains list of vcf.gz samples to be merged. In order to show the problem only, I mistakenly broke the format. And I have more samples, I am just showing three here including the problem.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT 1 2 3 
ch1 1470    .   CGCTCC  TGCTCC,TACTCC,TGCTCA    126 badReads;MQ;HapScore    BRF=0.75;FR=0.502;HP=1;HapScore=2;MGOF=157;MMLQ=31;MQ=31.8;NF=3;NR=0;PP=55;QD=29.9237;SC=GATTGACAACCGCTCCAGTTT;SbPval=1;Source=Platypus;TC=3;TCF=3;TCR=0;TR=3;WE=1478;WS=1460   GT:GL:GOF:GQ:NR:NV TGCTCC  NA  TGCTCC
ADD REPLYlink written 4 weeks ago by evelyn40
0
gravatar for Kevin Blighe
28 days ago by
Kevin Blighe51k
Kevin Blighe51k wrote:

Some variant callers just call variants in this way. You can probably mitigate these by 'normalising' your data to a reference:

#1st pipe, splits multi-allelic calls into separate variant calls
#2nd pipe, left-aligns indels and issues warnings when the REF base in your VCF does not match the base in the supplied FASTA reference genome
bcftools norm -m-any merge.vcf | \
  bcftools norm -Ob --check-ref w -f human_g1k_v37.fasta > merge_norm.vcf ;

Here, human_g1k_v37.fasta, would be whichever reference genome FASTA against which your variants were originally called.

Kevin

ADD COMMENTlink modified 28 days ago • written 28 days ago by Kevin Blighe51k

Thank you, @Kevin! I will try this.

ADD REPLYlink written 28 days ago by evelyn40

Please let me know if it works....

ADD REPLYlink written 28 days ago by Kevin Blighe51k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1092 users visited in the last hour