Problems in Variant Calling Code
1
0
Entering edit mode
4.1 years ago

I am not able to understand this code snippet I have, to carry out variant calling, for the SNP Analysis. The snippet is:

bcftools mpileup -Ou -f Ref_1_20040804.fa ref_1.sorted.bam | bcftools call -mv -Ob -o ref_1_btool.bcf

bcftools view ref_1_btool.bcf > pile.vcf

bcftools call -cv ref_1_btool.bcf > variant.bcf

gatk SelectVariants -R ref_1 -V variant.bcf -select-type SNP -O ref_1_snps.vcf**

Here, Ref_1_20040804.fa is the Reference Sequence. The code refers to a pileup.bcf file which I do not have. Any advice on how to proceed would be appreciated.

SNP Variant Calling • 1.2k views
ADD COMMENT
1
Entering edit mode
4.1 years ago

Hello. You wrote this code or you copied it from somewhere? If you just want single nucleotide variants, then you just need to do this:

bcftools mpileup \
  --redo-BAQ \
  --min-BQ 30 \
  --per-sample-mF \
  --annotate FORMAT/AD,FORMAT/ADF,FORMAT/ADR,FORMAT/DP,FORMAT/SP,INFO/AD,INFO/ADF,INFO/ADR \
  -f Ref_1_20040804.fa ref_1.sorted.bam | \
  bcftools call \
    --multiallelic-caller \
    --variants-only \
    --skip-variants indels \
    -Ov > ref_1_snps.vcf ;

Please ask further questions if anything is unclear.

Kevin

ADD COMMENT
0
Entering edit mode

Hi Kevin. Thank you for the clarification, This makes more sense to me now.

I had picked up the code as I am quite new to this field.

I have another question regarding SNP Analysis. How would you carry out the Base Recalibration and Apply BQSR step for a Bacterial genome.

I am sharing the code I have found, but it deals with Human genome data.

#Indexing SNP Feature File

./gatk IndexFeatureFile -F dbsnp_138.hg19.excluding_sites_after_129.vcf
gatk IndexFeatureFile -F 1000G_phase1.indels.hg19.vcf 
gatk IndexFeatureFile -F Mills_and_1000G_gold_standard.indels.hg19.vcf 

#Base Recalibration 
gatk BaseRecalibrator -I ERR445551_mkdp.bam -O ERR445551_bqsr.table -R genome.fa /
    --known-sites dbsnp_138.hg19.excluding_sites_after_129.vcf --known-sites 1000G_phase1.indels.hg19.vcf /
    --known-sites Mills_and_1000G_gold_standard.indels.hg19.vcf

#Apply BQSR
gatk ApplyBQSR -I ERR445551_rmdp.bam -O ERR445551_recal.bam -R genome.fa /
    --bqsr-recal-file ERR445551_bqsr.table

#Build Recalibration Model
gatk BaseRecalibrator -I ERR445551_recal.bam -O ERR445551_postbqsr.table -R genome.fa  /
    --known-sites dbsnp_138.hg19.excluding_sites_after_129.vcf /
    --known-sites 1000G_phase1.indels.hg19.vcf /
     --known-sites Mills_and_1000G_gold_standard.indels.hg19.vcf

Any advice on how to proceed further would be greatly appreciated.

Thanks!

ADD REPLY
1
Entering edit mode

GATK is not optimal for bacteria IMO. GATK is built and maintained around working with human genomes, so you're better off using other tools. There might be assemblers that work really well with bacteria (they have circular DNA, right?)

ADD REPLY
0
Entering edit mode

Hi, for that, I may direct you to the GATK forum, unless any of my colleagues want to answer. I have not used GATK for a long time.

ADD REPLY

Login before adding your answer.

Traffic: 2453 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6