Question: Error in bcftools view -bvcg command
0
gravatar for hafiz.talhamalik
6 months ago by
Pakistan
hafiz.talhamalik70 wrote:

I am using this pipeline to make a vcf file. but m having problem at step 4. bcftools view -bvcg my-raw.bcf > my-var.bcf

it says -b is not a valid parameter.??? what to do ?

  1. Convert SAM to BAM for sorting

    samtools view -S -b my.sam > my.bam

  2. Sort BAM for SNP calling

    samtools sort my.bam my-sorted

  3. Run 'mpileup' to generate VCF format

    samtools mpileup -g -f my.fasta my-sorted-1.bam my-sorted-2.bam my-sorted-n.bam > myraw.bcf

  4. Call SNPs...

    bcftools view my.var.bcf | vcfutils.pl varFilter - > my.var-final.vcf

snp bcftools samtools vcf • 461 views
ADD COMMENTlink modified 6 months ago by WouterDeCoster39k • written 6 months ago by hafiz.talhamalik70

Step 1: check if you have a recent bcftools version

ADD REPLYlink written 6 months ago by WouterDeCoster39k
1

Step 2: if it is the recent version, check if the parameter used are still valid by typing bcftools view --help

It looks like hafiz.talhamalik is following an outdated tutorial in variant calling using samtools/bcftools. In the recent versions this looks like this:

$ bcftools mpileup -Ou -f my.fasta my-sorted-1.bam my-sorted-2.bam my-sorted-n.bam  | bcftools call -mv -Ov > variants.vcf

fin swimmer

ADD REPLYlink modified 6 months ago • written 6 months ago by finswimmer11k

yes my tutorial is old but the command you gave is giving an empty vcf file.

ADD REPLYlink written 6 months ago by hafiz.talhamalik70

Hello again,

what do you mean by "empty". Just the vcf header or really a blank file?

fin swimmer

ADD REPLYlink written 6 months ago by finswimmer11k

completely blank one. no header, nothing

ADD REPLYlink written 6 months ago by hafiz.talhamalik70

Have you tried running them separately??

bcftools mpileup --redo-BAQ --min-BQ 30 --per-sample-mF --targets DP,AD -f ref.fasta  input.sorted.bam  | bcftools call --multiallelic-caller --variants-only  >out.vcf

set the --ploidy if required in the variant calling step.

ADD REPLYlink modified 6 months ago • written 6 months ago by arup1.4k

yes i tried them separately but still empty file.

ADD REPLYlink written 6 months ago by hafiz.talhamalik70
1

Hi, have you checked the sorted bam file? The command you mentioned for sorting is not correct. You can either redirect the ouput using > or can use the -o option. Also index the refernece fasta file with samtools in the same directory.

samtools sort my.bam > my-sorted.bam
samtools sort my.bam -o my-sorted.bam
ADD REPLYlink modified 6 months ago • written 6 months ago by arup1.4k

yeah I tried that. sorted bam file is fine.

ADD REPLYlink written 6 months ago by hafiz.talhamalik70

Could you please post the output of these commands:

  • $ samtools --version
  • $ bcftools --version
  • $ samtools view -H my-sorted.bam
  • $ samtools flagstat my-sorted.bam
  • $ cat ref.fasta.fai

fin swimmer

ADD REPLYlink modified 6 months ago • written 6 months ago by finswimmer11k
samtools --version
samtools 1.8
Using htslib 1.8

bcftools --version
bcftools 1.8
Using htslib 1.8

samtools view -H my-sorted.bam
@HD     VN:1.3  SO:coordinate
@SQ     SN:NC_003198.1  LN:4809037
@PG     ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:bwa mem -M -t 8 ../../../SPAdes-3.11.1-Linux//bin/spades_output/sequence.fasta GCF_900185485.1_BL60006_cds_from_genomic.fna

samtools flagstats my-sorted.bam
[main] unrecognized command 'flagstats'

cat ref.fasta.fai
NC_003198.1     4809037 90      70      71
ADD REPLYlink modified 6 months ago by finswimmer11k • written 6 months ago by hafiz.talhamalik70
1
@PG     ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:bwa mem -M -t 8 ../../../SPAdes-3.11.1-Linux//bin/spades_output/sequence.fasta GCF_900185485.1_BL60006_cds_from_genomic.fna

This looks like you're trying to align a fasta sequence and not fastq. AFAIK bwa can do this. But I'm not sure if bcftools can call variants from this file, because the base quality values are missing and if sequence.fasta contains only one sequence you only have one read per position in the alignment file.

So could you please tell us more about what you are trying to do?

fin swimmer

ADD REPLYlink written 6 months ago by finswimmer11k

I have downloaded all the fasta sequence of salmonella from ncbi and now i want to compare snp's from those fasta files with my sequenced data's vcf file.

ADD REPLYlink written 6 months ago by hafiz.talhamalik70

So you have downloaded assembled genomes rather than separate reads?

You can do variant calling from those but it would have been absolutely important if you mention things like this in your first post.

ADD REPLYlink written 6 months ago by WouterDeCoster39k

yes assembled genomes.

ADD REPLYlink written 6 months ago by hafiz.talhamalik70

You have wasted a lot of your and our time by not telling us that earlier. I have added an answer to this thread.

ADD REPLYlink written 6 months ago by WouterDeCoster39k

oho. M really sorry for that..!

ADD REPLYlink written 6 months ago by hafiz.talhamalik70

argh, it should be samtools flagstat (without s). I've edited it.

ADD REPLYlink written 6 months ago by finswimmer11k
samtools flagstat
4908 + 0 in total (QC-passed reads + QC-failed reads)
4 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
4761 + 0 mapped (97.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
ADD REPLYlink modified 6 months ago by finswimmer11k • written 6 months ago by hafiz.talhamalik70
5
gravatar for WouterDeCoster
6 months ago by
Belgium
WouterDeCoster39k wrote:

For calling variants from assembled genomes, relative to a reference, you need an entirely different approach.

Take for example a look here: https://github.com/lh3/minimap2/blob/master/cookbook.md#asm-var Using minimap2 for alignment and paftools for variant calling.

ADD COMMENTlink written 6 months ago by WouterDeCoster39k

oh okz.. Thanks for this. let me go through this. so in a nutshell you are suggesting samtool's aprroach is not good for this kind of analysis ?

ADD REPLYlink written 6 months ago by hafiz.talhamalik70
2

You are aligning an assembly, so the aligner you used before is probably not the best for this job. Furthermore, you will have a coverage of 1 on most locations in the genome, which is of course not enough to call variants using pileup as you would for fastq reads.

ADD REPLYlink written 6 months ago by WouterDeCoster39k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1390 users visited in the last hour