Question: Variant Calling (Samtools Mpileup): Error
0
gravatar for venuraherath
10 weeks ago by
venuraherath20
venuraherath20 wrote:

Hi,

I am trying to update a variant calling script came from the predecessors. I am getting the error "bash: mpileup : No such file or directory"

Following are the contents of the old script;

samtools mpileup -uD -f <reference_genome.fna> <input.bam> > <output1.out> [mpileup] 1 samples in 1 input files <mpileup> Set max per-file depth to 8000^C

bcftools view <output1.out>  > <output2.out>
samtools vcfutils.pl varFilter -d2 -D100 -a2 <output2.out> > output3.vcf

I did modify "samtools mpileup" to "bcftools mpileup" and "-uD to -u -t DP" to match new versions. The modified script is given below;

bcftools mpileup -u -t DP -f <reference_genome.fna> <input.bam> > <output1.out> [mpileup] 1 samples in 1 input files  <mpileup> Set max per-file depth to 8000^C

bcftools view <output1.out>  > <output2.out>
samtools vcfutils.pl varFilter -d2 -D100 -a2 <output2.out> > output3.vcf

When I run it , I get above error. I am using samtools 1.9 & bcftools 1.9

Can someone help me on this regard?

Thank you.

ADD COMMENTlink modified 10 weeks ago • written 10 weeks ago by venuraherath20

Umm, you know you aren't supposed to literally have those <> in your command line, right?

ADD REPLYlink written 10 weeks ago by swbarnes25.0k

Are your referring to <> between output files? This is just the template. I used real file name and didnt include <>. except for <mpileup> Set max per-file depth to 8000^C

ADD REPLYlink written 10 weeks ago by venuraherath20

except for <mpileup> Set max per-file depth to 8000^C

You wrote this in your command line by yourself? That doesn't work. This have to be done with -d 8000 as a parameter for bcftools mpileup.

fin swimmer

ADD REPLYlink written 10 weeks ago by finswimmer11k

If I use the following will it do the trick? Thanks.

bcftools mpileup -u -t DP -f -d 8000 <reference_genome.fna> <input.bam> > <output1.out> 

 [mpileup] 1 samples in 1 input files
ADD REPLYlink written 10 weeks ago by venuraherath20
1

You should first start bcftools mpileup without any parameter to see the help page. You will see:

  • that there is no paramter -u. You probably want to use -Ou to output as uncompressed bcf?
  • that -t is used for defining target regions . You probably want to use -a FORMAT/DP to add this value into the output?
  • that -f must be followed by the reference sequence.

You are aware that bcftools mpileup itself doesn't call variants? For this bcftools call is neccessary. And this needs the uncompressed bcf output from mpileup as input.

fin swimmer

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by finswimmer11k

Oh, got it. Thank you so much.

  • Yes I will use -Ou . Most of the examples available are based on samtools mpileup not bcftools mpileup.
  • Actually what i wanted there was was to keep read depth for each sample. So I will use -a FORMAT/DP
  • I will move -d 8000 before -f

    bcftools mpileup -Ou -a FORMAT/DP -d 8000 -f <reference_genome.fna> <input.bam> > <output1.out>
    [mpileup] 1 samples in 1 input files'

When I try that I get following error;

[E::hts_open_format] Failed to open file [mpileup] [mpileup] failed to open [mpileup]: No such file or directory

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by venuraherath20

Could you please show the real command you use (replacing your placeholder by filenames)?

fin swimmer

ADD REPLYlink written 10 weeks ago by finswimmer11k
bcftools mpileup -Ou -a FORMAT/DP -d 8000 -f GCF_000002315.5_GRCg6a_genomic.fna FCC717NANXX-wHAIPI020726-60.bam > FCC717NANXX-wHAIPI020726-60.out [mpileup] 1 samples in 1 input files
ADD REPLYlink written 10 weeks ago by venuraherath20
1

Remove this part: [mpileup] 1 samples in 1 input files

And use just

bcftools mpileup -Ou -a FORMAT/DP -d 8000 -f GCF_000002315.5_GRCg6a_genomic.fna FCC717NANXX-wHAIPI020726-60.bam > FCC717NANXX-wHAIPI020726-60.out

ADD REPLYlink written 10 weeks ago by finswimmer11k

Thank you so much! It is running now. Regarding bcftools call ; I have received VCF files generated via original workflow (as given in the old script). Now I am confused. Do you think it is better to dump those VCF files and re do those samples as well, (esp since dif version of tools were used)?

Omitting rest ,Is it ok to use bcftools call -v -m -O z -o variantc.vcf.gz followed by filtering variants using rtg vcffilter? Sorry for asking too many questions. Thank you.

ADD REPLYlink written 10 weeks ago by venuraherath20
1

Hello venuraherath ,

you bcftools call looks fine to me. In general I recommend to normalize your indels after calling. You can chain all these command together:

$ bcftools mpileup -Ou -a FORMAT/DP -d 8000 -f GCF_000002315.5_GRCg6a_genomic.fna FCC717NANXX-wHAIPI020726-60.bam | bcftools call -v -m -Ou | bcftools norm -f GCF_000002315.5_GRCg6a_genomic.fna -Oz -o variantc.vcf.gz

I've never used rtg. My favorite is bcftools filter or bcftools view.

Do you think it is better to dump those VCF files and re do those samples as well, (esp since dif version of tools were used)?

This depends on the aim of your investigation.

fin swimmer

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by finswimmer11k

Thanks again. You were very helpful. Earlier script is still running since its using only one thread though 24 threads are available. Do you deposit your script there at Github? If so please give me the link. So I can look at them and learn more.

Have a great day! Venura

ADD REPLYlink written 10 weeks ago by venuraherath20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 718 users visited in the last hour