Hello
I am a beginner in NGS bioinformatics analysis. I am trying to create a multi-sample .vcf file from 250 .bam files using bcftools mpileup. Here is the command I use:
bcftools mpileup -O z -o output.vcf.gz -f ref_genome.fasta bam_file_list*.bam
There are two questions I want to ask about this.
I use a Linux machine with 32 GB of memory, but only 10% of the RAM is being used. Is there any way to increase the memory usage from the bcftools command line?
Can we show the expected time or a progress bar for the bcftools command? I tried
bcftools mpileup -O z -o output.vcf.gz -f ref_genome.fasta bam_file_list*.bam | pv -p -t -e
but the timer and bar did not show any progress.
I hope you can help me solve my problems.
Thank you
I don't think bcftools mpileup uses a lot of memory
No, but you can always show what's happening:
bcftools mpileup -f ref_genome.fasta bam_file_list*.bam | tee /dev/tty | bgzip > output.vcf.gz
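A likely reason the pv attempt in the question showed nothing: with -o output.vcf.gz the data goes straight to the file, so nothing flows through the pipe for pv to measure. Below is a minimal sketch, assuming pv is installed, that routes the stream through pv instead of printing every record to the terminal. The helper name with_progress is made up for illustration. pv does not know the total output size here, so it can only show elapsed time, bytes written and rate, not a percentage or an ETA.

```shell
#!/bin/sh
# with_progress OUTFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND (which must write to stdout), meters the
# stream with pv (-t elapsed time, -r rate, -b byte count), then compresses
# it with bgzip into OUTFILE.
with_progress() {
    out="$1"
    shift
    "$@" | pv -t -r -b | bgzip > "$out"
}

# Example call (note: no -o option, so the output goes to stdout):
# with_progress output.vcf.gz bcftools mpileup -f ref_genome.fasta bam_file_list*.bam
```

The byte counter only tells you that the job is alive and how fast it writes; to judge how far along it is, you would still have to look at the positions in the output.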
@Pierre Lindenbaum
Thank you for the command-line advice. Now I can see what's happening during the process.
Not using a lot of RAM is a good thing. In general, (bio)informatic processes are either constrained by:
(run htop on your machine to see processor usage)
Thanks for your comment @WouterDeCoster
Can we accelerate the process by increasing the RAM usage?
The tool will probably be constrained by one of the other parameters, probably CPU usage. You could launch multiple processes in parallel, for example doing variant calling separately per chromosome.
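The per-chromosome idea can be sketched roughly as below. This is only an illustration under some assumptions: a FASTA index ref_genome.fasta.fai exists (for example from samtools faidx), the sequence names contain no shell-special characters, and the per-region output names (mpileup.<chrom>.vcf.gz) and the helper name region_commands are made up. The function only prints the commands so they can be reviewed before anything runs.

```shell
#!/bin/sh
# region_commands FAI
# Print one bcftools mpileup command per sequence listed in the FASTA index.
# Column 1 of a .fai file holds the sequence (chromosome) name, which is
# passed to mpileup with -r to restrict it to that region.
region_commands() {
    fai="$1"
    cut -f1 "$fai" | while read -r chrom; do
        echo "bcftools mpileup -r $chrom -O z -o mpileup.$chrom.vcf.gz -f ref_genome.fasta bam_file_list*.bam"
    done
}

# Review the printed commands, then run e.g. 4 at a time (match your core count):
# region_commands ref_genome.fasta.fai | xargs -P 4 -I{} sh -c '{}'
#
# Afterwards, join the pieces in reference order:
# bcftools concat -O z -o output.vcf.gz $(cut -f1 ref_genome.fasta.fai | sed 's/.*/mpileup.&.vcf.gz/')
```

Each process opens all the BAM files, so memory and disk reads grow with the number of parallel jobs; starting with a small -P value and watching htop is probably the safer way to find a good setting.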
...and how many BAMs are you actually passing to BCFtools? The use of bam_file_list*.bam looks like a risky maneuver to me. You can supply a list of BAMs to BCFtools mpileup with: