I am trying to run bcftools (v1.9) mpileup on 60 BAM files from WGS (~20x).
bcftools mpileup -Ou -f ref.fa $(echo $BAM_LIST) | bcftools call -vmO z -o out.vcf.gz
The job started 20 days ago. out.vcf.gz shows that it has gone through 10 chromosomes (~2 days per chromosome).
There are still 9 autosomes, 2 sex chromosomes, the MT chromosome and 44 scaffolds to go through, which I estimate will take another 20 days to complete.
The log shows only the following (not updated since day of submission):
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[mpileup] 60 samples in 60 input files
However, out.vcf.gz is being written to daily.
I am looking for a way to make this process go faster, say, finish within 2 weeks instead of 40 days.
I found a related solution here, but it handles one BAM at a time, and I would like to call variants on all 60 samples at once.
Any help will be appreciated!
I created a test tab-delimited file, regions.txt, for chr1 and chr2 that looks as follows:
Column 1 corresponds to CHROM, column 2 to POS and column 3 to POS_TO. According to the corresponding documentation:
For POS_TO I used the length of the chromosome.
Then I call bcftools and it is now running:
Could you please tell me if regions.txt was properly constructed to fully span each chromosome? Thanks!
No.
A BED file is 0-based, so the start position should be '0', not '1'.
Furthermore, the idea is to launch 'X' jobs in parallel, one per chromosome, not to run one job for 'X' chromosomes.
As I understand it, in order to be considered a BED file, the .bed extension should be specified:
Otherwise, the --regions-file is treated as a tab-delimited file by default:
In which case positions should start with '1':
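To make the two coordinate conventions concrete, here is a minimal sketch that derives whole-chromosome regions from a faidx index instead of typing them by hand. It assumes the index has the usual layout (column 1 is the sequence name, column 2 its length); the function names fai_to_regions and fai_to_bed are made up for illustration.

```shell
#!/bin/sh
# Both functions read a .fai index (NAME<TAB>LENGTH<TAB>...) from the
# given file, or from stdin when no file is given.

# Tab-delimited regions file: 1-based, inclusive -> CHROM  1  LENGTH
fai_to_regions() {
    awk 'BEGIN { FS = OFS = "\t" } { print $1, 1, $2 }' "$@"
}

# BED file: 0-based start, end-exclusive -> CHROM  0  LENGTH
# (the output must be saved with a .bed suffix to be read as BED)
fai_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" } { print $1, 0, $2 }' "$@"
}
```

For example, `fai_to_regions ref.fa.fai > regions.txt` or `fai_to_bed ref.fa.fai > regions.bed`. Either file should then span every sequence in full.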
As for your second point, I am not really sure what you suggest. Do you mean I should provide one chromosome at a time and run the job in parallel?
yes
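A rough sketch of what one job per chromosome could look like, still calling all 60 samples jointly within each job. Several things here are assumptions rather than part of the original command: the BAM paths are listed one per line in a file called bams.txt (passed with -b), the BAMs are indexed so that -r can seek to a region, ref.fa.fai exists, and the names per_chrom, call_one and run_all are made up for illustration.

```shell
#!/bin/sh
# Assumed inputs (hypothetical names): ref.fa with ref.fa.fai next to it,
# and bams.txt listing one indexed BAM path per line.
REF=${REF:-ref.fa}
BAMS=${BAMS:-bams.txt}
OUTDIR=${OUTDIR:-per_chrom}

# Sequence names in reference order, taken from the faidx index.
list_chroms() {
    cut -f1 "$REF.fai"
}

# Joint calling on all samples, restricted to one sequence ($1).
call_one() {
    bcftools mpileup -Ou -f "$REF" -r "$1" -b "$BAMS" |
        bcftools call -vm -Oz -o "$OUTDIR/$1.vcf.gz"
}

# Run sequences in batches of $1 concurrent jobs (default 4), then
# concatenate the per-sequence files in reference order.
# No error checking is done here; this is only a sketch.
run_all() {
    max_jobs=${1:-4}
    mkdir -p "$OUTDIR"
    n=0
    for chrom in $(list_chroms); do
        call_one "$chrom" &
        n=$((n + 1))
        # Crude throttle: once a batch is full, wait for all of it.
        if [ "$n" -ge "$max_jobs" ]; then
            wait
            n=0
        fi
    done
    wait
    list_chroms | awk -v d="$OUTDIR" '{ print d "/" $1 ".vcf.gz" }' > "$OUTDIR/files.txt"
    bcftools concat -f "$OUTDIR/files.txt" -Oz -o out.vcf.gz
}
```

Running `run_all 8` would keep up to 8 chromosomes going at a time. The batch-style wait is simple but leaves cores idle while the longest chromosome in a batch finishes; on a cluster, submitting call_one as one scheduler job per chromosome would likely use resources better.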