Question: Cannot merge BCF files with `bcftools` because of "Index required, expected .vcf.gz or .bcf file"?
7 months ago
jespinoz10 wrote:

I can't merge my BCF files together using bcftools. Below are the details of my pipeline. After running the pipeline, I created a subdirectory that has 2 *.bcf files to try and merge them as a test set but it's not working.

My commands to merge 2 *.bcf files

 # Directory contents
-bash-4.1$ cd bcf_files/testing/
-bash-4.1$ ls
S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf  S-1410-81.A_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf   

# Attempting to merge 2 bcf files
-bash-4.1$ bcftools view S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf  S-1410-81.A_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf > testing.merged.bcf

# Error below
Index required, expected .vcf.gz or .bcf file: S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
Failed to open or the file not indexed: S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf

I tried indexing them

$ bcftools index S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
[E::main_vcfindex] bcf_index_build failed for S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
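
One thing worth checking here, offered as a guess and not something confirmed in this thread: `bcftools index` only works on BGZF-compressed files, and the mpileup step further down passes `-u`, which asks for uncompressed output. The sketch below looks at the first bytes of each file to tell the two cases apart. The helper name `bcf_kind` is made up for illustration; the byte values are the gzip magic (`1f 8b`) and the ASCII letters `BCF` that open an uncompressed BCF.

```shell
# Classify a file by its leading bytes (bcf_kind is a hypothetical helper).
# BGZF-compressed files start with the gzip magic bytes 1f 8b;
# an uncompressed BCF starts with the ASCII letters "BCF" (42 43 46).
bcf_kind() {
    magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        1f8b*)  echo "bgzf-or-gzip" ;;
        424346) echo "uncompressed-bcf" ;;
        *)      echo "unknown" ;;
    esac
}

# Report the kind of every *.bcf file in the current directory.
for f in ./*.bcf; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    printf '%s\t%s\n' "$f" "$(bcf_kind "$f")"
done
```

If the files come back as `uncompressed-bcf`, that would be consistent with the indexing failure above, though it does not rule out other causes such as a truncated file.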

My pipeline: I have 88 samples whose reads together total about 746 G.

I used HISAT2 for the mapping, against human assembly hg38. HISAT2 supplies pre-built index files, which we used, located at

The assembly for the genome used for the indexing was retrieved from

Create the sam file

hisat2 -q -p 2 --fast -x ./grch38/genome -1 {r1_path} -2 {r2_path} -S ./sam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam

Sam => Sorted-bam

samtools view -bS ./sam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam | samtools sort -@ 16 -o ./sorted_bam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted

Sorted-bam => BCF

samtools mpileup -uf ./grch38/Homo_sapiens.GRCh38.dna.primary_assembly.fa -C 50 --BCF -o ./bcf_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf ./sorted_bam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted

$ du -sh *
5.8T    bcf_files
8.5G    grch38
4.7G    grch38.tar.gz
435K    reads
176G    sam_files
38G     sorted_bam_files
6 months ago
jespinoz10 wrote:

The BCF files weren't generated correctly for some reason, so I converted each one to VCF with `bcftools view`, compressed it with `bgzip`, and then indexed the compressed file with `bcftools index`.

bcftools view ./bcf_files/1054.2_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf | bgzip -c > ./vcf_bgz_files/1054.2_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.vcf.bgz; bcftools index ./vcf_bgz_files/1054.2_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.vcf.bgz
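
For all 88 samples, the same conversion can be scripted. The sketch below is a dry run: it only prints the `bcftools` commands so they can be reviewed first, then piped to `sh` to execute. A few things in it are assumptions and not from the steps above: the helper names `vcf_gz_path` and `convert_cmds`, the output directory `./vcf_gz_files`, the `.vcf.gz` extension (used in place of `.vcf.bgz`), the output name `merged.bcf`, and the use of `bcftools view -Oz` to compress in one step instead of piping through `bgzip`. It also assumes file names contain no spaces. The final line uses `bcftools merge`, which is the subcommand intended for combining per-sample files and which needs the indexes built in the previous step; `bcftools view` takes one input file and, as far as I can tell, treats further arguments as regions.

```shell
# Map an input BCF path to the compressed VCF path in an output directory.
# (vcf_gz_path is a hypothetical helper.)
vcf_gz_path() {
    base=$(basename "$1" .bcf)
    printf '%s/%s.vcf.gz\n' "$2" "$base"
}

# Print the convert and index commands for one sample.
# (convert_cmds is a hypothetical helper.)
convert_cmds() {
    out=$(vcf_gz_path "$1" "$2")
    printf 'bcftools view -Oz -o %s %s\n' "$out" "$1"
    printf 'bcftools index %s\n' "$out"
}

in_dir=./bcf_files        # input directory used in the pipeline above
out_dir=./vcf_gz_files    # assumed output directory; create it before running

for f in "$in_dir"/*.bcf; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    convert_cmds "$f" "$out_dir"
done

# Merge all indexed per-sample files into one compressed BCF.
# The glob is left unexpanded here so the shell that runs the command expands it.
printf 'bcftools merge -Ob -o %s %s/*.vcf.gz\n' merged.bcf "$out_dir"
```

Saving this as a script and running it with its output piped to `sh` would execute the printed commands in order; testing on the 2-file `testing/` directory first seems prudent given the file sizes involved.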

