Question: Cannot merge BCF files with `bcftools` because of "Index required, expected .vcf.gz or .bcf file"
jespinoz wrote (7 months ago):

I can't merge my BCF files with bcftools. The details of my pipeline are below. After running the pipeline, I created a subdirectory containing two *.bcf files to try merging them as a test set, but it does not work.

My commands to merge 2 *.bcf files

 # Directory contents
-bash-4.1$ cd bcf_files/testing/
-bash-4.1$ ls
S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf  S-1410-81.A_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf   

# Attempting to merge 2 bcf files
-bash-4.1$ bcftools view S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf  S-1410-81.A_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf > testing.merged.bcf

# Error below
Index required, expected .vcf.gz or .bcf file: S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
Failed to open or the file not indexed: S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf

I tried indexing them, but that fails as well:

$ bcftools index S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
[E::main_vcfindex] bcf_index_build failed for S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
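
One thing worth checking when `bcftools index` fails like this is whether the BCF is actually compressed, since the index is built over BGZF-compressed blocks. The sketch below only inspects the first two bytes for the gzip magic number (`1f 8b`), which BGZF files share. The function name `has_gzip_magic` is made up for illustration, and passing the check does not prove the file is valid BGZF.

```shell
#!/usr/bin/env bash
# has_gzip_magic FILE: succeed if FILE starts with the gzip magic bytes 1f 8b.
# (hypothetical helper; BGZF is a gzip variant, so it shares these two bytes)
has_gzip_magic() {
    magic=$(head -c 2 < "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example: report on every BCF in the current directory.
for f in *.bcf; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when nothing matches
    if has_gzip_magic "$f"; then
        echo "$f: gzip/BGZF magic found"
    else
        echo "$f: not compressed, so indexing will likely fail"
    fi
done
```

If the files turn out to be uncompressed, rewriting them in compressed form before indexing is the usual way forward.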

My pipeline: I have 88 samples whose reads total about 746 G.

I used HISAT2 to map against the human assembly hg38, with the prebuilt index that HISAT2 supplies at ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38.tar.gz

The genome assembly used for that index was retrieved from ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

Create the sam file

hisat2 -q -p 2 --fast -x ./grch38/genome -1 {r1_path} -2 {r2_path} -S ./sam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam

Sam => Sorted-bam

samtools view -bS ./sam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam | samtools sort -@ 16 -o ./sorted_bam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted

Sorted-bam => BCF

samtools mpileup -uf ./grch38/Homo_sapiens.GRCh38.dna.primary_assembly.fa -C 50 --BCF -o ./bcf_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf ./sorted_bam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted
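
A possible explanation for both the failed index build and the very large BCF output is the `-u` flag, which asks mpileup for uncompressed output intended for piping. The sketch below runs the same step without `-u` so that `--BCF` writes compressed BCF. The function name `mpileup_compressed` and its two arguments are invented for illustration, and whether this alone resolves the indexing error has not been verified here.

```shell
#!/usr/bin/env bash
# Reference FASTA from the pipeline above; override by exporting REF.
REF=${REF:-./grch38/Homo_sapiens.GRCh38.dna.primary_assembly.fa}

# mpileup_compressed BAM OUT_BCF (hypothetical wrapper)
# Same options as the original command minus -u, so the BCF is compressed.
mpileup_compressed() {
    samtools mpileup -f "$REF" -C 50 --BCF -o "$2" "$1"
}
```

It would be called once per library, for example `mpileup_compressed ./sorted_bam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted ./bcf_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf`. In more recent releases the BCF-producing pileup lives in `bcftools mpileup` instead.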

$ du -sh *
5.8T    bcf_files
8.5G    grch38
4.7G    grch38.tar.gz
435K    reads
34K     run_tmp.sh
176G    sam_files
38G     sorted_bam_files
jespinoz wrote (6 months ago):

The BCF files weren't generated correctly for some reason, so I converted each one to VCF with bcftools view, compressed it with bgzip, and then indexed the result with bcftools index.

bcftools view ./bcf_files/1054.2_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf | bgzip -c > ./vcf_bgz_files/1054.2_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.vcf.bgz
bcftools index ./vcf_bgz_files/1054.2_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.vcf.bgz
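
The same conversion can be wrapped in a function and followed by the actual merge, which is normally done with the `merge` subcommand rather than `view`. This is a rough sketch, not the exact commands used above: the function names, the output directory argument, and the `.vcf.gz` suffix are assumptions chosen for illustration, and `-Oz` is used here in place of the explicit pipe through bgzip.

```shell
#!/usr/bin/env bash
# convert_and_index BCF OUT_DIR (hypothetical helper)
# Writes OUT_DIR/<name>.vcf.gz, indexes it, and prints the new path.
convert_and_index() {
    out="$2/$(basename "${1%.bcf}").vcf.gz"
    bcftools view -Oz -o "$out" "$1"   # -Oz writes bgzip-compressed VCF
    bcftools index "$out"              # writes a .csi index next to the file
    printf '%s\n' "$out"
}

# merge_all OUT_DIR MERGED (hypothetical helper)
# Merges every indexed .vcf.gz in OUT_DIR into one multi-sample file.
merge_all() {
    bcftools merge -Oz -o "$2" "$1"/*.vcf.gz
}
```

A typical run would loop `convert_and_index "$f" ./vcf_bgz_files` over `./bcf_files/*.bcf` and then call `merge_all ./vcf_bgz_files merged.vcf.gz`.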
