Generate VCF from different .bam files with different chromosome names
19 months ago
ctdarwell • 0

I have two sets of .bam files. One was generated by our lab (1 sample = 1 bam). The other was downloaded online (again 1 sample = 1 bam). In the downloaded samples the chromosomes are labelled chr1, chr2, chr3, etc. In our lab samples the chromosomes are labelled 1, 2, 3, etc. I want to generate a single VCF file of variants across all samples.

I'm using bcftools: bcftools mpileup -Ov -f ref.fasta -b samples.txt | bcftools call -mv -o bamMge.vcf

However, I get no calls and the repeated error: [E::faidx_adjust_position] The sequence "1" was not found

I have two questions:

  1. Is my strategy correct (i.e. bcftools mpileup)? Do I need to incorporate extra steps? (NB: I also have matching gVCF files.)
  2. How can I either (i) alter the chromosome labeling of one of the subsets of bam files, or, (ii) use a mapping file to match chromosome labels during the mpileup run?


bcftools mpileup vcf • 755 views
19 months ago

The contig names in your BAM files must match those in your reference file, ref.fasta, so you need to resolve that mismatch first. What worries me more is this: were the BAMs produced by aligning reads to different reference genomes? If so, which seems the likely scenario, you should redo the alignment so that all of the data is aligned to the same genome.

Alternatively, try to determine the respective genomes to which your BAMs were aligned, and use these respective genomes for bcftools mpileup. You can almost certainly find out the reference genome used by simply looking at the information in the BAM headers, via:

samtools view -H my.bam
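
If the headers show that both sets of BAMs were aligned to the same assembly and differ only in the "chr" prefix, relabelling one set is an option (question 2(i)). Below is a minimal sketch, not a tested pipeline: the helper name add_chr_prefix and the file names are hypothetical, and it assumes the only difference is the prefix. Check contigs such as MT versus chrM by hand, since a plain prefix will not map those correctly.

```shell
#!/bin/sh
# Hypothetical helper: read a SAM header on stdin and add a "chr" prefix
# to the SN: field of every @SQ line that lacks one (SN:1 -> SN:chr1).
# All other header lines are passed through unchanged.
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@SQ/ {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^SN:/ && $i !~ /^SN:chr/)
                     sub(/^SN:/, "SN:chr", $i)
         }
         { print }'
}

# Small demonstration on a made-up two-line header.
printf '@HD\tVN:1.6\n@SQ\tSN:1\tLN:1000\n' | add_chr_prefix

# Intended use with samtools (file names are placeholders):
#   samtools view -H lab_sample.bam | add_chr_prefix > new_header.sam
#   samtools reheader new_header.sam lab_sample.bam > lab_sample.chr.bam
#   samtools index lab_sample.chr.bam
```

samtools reheader replaces only the header, so the reads themselves are not realigned; this is only safe when the contig order and lengths in the @SQ lines already agree between the two sets of BAMs and with ref.fasta.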


