Generate VCF from different .bam files with different chromosome names
19 months ago
ctdarwell • 0

I have two sets of .bam files. One was generated by our lab (1 sample = 1 bam). The other was downloaded online (again 1 sample = 1 bam). For the downloaded samples the chromosomes are labelled chr1, chr2, chr3, etc. For our lab samples the chromosomes are labelled 1, 2, 3, etc. I want to generate a single VCF file of variants across all samples.

I'm using bcftools: bcftools mpileup -Ov -f ref.fasta -b samples.txt | bcftools call -mv -o bamMge.vcf

I have two questions:

1. Is my strategy correct (i.e. bcftools mpileup)? Do I need to incorporate extra steps? (NB I also have matching gVCF files.)
2. How can I either (i) alter the chromosome labeling of one of the subsets of bam files, or, (ii) use a mapping file to match chromosome labels during the mpileup run?

Thanks

19 months ago

The contig names in your BAM files must match those in your reference file, ref.fasta, so you need to resolve this first. What worries me more is this: were the BAMs produced by aligning reads to different reference genomes? If so, which seems the likely scenario, you should redo the alignment so that all of the data is aligned to the same genome.
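
If it turns out that both sets of BAMs were aligned to the same assembly and only the labels differ, rewriting the header may be enough. Here is a minimal sketch, assuming the only difference is a missing "chr" prefix. The function name add_chr_prefix and the file names are hypothetical. Mitochondrial and unplaced contigs often differ by more than a prefix, so check those separately.

```shell
# Hypothetical helper: add a "chr" prefix to the SN: field of every @SQ line
# in a SAM header read from stdin. All other header lines pass through unchanged,
# and names that already start with "chr" are left alone.
add_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
       $1 == "@SQ" {
         for (i = 2; i <= NF; i++) {
           if ($i ~ /^SN:/ && $i !~ /^SN:chr/) {
             $i = "SN:chr" substr($i, 4)
           }
         }
       }
       { print }'
}

# Example usage (requires samtools; file names are placeholders):
#   samtools view -H lab_sample.bam | add_chr_prefix > new_header.sam
#   samtools reheader new_header.sam lab_sample.bam > lab_sample.chr.bam
#   samtools index lab_sample.chr.bam
```

This only relabels the contigs. It does not touch the alignments, so it is not a substitute for realigning if the underlying reference sequences differ.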

Alternatively, try to determine the respective genomes to which your BAMs were aligned, and use these respective genomes for bcftools mpileup. You can almost certainly find out the reference genome used by simply looking at the information in the BAM headers, via:

samtools view -H my.bam
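
To compare headers between the two sets of BAMs, it can help to reduce the @SQ lines to a table of contig names and lengths. This is a rough sketch. The function name sq_table and the file names are made up for illustration.

```shell
# Hypothetical helper: print "name<TAB>length" for each @SQ line
# of a SAM header read from stdin.
sq_table() {
  awk -F'\t' '$1 == "@SQ" {
      name = ""; len = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)   # contig name
        if ($i ~ /^LN:/) len  = substr($i, 4)   # contig length
      }
      print name "\t" len
  }'
}

# Example usage (requires samtools; file names are placeholders):
#   samtools view -H lab_sample.bam      | sq_table > lab.contigs.tsv
#   samtools view -H download_sample.bam | sq_table > download.contigs.tsv
#   samtools faidx ref.fasta && cut -f1,2 ref.fasta.fai > ref.contigs.tsv
```

If the lengths agree once the "chr" prefix is ignored, the BAMs were probably aligned to the same assembly under different naming schemes. If they differ, the references are different. The @PG lines in the header often record the aligner command line, which may include the path to the reference that was used.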


Kevin