I have two sets of .bam files. One was generated by our lab (1 sample = 1 BAM); the other was downloaded online (again, 1 sample = 1 BAM). In the downloaded samples the chromosomes are labelled chr1, chr2, chr3, etc. In our lab samples they are labelled 1, 2, 3, etc. I want to generate a single VCF file of variants across all samples.
I'm using bcftools:
bcftools mpileup -Ov -f ref.fasta -b samples.txt | bcftools call -mv -o bamMge.vcf
However, I get no calls and the repeated error: [E::faidx_adjust_position] The sequence "1" was not found
I have two questions:
- Is my strategy correct (i.e. bcftools mpileup)? Do I need to incorporate extra steps? (NB: I also have matching gVCF files.)
- How can I either (i) alter the chromosome labeling of one of the subsets of bam files, or, (ii) use a mapping file to match chromosome labels during the mpileup run?
Thanks
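For option (i), one common approach is to rewrite the @SQ names in the BAM header and apply the new header with samtools reheader. The error suggests that ref.fasta has no sequence called "1", so it presumably uses the chr-prefixed names, and it would be the lab BAMs that need renaming. Below is a minimal sketch under several assumptions: both sets were aligned to the same assembly (renaming does not fix coordinate differences between builds), the unprefixed names are 1..22, X, Y and MT, and MT corresponds to chrM. The function name add_chr_prefix and the file names lab.bam and lab.chr.bam are made up for illustration. Unplaced contigs with other naming schemes are left untouched and would need their own mapping.

```shell
# Add a "chr" prefix to the @SQ sequence names of a SAM header read on stdin.
# Assumption: unprefixed names are 1..22, X, Y, MT (MT is mapped to chrM).
# Names that start with anything else (e.g. GL000192.1) are left unchanged.
add_chr_prefix() {
  sed -e '/^@SQ/s/SN:\([0-9XY]\)/SN:chr\1/' \
      -e '/^@SQ/s/SN:MT/SN:chrM/'
}

# Hypothetical usage on one lab BAM (file names are placeholders):
#   samtools view -H lab.bam | add_chr_prefix | samtools reheader - lab.bam > lab.chr.bam
#   samtools index lab.chr.bam
```

After reheadering, it may be worth checking with samtools view -H that every @SQ name in the new BAM also appears in ref.fasta.fai before rerunning mpileup.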
I have an associated question. I am creating a VCF from 128 BAMs. All of the BAMs were aligned to the same reference genome, but the reference file name and path recorded in the header differ in some of the BAMs. Furthermore, some of the BAMs were created with bwa-mem and others with bwa-mem2. Will these slight differences in BAM creation affect my VCF?
For example the bam headers are slightly different:
BAM1
BAM2
The contig names of the genome files are the same! Any help would be greatly appreciated.
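One way to check that the contigs really match across BAMs is to compare only the sequence names and lengths from each header, ignoring fields that are expected to differ such as the reference path (UR) and the @PG program lines. A minimal sketch, assuming samtools is available; the function name sq_names_lengths and the file names a.bam and b.bam are hypothetical. Note that matching names and lengths does not prove the sequences are identical; if the headers carry M5 checksums, those could be compared as well.

```shell
# Print "name<TAB>length" for each @SQ line of a SAM header read on stdin.
# Other @SQ fields (UR, M5, AS, ...) and all non-@SQ lines are ignored.
sq_names_lengths() {
  awk -F'\t' '$1 == "@SQ" {
    sn = ""; len = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^SN:/) sn = substr($i, 4)
      if ($i ~ /^LN:/) len = substr($i, 4)
    }
    print sn "\t" len
  }'
}

# Hypothetical usage (file names are placeholders):
#   samtools view -H a.bam | sq_names_lengths > a.sq
#   samtools view -H b.bam | sq_names_lengths > b.sq
#   diff a.sq b.sq && echo "same contig names and lengths"
```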
Please ask this as a new question, and maybe reference this post in it.