Hi everyone! I'm new in bioinformatics and also in informatics so I'm struggling a bit trying to learn on my own.
At the moment I'm having some difficulties trying to generate a fasta file from a bam file of a complete human genome.
I've red on the internet that a way to obtain what I'm trying to get is to follow this command:
bcftools mpileup -Ou -f reference.fa alignments.bam | bcftools call -mv -Oz -o calls.vcf.gz bcftools index calls.vcf.gz
bcftools norm -f reference.fa calls.vcf.gz -Ob -o calls.norm.bcf
filter adjacent indels within 5bp
bcftools filter --IndelGap 5 calls.norm.bcf -Ob -o calls.norm.flt-indels.bcf
apply variants to create consensus sequence
cat reference.fa | bcftools consensus -m lowcoveragesites.bed calls.vcf.gz > consensus.fa
1)Let me start by saying that I find a bit weird that, according to the command above, I have to generate normalized and filtered files (call.norm.bcf, call.norm.filtr-indels.bcf) but never use them to create the consensus. Is there something wrong that the command?
But that being said, I'm following step by step the command. 2)I'm having some issues with the overlapping variants. I constantly receive the error message: "The site 3:60771641 overlaps with another variant, skipping..." for different position in my genome. Is there a way to solution it?
3)Moreover, I was wondering if the bcftools consensus was considering the Heterozygous alleles inserting at the specific position the letters for snp, according to the IUPAC, so I started searching for R,W,Y,... in my consensus genome and I only found 3 heterozygous sites with make me think something went wrong.
Can you help me with my issue? Do you know how I can easily get a consensus fasta from my bam?
Thank you !