Using samtools to find mutation
1
0
Entering edit mode
4.2 years ago

i am very new to bioinformatics and command line tools in genomics. I want to use Samtools to identify mutations in a genome. I have the tumor genome (as bam file) , the control genome(as bam file) and the reference genome(in fasta format). How can I use all the three to find mutation in the genome and get information like location, genotype of the mutation? Thanks in advance.

alignment • 3.3k views
2
Entering edit mode
4.2 years ago
Macspider ★ 3.4k

Samtools has the module you're looking for, it's called mpileup. It wants a reference genome in fasta format and 1 or more mapping results in bam format. It will create a VCF file with the putative variants of the 1 or more bam files provided with respect to the reference genome.

http://www.htslib.org/doc/samtools.html

The VCF file is quite complex as a format, I strongly suggest you to read the format specifications carefully (spend 1 day getting comfy with it).

https://samtools.github.io/hts-specs/VCFv4.2.pdf

After running samtools mpileup the work is not done, you have to call the variants. For that, I would suggest you to use bcftools call. After calling, you can use the many values reported in the VCF file to filter the ones you really need.

https://samtools.github.io/bcftools/bcftools.html

0
Entering edit mode

Thanks a lot for your reply. Since yesterday, I have been trying to use mpileup for multiple sorted bam files and the refernce sequence in FASTA format.. But its not helping. Can you point me to some relevant articles/post that would do the job quickly?

0
Entering edit mode

its not helping

Since it's probably the gold standard in the variant calling pipelines I highly doubt that it doesn't help... It does exactly what you need if you combine it with the other steps I mentioned. Can you post the command you ran?

0
Entering edit mode

I wrote this :- samtools mpileup -f mm9.fa -g *.bam | bcftools call -m - > l100_n1000_d300_31_1.vcf , where mm9.fa is the reference genome and *. bam indicates all the bam files (both control and tumor) . I am getting an error :- [E::bgzf_read] bgzf_read_block error -1 after 356 of 459 bytes I must get the vcf file first and then I can call the variants right? Please help.