Using samtools to find mutation
1
0
Entering edit mode
4.2 years ago

i am very new to bioinformatics and command line tools in genomics. I want to use Samtools to identify mutations in a genome. I have the tumor genome (as bam file) , the control genome(as bam file) and the reference genome(in fasta format). How can I use all the three to find mutation in the genome and get information like location, genotype of the mutation? Thanks in advance.

alignment • 3.3k views
ADD COMMENT
2
Entering edit mode
4.2 years ago
Macspider ★ 3.4k

Samtools has the module you're looking for, it's called mpileup. It wants a reference genome in fasta format and 1 or more mapping results in bam format. It will create a VCF file with the putative variants of the 1 or more bam files provided with respect to the reference genome.

http://www.htslib.org/doc/samtools.html

The VCF file is quite complex as a format, I strongly suggest you to read the format specifications carefully (spend 1 day getting comfy with it).

https://samtools.github.io/hts-specs/VCFv4.2.pdf

After running samtools mpileup the work is not done, you have to call the variants. For that, I would suggest you to use bcftools call. After calling, you can use the many values reported in the VCF file to filter the ones you really need.

https://samtools.github.io/bcftools/bcftools.html

ADD COMMENT
0
Entering edit mode

Thanks a lot for your reply. Since yesterday, I have been trying to use mpileup for multiple sorted bam files and the refernce sequence in FASTA format.. But its not helping. Can you point me to some relevant articles/post that would do the job quickly?

ADD REPLY
0
Entering edit mode

its not helping

Since it's probably the gold standard in the variant calling pipelines I highly doubt that it doesn't help... It does exactly what you need if you combine it with the other steps I mentioned. Can you post the command you ran?

ADD REPLY
0
Entering edit mode

I wrote this :- samtools mpileup -f mm9.fa -g *.bam | bcftools call -m - > l100_n1000_d300_31_1.vcf , where mm9.fa is the reference genome and *. bam indicates all the bam files (both control and tumor) . I am getting an error :- [E::bgzf_read] bgzf_read_block error -1 after 356 of 459 bytes I must get the vcf file first and then I can call the variants right? Please help.

ADD REPLY

Login before adding your answer.

Traffic: 856 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6