Question: Using samtools to find mutation
0
gravatar for banerjeeshayantan
3.1 years ago by
banerjeeshayantan170 wrote:

i am very new to bioinformatics and command line tools in genomics. I want to use Samtools to identify mutations in a genome. I have the tumor genome (as bam file) , the control genome(as bam file) and the reference genome(in fasta format). How can I use all the three to find mutation in the genome and get information like location, genotype of the mutation? Thanks in advance.

alignment • 2.3k views
ADD COMMENTlink modified 3.1 years ago by Macspider3.0k • written 3.1 years ago by banerjeeshayantan170
2
gravatar for Macspider
3.1 years ago by
Macspider3.0k
Vienna - BOKU
Macspider3.0k wrote:

Samtools has the module you're looking for, it's called mpileup. It wants a reference genome in fasta format and 1 or more mapping results in bam format. It will create a VCF file with the putative variants of the 1 or more bam files provided with respect to the reference genome.

http://www.htslib.org/doc/samtools.html

The VCF file is quite complex as a format, I strongly suggest you to read the format specifications carefully (spend 1 day getting comfy with it).

https://samtools.github.io/hts-specs/VCFv4.2.pdf

After running samtools mpileup the work is not done, you have to call the variants. For that, I would suggest you to use bcftools call. After calling, you can use the many values reported in the VCF file to filter the ones you really need.

https://samtools.github.io/bcftools/bcftools.html

ADD COMMENTlink written 3.1 years ago by Macspider3.0k

Thanks a lot for your reply. Since yesterday, I have been trying to use mpileup for multiple sorted bam files and the refernce sequence in FASTA format.. But its not helping. Can you point me to some relevant articles/post that would do the job quickly?

ADD REPLYlink written 3.1 years ago by banerjeeshayantan170

its not helping

Since it's probably the gold standard in the variant calling pipelines I highly doubt that it doesn't help... It does exactly what you need if you combine it with the other steps I mentioned. Can you post the command you ran?

ADD REPLYlink written 3.1 years ago by Macspider3.0k

I wrote this :- samtools mpileup -f mm9.fa -g *.bam | bcftools call -m - > l100_n1000_d300_31_1.vcf , where mm9.fa is the reference genome and *. bam indicates all the bam files (both control and tumor) . I am getting an error :- [E::bgzf_read] bgzf_read_block error -1 after 356 of 459 bytes I must get the vcf file first and then I can call the variants right? Please help.

ADD REPLYlink written 3.1 years ago by banerjeeshayantan170
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 691 users visited in the last hour