Hi guys,
I am working with SNP analysis for the first time so I would be grateful if someone could look through my pipeline and tell me whether it is correct or not.
My project aims to identify genetic differences between two groups of mice (two mice in each group). I have fastq files for each mouse. First I used bowtie2 to align them to the reference, and then I sorted my bam files with samtools. Now I want to call SNPs and indels using samtools and bcftools. Is this correct?
samtools mpileup -g -f genome.fa -o output.raw.bcf file1.bam file2.bam file3.bam file4.bam
bcftools call --multiallelic-caller --variants-only -O v output.raw.bcf -o output.vcf
Next, I want to find the differences between my two groups. As far as I understand, the easiest way is to convert output.vcf to a genotype matrix, as described in the thread 'Difference between a VCF file and a "genotype matrix"?', and then write a script to do whatever I want with this kind of data. Am I right?
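For illustration, here is a minimal sketch of the kind of conversion I have in mind, written in plain Python. It is only a sketch under some assumptions: the VCF is uncompressed text, every record has a GT entry in its FORMAT column, and each genotype is encoded as the number of non-reference alleles (0, 1 or 2 for a diploid call, None if missing). The function names gt_to_dosage and vcf_to_matrix and the example records are made up for this post and are not part of samtools or bcftools.

```python
def gt_to_dosage(gt):
    """Convert a VCF GT string such as '0/1' or '1|1' to a count of
    non-reference alleles. Returns None if any allele is missing ('.')."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")


def vcf_to_matrix(lines):
    """Build a genotype matrix from an iterable of VCF text lines.

    Returns (samples, rows), where samples is the list of sample names
    from the #CHROM header line and rows is a list of
    (variant_id, [dosage per sample]) tuples.
    """
    samples = []
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        # Assumption: GT is present in the FORMAT column of every record.
        gt_index = fields[8].split(":").index("GT")
        dosages = [gt_to_dosage(s.split(":")[gt_index]) for s in fields[9:]]
        rows.append((f"{chrom}:{pos}:{ref}>{alt}", dosages))
    return samples, rows


# Hypothetical records, only to show the expected input layout.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tfile1.bam\tfile2.bam\tfile3.bam\tfile4.bam",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:PL"
    "\t0/0:0,3,30\t0/0:0,3,30\t1/1:30,3,0\t1/1:30,3,0",
    "chr1\t200\t.\tC\tT\t40\t.\t.\tGT:PL"
    "\t0/1:10,0,10\t./.:0,0,0\t0/1:10,0,10\t0/0:0,3,30",
]

samples, rows = vcf_to_matrix(example)
print("variant", *samples, sep="\t")
for variant_id, dosages in rows:
    print(variant_id, *dosages, sep="\t")
```

For the real file I would pass an open file handle instead of the example list, e.g. vcf_to_matrix(open("output.vcf")), and then compare the columns of the first two mice with the columns of the other two.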
Thank you in advance!