Question: Variant calling and downstream analysis
arash.iranzadeh1980 wrote (21 months ago):


I have whole-genome sequencing data for 1605 bacterial samples as paired-end FASTQ reads. The samples come from 3 phenotypes. I also have a reference sequence in FASTA format. I aligned all 1605 samples to the reference genome and then passed the BAM files to samtools mpileup:

samtools mpileup -g -f "ref.fa" -o mpileup.bcf -b <file listing all 1605 BAM paths>

Then I called variants with this command:

bcftools call -v -m --ploidy 1 -O b -f gq,gp -o variant.bcf mpileup.bcf
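
Comparing phenotypes is easier once the cohort-wide variant.bcf is split into one file per phenotype. Below is a minimal sketch, assuming bcftools is installed and a hypothetical two-column file phenotypes.tsv (sample name, tab, phenotype). write_sample_lists and subset_by_phenotype are illustrative helper names, not part of any tool. The -c 1 filter is what makes later comparisons work: without it, every per-phenotype file would still contain all 312924 sites, and a set comparison between the files would find nothing.

```shell
#!/usr/bin/env bash
# Sketch: split a multi-sample BCF into one VCF per phenotype.
# phenotypes.tsv (hypothetical): sample<TAB>phenotype, one sample per line.

# Write <outdir>/<phenotype>.samples, one sample name per line.
write_sample_lists() {
  local pheno_map=$1 outdir=$2
  mkdir -p "$outdir"
  awk -F'\t' -v d="$outdir" 'NF >= 2 { print $1 > (d "/" $2 ".samples") }' "$pheno_map"
}

# Subset the cohort BCF once per phenotype (requires bcftools on PATH).
subset_by_phenotype() {
  local bcf=$1 outdir=$2 list p
  for list in "$outdir"/*.samples; do
    p=$(basename "$list" .samples)
    # -S keeps only this group's samples (AC/AN are recomputed for the subset);
    # -c 1 drops sites where the group carries no ALT allele.
    bcftools view -S "$list" -c 1 -Oz -o "$outdir/$p.vcf.gz" "$bcf"
    bcftools index "$outdir/$p.vcf.gz"
  done
}

# Usage (hypothetical paths):
#   write_sample_lists phenotypes.tsv by_pheno
#   subset_by_phenotype variant.bcf by_pheno
```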

Now I have a large BCF file containing 1605 samples and 312,924 variant sites. The samples belong to three phenotypes. How can I find the variants that are specific to each phenotype?
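
One way to get phenotype-specific sites directly from the multi-sample file is a single pass over a genotype table exported with bcftools query. The sketch below uses a hypothetical helper, private_variants, and assumes haploid GT values (as produced with --ploidy 1) plus a two-column phenotypes.tsv. It prints each site whose ALT allele is carried by samples of exactly one phenotype, followed by that phenotype's label. Any non-reference call counts as carrying the ALT, so multi-allelic sites are not broken down by allele, and samples missing from phenotypes.tsv are ignored.

```shell
#!/usr/bin/env bash
# Sketch: report sites whose ALT allele is seen in exactly one phenotype.
# Assumes haploid calls (--ploidy 1) and three hypothetical input files:
#   samples.txt     sample names in VCF column order:  bcftools query -l variant.bcf
#   phenotypes.tsv  sample<TAB>phenotype
#   genotypes.tsv   bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' variant.bcf
private_variants() {
  awk -F'\t' -v OFS='\t' '
    FILENAME == ARGV[1] { order[++n] = $1; next }   # column order of samples
    FILENAME == ARGV[2] { pheno[$1] = $2; next }    # sample -> phenotype
    {
      split("", seen); groups = 0; label = ""
      for (i = 5; i <= NF; i++) {
        # "0" = REF, "." = missing; any other haploid GT carries an ALT allele
        if ($i == "0" || $i == ".") continue
        p = pheno[order[i - 4]]
        if (p == "") continue                       # sample without a phenotype
        if (!(p in seen)) { seen[p] = 1; groups++; label = p }
      }
      # CHROM POS REF ALT, then the only phenotype carrying the ALT allele
      if (groups == 1) print $1, $2, $3, $4, label
    }
  ' "$1" "$2" "$3"
}

# Usage: private_variants samples.txt phenotypes.tsv genotypes.tsv > private.tsv
```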

Brian Bushnell (Walnut Creek, USA) wrote (21 months ago):

Once you have one VCF file per phenotype, you can use BBMap's like this:

in=a.vcf,b.vcf,c.vcf out=a_minus_bc.vcf subtract

That will give you the variants unique to A, i.e. those present in a.vcf but absent from both b.vcf and c.vcf.
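
The subtraction itself is easy to reproduce for checking results. A minimal sketch, assuming uncompressed VCFs and a hypothetical helper named vcf_subtract (this is not BBMap's implementation, only the same set operation): a data line of the first file is kept if its CHROM:POS:REF:ALT key occurs in none of the other files. It is only meaningful when each file lists just the sites where that phenotype carries the ALT allele; files subset from the cohort BCF without an allele-count filter would all contain the same sites. Multi-allelic records are matched on their full ALT string. For bgzipped, indexed files, bcftools isec with -C is a comparable option.

```shell
#!/usr/bin/env bash
# Sketch of "A minus (B and C)" on plain-text VCFs, keyed on CHROM:POS:REF:ALT.
# Header lines of the first file are kept; its data lines are kept only if
# their site key is absent from every other file.
vcf_subtract() {
  local keep=$1; shift
  awk -F'\t' -v keep="$keep" '
    { k = $1 ":" $2 ":" $4 ":" $5 }      # site key: CHROM:POS:REF:ALT
    FILENAME != keep {                    # files to subtract: collect their keys
      if ($0 !~ /^#/) drop[k] = 1
      next
    }
    /^#/ { print; next }                  # header of the first file
    !(k in drop)                          # sites private to the first file
  ' "$@" "$keep"
}

# Usage (hypothetical file names): vcf_subtract a.vcf b.vcf c.vcf > a_minus_bc.vcf
```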
