Question: Variant calling and downstream analysis
3.0 years ago by
arash.iranzadeh1980 wrote:


I have whole-genome sequencing data for 1605 bacterial samples as paired-end reads in FASTQ format. The samples come from 3 phenotypes. I also have a reference sequence in a FASTA file. I aligned all 1605 samples to the reference genome and then passed the BAM files to samtools mpileup:

samtools mpileup -g -f "ref.fa" -o mpileup.bcf -b <list of all 1605 bam files>

Then I called variants using this command:

bcftools call -v -m --ploidy 1 -O b -f gq,gp -o variant.bcf mpileup.bcf

Now I have a huge BCF file containing 1605 samples and 312924 variant sites. How can I find variants that are specific to each phenotype? There are three phenotypes across the 1605 samples.

3.0 years ago by
Walnut Creek, USA
Brian Bushnell wrote:

Once you have VCF files, you can use BBMap's VCF comparison script with arguments like this:

in=a.vcf,b.vcf,c.vcf out=a_minus_bc.vcf subtract

That will give you variants unique to A.
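
To illustrate what the subtract step means, here is a minimal sketch in plain shell and awk. It is not BBMap's implementation: vcf_subtract is a hypothetical helper that keys each record on CHROM, POS, REF and ALT and keeps the records of the first file that appear in none of the others. It assumes uncompressed, tab-delimited VCF input with one file per phenotype. Those files are themselves an assumption; they could be made, for example, by subsetting the multi-sample BCF with something like bcftools view -S <sample list> -c 1 -O v, so that only sites with a non-reference call in that group remain.

```shell
# Hypothetical helper, not part of BBMap: print the records of the first VCF
# whose CHROM/POS/REF/ALT key occurs in none of the other VCFs.
# Assumes uncompressed, tab-delimited VCF files.
vcf_subtract() (
    first=$1
    shift
    # The other files are read before the first one, so every key to
    # subtract is already known when the records of the first file arrive.
    awk -F'\t' -v first="$first" '
        # Header lines: keep only those of the first file.
        /^#/ { if (FILENAME == first) print; next }
        # Other files: remember each variant key.
        FILENAME != first { seen[$1 FS $2 FS $4 FS $5] = 1; next }
        # First file: print records whose key was never seen.
        !(($1 FS $2 FS $4 FS $5) in seen)
    ' "$@" "$first"
)
```

Called as vcf_subtract a.vcf b.vcf c.vcf > a_minus_bc.vcf, this prints the header of a.vcf followed by the records found only in a.vcf. Multi-allelic records are compared as whole ALT strings, so splitting them first (for example with bcftools norm -m -any) may be needed before the comparison is meaningful.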
