Question: Variant calling and downstream analysis
2.3 years ago, arash.iranzadeh1980 wrote:

Hi,

I have whole-genome sequencing data for 1605 bacterial samples as paired-end reads in FASTQ format. The samples come from 3 phenotypes. I also have a reference sequence in a FASTA file. I aligned all 1605 samples to the reference genome and then passed the BAM files to samtools mpileup:

samtools mpileup -g -f "ref.fa" -o mpileup.bcf -b <list of all 1605 bam files>

Then I called variants using this command:

bcftools call -v -m --ploidy 1 -O b -f gq,gp -o variant.bcf mpileup.bcf

Now I have a huge BCF file containing 1605 samples and 312924 variant sites. How can I find the variants that are specific to each of the three phenotypes?

2.3 years ago, Brian Bushnell (Walnut Creek, USA) wrote:

Once you have one VCF file per phenotype, you can use BBMap's comparevcf.sh like this:

comparevcf.sh in=a.vcf,b.vcf,c.vcf out=a_minus_bc.vcf subtract

That will give you the variants that are in a.vcf but in neither b.vcf nor c.vcf, i.e. the variants unique to A.
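
To make the idea of "subtracting" concrete, here is a minimal sketch of the same set operation using only awk. It is not how comparevcf.sh is implemented; the function name unique_to_first and the matching rule (a variant is identified by CHROM, POS, REF and ALT) are assumptions made for illustration, and a real tool may compare records differently.

```shell
# Sketch: print variants found in the first VCF but in none of the others.
# Assumption: two records are the same variant when CHROM, POS, REF and ALT
# (tab-separated columns 1, 2, 4 and 5) all match.
# Usage: unique_to_first a.vcf b.vcf c.vcf
unique_to_first() {
    first=$1
    shift
    awk -F'\t' -v first="$first" '
        /^#/ { next }                              # skip header lines
        { key = $1 ":" $2 ":" $4 ":" $5 }          # build the variant key
        FILENAME == first {                        # records from the first file
            if (!(key in seen)) keys[++n] = key    # remember them in file order
            seen[key] = 1
            next
        }
        { other[key] = 1 }                         # records from every other file
        END {
            for (i = 1; i <= n; i++)
                if (!(keys[i] in other)) print keys[i]
        }
    ' "$first" "$@"
}
```

The output is one CHROM:POS:REF:ALT key per line, in the order the variants appear in the first file. Note that this only applies after the multi-sample file has been split by phenotype. One possible way to do that, assuming a hypothetical file pheno_a.txt that lists the sample names of phenotype A one per line, is bcftools view -S pheno_a.txt --min-ac 1 -O v -o a.vcf variant.bcf, which keeps only sites where at least one of those samples carries a non-reference allele.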
