Question: Automatic Detection And Annotation Of Bacterial SNPs
5.0 years ago by
munnitin10 wrote:

Hi all,

I am working on 10 bacterial genomes (1 reference and 9 mutants) sequenced with Illumina technology. My main aim is to find SNPs that are common to all 9 mutant genomes but absent from the reference genome. Finally, I would like to annotate those SNPs automatically. So far I have done the following steps, and I am wondering whether I am on the right path.

First: extracted the SNPs common to all 9 mutant genomes

vcf-isec -n +9 -f 1.vcf.gz 2.vcf.gz 3.vcf.gz 4.vcf.gz 5.vcf.gz 6.vcf.gz 7.vcf.gz 8.vcf.gz 9.vcf.gz | bgzip -c > isec1.vcf.gz

Second: indexed the result with tabix

tabix -p vcf isec1.vcf.gz

Third: extracted the SNPs that are present in isec1.vcf.gz but absent from the reference strain

vcf-isec -c -f isec1.vcf.gz reference.vcf.gz > isec2.vcf
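
As a cross-check on the first and third steps, the same selection can be sketched with plain set operations. This is only an illustration, not how vcf-isec works internally: the function names are made up for this example, and keying each variant on (CHROM, POS, REF, ALT) is an assumption that may be stricter or looser than what vcf-isec actually compares.

```python
import gzip


def variant_keys(path):
    """Return the set of (CHROM, POS, REF, ALT) tuples found in a VCF file.

    Works on plain-text or gzip/bgzip-compressed files. The choice of key
    is an assumption made for this sketch.
    """
    opener = gzip.open if path.endswith(".gz") else open
    keys = set()
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            keys.add((chrom, int(pos), ref, alt))
    return keys


def shared_but_not_in_reference(mutant_paths, reference_path):
    """Variants present in every mutant file and absent from the reference file."""
    shared = set.intersection(*(variant_keys(p) for p in mutant_paths))
    return shared - variant_keys(reference_path)
```

Calling `shared_but_not_in_reference` with the nine mutant files and `reference.vcf.gz` should give a set that can be compared against the records in isec2.vcf. Any disagreement would suggest that the two approaches match variants differently.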

Fourth: automatic annotation of isec2.vcf

Used SnpEff

java -jar snpEff.jar eff -no-downstream -no-upstream -no-utr -no-intergenic -v database isec2.vcf

Most of the SNPs were found in intergenic regions. Should I include these intergenic SNPs or not? Any other suggestions for selecting SNPs?

Regards, Nitin

vcf

The only comment I have is that you should verify what happens when a genomic variation is longer than a single base.

How does your intersect command work? Does it require the coordinates and the type of variation to match exactly, or will the condition trigger on any amount of overlap between two variations?
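
To make the distinction concrete, here is a small sketch of the two matching rules. The `Variant` class and both functions are hypothetical names for this illustration, and it assumes VCF-style 1-based positions where the REF allele defines the reference span. It does not say which rule your intersect command applies.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position of the first REF base, as in VCF
    ref: str
    alt: str

    @property
    def end(self) -> int:
        # Last reference base covered by the REF allele.
        return self.pos + len(self.ref) - 1


def exact_match(a: Variant, b: Variant) -> bool:
    """True only if chromosome, position and both alleles are identical."""
    return a == b


def overlaps(a: Variant, b: Variant) -> bool:
    """True if the reference intervals share at least one base."""
    return a.chrom == b.chrom and a.pos <= b.end and b.pos <= a.end


deletion = Variant("chr", 100, "ACGT", "A")  # spans reference bases 100..103
snp = Variant("chr", 102, "G", "T")          # spans reference base 102 only
print(exact_match(deletion, snp), overlaps(deletion, snp))  # False True
```

Under the exact rule, the SNP at position 102 and the deletion starting at 100 count as different variants. Under the overlap rule they count as the same site, which would change which records survive the intersection and the complement.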

(Also, I would update the title; currently it is very generic and thus less helpful. The title should be a short version of the question that you are actually asking.)

written 5.0 years ago by Istvan Albert ♦♦ 79k
