SNP Identification
10.6 years ago
sara ▴ 40

Hi,

I am new to this field and I need to identify SNPs. I aligned my sequences to the reference using BWA and then used SAMtools to call variants, so now I have VCF files as my output. When I viewed a VCF file in the IGV browser, I could see that it contains a lot of noisy data. Can anybody help me take this work further?

chip-seq data analysis snp
10.6 years ago
rob234king ▴ 610

What species are you working with? If you install snpEff and an annotation database is available for your species, you can annotate the VCF files and view the resulting report, which gives you graphs and tables to help you review your SNP data. You can then use the companion program SnpSift to filter on quality thresholds you choose from the report, or to keep only homozygous SNPs, etc.
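
To make the filtering idea concrete, here is a minimal hand-rolled Python sketch of what a "quality plus homozygous" filter does. It is not SnpSift and does not use SnpSift's syntax. It assumes an uncompressed, tab-separated VCF with at least one sample column; the QUAL cutoff of 30 and the toy records are placeholders, not recommendations.

```python
# Hand-rolled sketch of QUAL + genotype filtering; this is NOT SnpSift.
# Assumes an uncompressed, tab-separated VCF with at least one sample column.


def is_hom_alt(gt):
    """True for genotypes such as '1/1' or '1|1': all alleles called, identical, non-reference."""
    alleles = gt.replace("|", "/").split("/")
    return "." not in alleles and len(set(alleles)) == 1 and alleles[0] != "0"


def filter_vcf_lines(lines, min_qual=30.0, hom_only=True):
    """Yield header lines unchanged, plus data lines that pass the filters.

    min_qual=30.0 is an arbitrary placeholder; pick the threshold from
    your own QUAL distribution (for example the snpEff report).
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low variant quality
        if hom_only:
            # FORMAT (9th column) names the keys of the first sample (10th column)
            sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
            if not is_hom_alt(sample.get("GT", ".")):
                continue
        yield line


# Made-up three-record example (not real data)
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n",
    "Chr1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t1/1:20\n",  # kept
    "Chr1\t200\t.\tC\tT\t12\t.\t.\tGT:DP\t1/1:5\n",   # dropped: QUAL < 30
    "Chr1\t300\t.\tG\tA\t80\t.\t.\tGT:DP\t0/1:30\n",  # dropped: heterozygous
]
kept = list(filter_vcf_lines(example, min_qual=30.0))
print("".join(kept), end="")
```

Only the Chr1:100 record survives here; set `hom_only=False` if you want to keep heterozygous calls and filter on quality alone.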

10.6 years ago
alexej.knaus ▴ 130

If you are working with human samples, you could try uploading your VCF files to GeneTalk (Analyze Human Sequence Variants). Your file is annotated in the background during preprocessing, and you can then filter it by effect on the protein level, mode of inheritance, genotype frequency, and annotations in existing databases (dbSNP, HGMD, 1kGP, ...). Each time you filter a file, a new file is generated that contains the reduced set of variants and records the filter settings in its header. If you want (and if you have consent to share), you can share the data with a colleague who is registered at GeneTalk and collaborate. However, the data is stored only in your account, and only you have access to it.

As of now, the GeneTalk platform is freely accessible.
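
The idea of writing each filtered result to a new file that records its own filter settings can be sketched locally in a few lines of Python. This is only an illustration of the concept, not how GeneTalk works internally; the `##filterSettings` key is a hypothetical name (VCF allows free-form `##key=value` meta lines), and the records are made up.

```python
import io

# Hypothetical header key used only for illustration; it is not a GeneTalk
# or VCF-specification field, just an unstructured "##key=value" meta line.
SETTINGS_KEY = "##filterSettings"


def write_filtered_vcf(lines, out, min_qual):
    """Write records with QUAL >= min_qual to `out`, recording the setting in the header.

    Returns the number of data records kept.
    """
    kept = 0
    for line in lines:
        if line.startswith("##"):
            out.write(line)  # existing meta-information lines
        elif line.startswith("#CHROM"):
            # Meta lines must come before the #CHROM column header.
            out.write(f"{SETTINGS_KEY}=min_qual>={min_qual}\n")
            out.write(line)
        else:
            qual = line.split("\t")[5]
            if qual != "." and float(qual) >= min_qual:
                out.write(line)
                kept += 1
    return kept


# Made-up input (not real data)
vcf_in = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "Chr1\t100\t.\tA\tG\t50\t.\t.\n",
    "Chr1\t200\t.\tC\tT\t12\t.\t.\n",
]
buf = io.StringIO()
n_kept = write_filtered_vcf(vcf_in, buf, min_qual=30)
print(buf.getvalue(), end="")
```

Running the function again on its own output appends a second settings line, so the file header accumulates the full filtering history.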

10.6 years ago
always_learning ★ 1.1k

What species are you working with? My suggestion is to call variants with GATK, including its extra steps such as base quality recalibration and local realignment. There is always a chance of false positives because of errors introduced at various processing steps. We cannot remove those errors completely, but we can reduce them. Afterwards you can apply the suggestions given by Alexej and Rob.

http://www.broadinstitute.org/gatk/guide/topic?name=intro
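
As a rough illustration of why recalibration needs known variant sites, here is a toy Python sketch: it compares reported base qualities with the observed mismatch rate, skipping positions of known variants so that real variation is not counted as sequencing error. This is a simplified stand-in, not GATK's implementation; GATK also models covariates such as read group, machine cycle and sequence context, and the +1/+2 pseudocounts are our own choice.

```python
import math
from collections import defaultdict


def empirical_quality(observations, known_sites):
    """Toy base-quality recalibration keyed only on the reported quality score.

    observations: iterable of (position, reported_q, is_mismatch) tuples for
                  aligned bases compared against the reference.
    known_sites:  set of positions of known variants; mismatches there are
                  real variation, not sequencing errors, so they are skipped.
    Returns {reported_q: empirical Phred quality}.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [mismatches, total]
    for pos, q, is_mismatch in observations:
        if pos in known_sites:
            continue
        counts[q][0] += int(is_mismatch)
        counts[q][1] += 1
    recalibrated = {}
    for q, (mismatches, total) in counts.items():
        # Pseudocounts (+1/+2) are our own choice to avoid log10(0)
        error_rate = (mismatches + 1) / (total + 2)
        recalibrated[q] = -10 * math.log10(error_rate)
    return recalibrated


# Made-up example: 100 bases reported as Q30 with 2 sequencing errors,
# plus 50 mismatching bases at a known variant site (position 500).
obs = [(i, 30, False) for i in range(98)]
obs += [(200, 30, True), (201, 30, True)]
obs += [(500, 30, True)] * 50

with_known = empirical_quality(obs, known_sites={500})
without_known = empirical_quality(obs, known_sites=set())
print(round(with_known[30], 1), round(without_known[30], 1))
```

With the known site masked, the Q30 bin comes out at roughly Q15; without the mask, the real variant is mistaken for errors and the estimate drops to roughly Q5, which is why a reliable known-sites file matters.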


Using a realigner before calling with samtools or GATK is recommended to reduce false positives. However, GATK SNP calling and base recalibration depend on a file of known, accurate SNPs. If such a file is not available for your species (at the moment only a limited set of species have one), consider whether to use these steps at all, although they are the gold standard for human, where these files exist. I believe you can create a known-sites file by filtering a samtools SNP VCF for high-quality SNPs (i.e. 214+) and using that as the known file in GATK when no curated known file is available.
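
A minimal Python sketch of building such a bootstrap known-sites file: keep only biallelic SNP records whose QUAL meets the threshold, and pass the header through unchanged. It assumes an uncompressed VCF; the example records are made up, and the 214 cutoff is simply the value mentioned in the reply above, not a validated choice.

```python
BASES = {"A", "C", "G", "T"}


def high_confidence_snps(lines, min_qual):
    """Yield header lines plus biallelic SNP records with QUAL >= min_qual.

    Intended as a rough bootstrap "known sites" set when no curated
    SNP file exists for the species.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt, qual = fields[3].upper(), fields[4].upper(), fields[5]
        # Single-base REF and a single single-base ALT: drops indels,
        # multi-allelic sites ("A,T") and missing ALT (".").
        if ref not in BASES or alt not in BASES:
            continue
        if qual == "." or float(qual) < min_qual:
            continue
        yield line


# Made-up records (not real data)
calls = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "Chr1\t100\t.\tA\tG\t220\t.\t.\n",       # kept
    "Chr1\t150\t.\tC\tT\t50\t.\t.\n",        # dropped: QUAL too low
    "Chr1\t200\t.\tAT\tA\t300\t.\tINDEL\n",  # dropped: indel
    "Chr1\t250\t.\tG\tA,T\t250\t.\t.\n",     # dropped: multi-allelic
]
known = list(high_confidence_snps(calls, min_qual=214))
print("".join(known), end="")
```

A strict threshold trades sensitivity for precision here on purpose: a known-sites file with false SNPs would mask real sequencing errors during recalibration.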


The species is Arabidopsis (a plant genome).
