Question: How To Filter Against Common SNP Database To Look For Rare Variants?
7.4 years ago, Bioscientist (1.7k) wrote:

I've been using samtools/GATK to call SNPs/indels these days, and would like to filter my calls against known common variants so that only rare variants remain.

Some of the samtools results look like this:

#CHROM    POS    ID    REF    ALT    
1    10177    .    ACCT    ACCCT

There can be several alternatives for both the REF and ALT sequences. So when I compare my SNPs/indels with a common SNP database, should I just compare positions using bedtools, or should I also look at the base change? For example, suppose the database has a SNP at position 1 changing G to C, while my results show a change at position 1 from G to T. Should I regard this as a common SNP or a rare SNP?

Thanks!

7.4 years ago, Vikas Bansal (2.3k, Berlin, Germany) wrote:

First of all, I would annotate my SNPs by checking whether they are present in dbSNP with the same base change. You can use a tool for this, for example ANNOVAR, which can also perform the filtering. I would not simply exclude a SNP because it is present in the database, since that SNP may still be important in your study. One approach is to exclude only SNPs that have an allele frequency above 1% in the database.
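
As a rough illustration of what "same change" matching can look like (a sketch under stated assumptions, not how ANNOVAR works internally): each call is keyed by chromosome, position, REF and ALT, multi-allelic records are split into one key per ALT allele, and bases shared by REF and ALT are trimmed so that differently padded indels compare equal. The function names are hypothetical, and the `known` entries and their frequencies are invented for the example.

```python
def trim(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each."""
    # Trim the common suffix first.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim the common prefix, shifting the position to the right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def allele_keys(chrom, pos, ref, alt_field):
    """Yield one (chrom, pos, ref, alt) key per ALT allele of a VCF record."""
    for alt in alt_field.split(","):
        yield (chrom,) + trim(pos, ref, alt)


def is_common(key, known_af, threshold=0.01):
    """True if this exact allele is in the database with frequency >= threshold."""
    af = known_af.get(key)
    return af is not None and af >= threshold


# Hypothetical database: allele key -> allele frequency (invented values).
known = {
    ("1", 10177, "A", "AC"): 0.42,
    ("1", 20000, "G", "C"): 0.15,
    ("1", 30000, "T", "A"): 0.002,
}

calls = [
    ("1", 10177, "ACCT", "ACCCT"),  # same insertion as the database, padded differently
    ("1", 20000, "G", "T"),         # same position as a common SNP, different base
    ("1", 30000, "T", "A"),         # present in the database, but below 1%
]

# Keep every allele that is not a common variant under the rule above.
rare = [key for call in calls for key in allele_keys(*call) if not is_common(key, known)]
```

With this rule the G-to-T call at a position where the database lists G-to-C is kept as rare, which a position-only comparison would have discarded. The trimming here is only a simple approximation; full left-alignment against the reference sequence (for example with `bcftools norm`) is more robust for indels in repeats.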

EDIT:

You can find this information from UCSC. It says:

Common SNPs(135) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly.

Flagged SNPs(135) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated" -- not necessarily a risk allele!

Mult. SNPs(135) - SNPs mapping in more than one place on reference assembly.

All SNPs(135) - all SNPs from dbSNP mapping to reference assembly.
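
To make the four definitions concrete, here is a minimal sketch that assigns a SNP to tracks using only the rules quoted above. The function name and its parameters are hypothetical; UCSC builds these tracks itself, so this is only a way to read the definitions, not a description of their pipeline.

```python
def ucsc_tracks(maf, n_mappings, clinically_flagged=False):
    """Return the track names a SNP falls under, per the quoted descriptions.

    maf: minor allele frequency as a fraction, or None if unknown.
    n_mappings: number of places the SNP maps on the reference assembly.
    clinically_flagged: True if dbSNP marks it as "clinically associated".
    """
    tracks = ["All SNPs"]  # every mapped dbSNP entry is in this track
    if n_mappings > 1:
        tracks.append("Mult. SNPs")
    elif maf is not None and maf >= 0.01:
        tracks.append("Common SNPs")
    elif clinically_flagged:
        # MAF below 1% or unknown, single mapping, flagged as clinical
        tracks.append("Flagged SNPs")
    return tracks
```

For rare-variant filtering, the practical point is that a SNP with a single mapping and MAF below 1% appears only in "All SNPs" (or also in "Flagged SNPs"), so filtering against "Common SNPs" leaves it in your results.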


So basically you mean we should consider the change of base, rather than simply the position, right? Thanks.

(reply written 7.4 years ago by Bioscientist)

Yes, and in downstream analysis also consider what kind of mutation it is: missense, nonsense, frameshift, etc.

(reply written 7.4 years ago by Vikas Bansal)

Also, by dbSNP, do you mean all SNPs or common SNPs? There are two different databases on UCSC, one "All SNPs" and one "Common SNPs".

(reply written 7.4 years ago by Bioscientist)

See my edit. Select accordingly.

(reply written 7.4 years ago by Vikas Bansal)
7.4 years ago, User 2005 (70) wrote:

Here are a few guidelines:

  1. Annotate against the reference genome
  2. Detect non-synonymous SNPs
  3. Group SNPs by gene of interest (associated with pathology) or by gene ontology
  4. If nothing comes out, run a chi-squared test for each position found and decide on your p-value threshold
  5. Manhattan plot / QQ plot
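
Step 4 could be sketched as follows, assuming a case/control design with allele counts per position (the post does not say which comparison is meant, so that design, the invented counts, and the Bonferroni threshold are all assumptions). It computes a Pearson chi-squared statistic on a 2x2 table and its p-value using only the standard library.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 degree of freedom, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail probability is erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Invented allele counts per position:
# (ALT in cases, REF in cases, ALT in controls, REF in controls)
positions = {
    ("1", 20000): (30, 170, 10, 190),
    ("1", 30000): (12, 188, 11, 189),
}

# Bonferroni-corrected threshold: one common choice, assumed here.
alpha = 0.05 / len(positions)
for position, counts in positions.items():
    stat, p = chi2_2x2(*counts)
    print(position, round(stat, 2), "significant" if p < alpha else "not significant")
```

The per-position p-values from a loop like this are what a Manhattan plot or QQ plot (step 5) would display. With small expected counts, an exact test is usually a safer choice than chi-squared.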

I hope it helped :)
