How To Filter Against Common Snp Database To Look For Rare Variants?
12.1 years ago
Bioscientist ★ 1.7k

I've been using samtools/GATK to call SNPs/indels, and I would like to filter my data against known common variants to find rare events.

Some of the samtools results can be:

#CHROM    POS    ID    REF    ALT    
1    10177    .    ACCT    ACCCT

There can be several alternatives for both the REF and ALT sequences. So, when I compare my SNPs/indels with a common SNP database, should I just compare positions using bedtools, or should I also look at the base change? For example, if the database has a SNP at position 1 changing G to C, while my results show a change at the same position 1 from G to T, should I regard this as a common SNP or a rare SNP?
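To make the difference between the two comparisons concrete, here is a minimal Python sketch, assuming each variant is reduced to (chromosome, position, REF, ALT) and that both call sets have already been normalized (for example with `bcftools norm`), because the same indel can be written with different REF/ALT strings. The `Variant` type and the helper names are hypothetical, chosen only for illustration.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def split_alts(chrom: str, pos: int, ref: str, alt_field: str) -> list[Variant]:
    """Turn a comma-separated VCF ALT field (e.g. "T,C") into one record per allele."""
    return [Variant(chrom, pos, ref, alt) for alt in alt_field.split(",")]


def position_match(query: Variant, known: Iterable[Variant]) -> bool:
    """Position-only overlap: roughly what intersecting by coordinates gives you."""
    return any(k.chrom == query.chrom and k.pos == query.pos for k in known)


def allele_match(query: Variant, known: set[Variant]) -> bool:
    """Allele-aware match: chromosome, position, REF and ALT must all agree."""
    return query in known


if __name__ == "__main__":
    # Hypothetical database entry: G>C at chr1:100.
    database = {Variant("1", 100, "G", "C")}
    # Hypothetical call with two ALT alleles at the same site.
    for call in split_alts("1", 100, "G", "T,C"):
        print(call.alt, position_match(call, database), allele_match(call, database))
```

Here the G>T call overlaps the database by position but is not the same allele, so a position-only filter would wrongly throw it away as "common".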

Thanks!

snp samtools • 9.5k views
12.1 years ago
Vikas Bansal ★ 2.4k

First, I would annotate my SNPs by checking whether they are present in dbSNP with the same base change. You can use a tool such as ANNOVAR for this, and it can also perform filtering. I would not simply exclude a SNP because it is present in the database, since that SNP may be important for your study. One option is to exclude SNPs whose allele frequency in the database is above 1%.
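A minimal Python sketch of that frequency-based filter, assuming you have already built a lookup from the database keyed on (chrom, pos, ref, alt) with the minor allele frequency as value (the `FrequencyTable` layout and function names are hypothetical). It keeps novel alleles, keeps known alleles below the cutoff, and keeps alleles with unknown frequency for manual review, mirroring how UCSC groups "unknown" with the < 1% set.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Hypothetical lookup built from a SNP database:
# (chrom, pos, ref, alt) -> minor allele frequency, or None when unknown.
FrequencyTable = dict[tuple[str, int, str, str], Optional[float]]


def is_rare_candidate(v: Variant, db: FrequencyTable, cutoff: float = 0.01) -> bool:
    """Keep novel alleles and known alleles below the frequency cutoff."""
    key = (v.chrom, v.pos, v.ref, v.alt)
    if key not in db:
        return True  # same base change not in the database: novel
    maf = db[key]
    if maf is None:
        return True  # frequency unknown: keep it for manual review
    return maf < cutoff  # MAF >= cutoff counts as common and is dropped


def filter_rare(variants: list[Variant], db: FrequencyTable, cutoff: float = 0.01) -> list[Variant]:
    """Return only the variants that pass the rare-variant filter."""
    return [v for v in variants if is_rare_candidate(v, db, cutoff)]
```

Because the key includes REF and ALT, a G>T call at a site where the database only lists a common G>C is still kept.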

EDIT:

You can find this information on UCSC. It says:

Common SNPs(135) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly.

Flagged SNPs(135) - SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" -- not necessarily a risk allele!

Mult. SNPs(135) - SNPs mapping in more than one place on reference assembly.

All SNPs(135) - all SNPs from dbSNP mapping to reference assembly.

So basically you mean we should consider the base change, rather than simply the position, right? Thanks.

Yes, and in downstream analysis, also look at what kind of mutation it is: missense, nonsense, frameshift, etc.
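In practice an annotation tool such as ANNOVAR assigns these categories from transcript models. The sketch below only illustrates the underlying logic in Python, assuming you already know the affected codon on the coding strand (SNVs) or that the indel lies inside a coding exon. The function names and the extra "stop_lost" label are illustrative, not any tool's output format.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change; codons are given on the coding strand."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # the change creates a stop codon
    if ref_aa == "*":
        return "stop_lost"  # the original stop codon is destroyed
    return "missense"


def classify_coding_indel(ref: str, alt: str) -> str:
    """A coding indel shifts the frame unless its length change is a multiple of 3."""
    length_change = len(alt) - len(ref)
    return "frameshift" if length_change % 3 != 0 else "in-frame"
```

For example, a one-base insertion written as ACCT>ACCCT would be a frameshift if it fell inside a coding exon, while TGG>TGA (Trp to stop) is nonsense.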

Also, by dbSNP, do you mean All SNPs or Common SNPs? There are two different databases on UCSC: one "All SNPs" and one "Common SNPs".

See my edit. Select accordingly.

12.1 years ago
User 2005 ▴ 70

Here are a few guidelines:

  1. Annotation against the reference genome
  2. Detection of non-synonymous SNPs
  3. Grouping SNPs by gene of interest (associated with the pathology) or by Gene Ontology
  4. If nothing comes out, run a chi-squared test for each position found and see which positions pass your p-value threshold
  5. Manhattan plot / QQ plot
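A minimal Python sketch of step 4, assuming you have case and control groups and can count ALT and REF alleles per position (the function names are hypothetical). It runs a Pearson chi-squared test on the 2x2 allele table without continuity correction, and uses a Bonferroni cutoff as one simple way to pick the per-position threshold; that correction is a swapped-in choice, not something the steps above prescribe. With small counts, Fisher's exact test is usually preferred.

```python
import math


def chi2_df1_pvalue(chi2: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    # For 1 df, chi2 = Z**2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def allelic_chi2(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-squared test on a 2x2 table of allele counts (no continuity correction)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("a row or column total is zero; the test is undefined")
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, chi2_df1_pvalue(chi2)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-position p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests
```

The resulting p-values, as -log10(p), are what a Manhattan plot and QQ plot display in step 5.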

I hope it helped :)
