Fasta with Common SNPs masked
9.8 years ago
robert • 0

How can I mask a sequence with SNPs depending on MAF? The sequence I am interested in is human build 37 and I'd like to mask SNPs that have frequencies of >1% or >5% in dbSNP. Is there some resource out there with common SNPs already masked?

SNP fasta • 3.8k views
9.8 years ago

Get a BED file of the SNPs you want to discard from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start): select group "variation", track All_Snp138, then use filter -> create and set a threshold on avHet.
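If you would rather download the full table and filter it locally, the step above can be sketched in Python. This is only a sketch under assumptions: it expects a tab-delimited dump whose header line starts with `#` and includes the columns `chrom`, `chromStart`, `chromEnd`, `name` and `avHet`; the function name `snps_to_bed` and the example rows are made up for illustration.

```python
import csv
import io


def snps_to_bed(table_text, min_avhet):
    """Return BED lines (chrom, start, end, name) for SNPs with avHet >= min_avhet.

    Assumes a tab-delimited table whose first line is a header prefixed with '#'.
    """
    # Drop the leading '#' so DictReader sees plain column names.
    reader = csv.DictReader(io.StringIO(table_text.lstrip("#")), delimiter="\t")
    bed = []
    for row in reader:
        if float(row["avHet"]) >= min_avhet:
            bed.append("\t".join(
                [row["chrom"], row["chromStart"], row["chromEnd"], row["name"]]
            ))
    return bed


# Made-up rows, only to show the expected input shape.
example = (
    "#chrom\tchromStart\tchromEnd\tname\tavHet\n"
    "chr1\t10\t11\trsA\t0.45\n"
    "chr1\t20\t21\trsB\t0.01\n"
)
print("\n".join(snps_to_bed(example, 0.095)))
```

Only the first example row passes the 0.095 cutoff, so a single BED line is printed.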

Then use bedtools maskfasta to mask the reference: http://bedtools.readthedocs.org/en/latest/content/tools/maskfasta.html
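With bedtools installed, the call is typically along the lines of `bedtools maskfasta -fi ref.fa -bed snps.bed -fo masked.fa` (file names here are placeholders). By default this hard-masks with N; `-soft` lower-cases the bases instead. To make clear what the masking does, here is a minimal Python illustration of the same idea on a single sequence. It is not a replacement for bedtools, and `mask_sequence` is a hypothetical helper name.

```python
def mask_sequence(seq, intervals, soft=False):
    """Mask 0-based, half-open (start, end) intervals, as in BED coordinates.

    Hard masking replaces bases with 'N'; soft masking lower-cases them.
    """
    chars = list(seq)
    for start, end in intervals:
        # Clip each interval to the sequence bounds.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


print(mask_sequence("ACGTACGT", [(2, 3), (5, 7)]))        # ACNTANNT
print(mask_sequence("ACGTACGT", [(2, 3)], soft=True))     # ACgTACGT
```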


Thanks, Pierre. That seems like it would work but I can't find any documentation on what the "avHet" filter is. Is that average heterozygosity?

Yes. Click on "table description": "Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters."
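To turn a MAF cutoff into an avHet cutoff, one rough approach is the expected heterozygosity of a biallelic site, 2p(1-p), where p is the minor allele frequency. This is an approximation and an assumption on my part: dbSNP computes avHet from the submitted frequency data, so it will not match this formula exactly for every SNP.

```python
def expected_het(maf):
    """Expected heterozygosity 2p(1-p) for a biallelic site with minor allele frequency p."""
    return 2.0 * maf * (1.0 - maf)


# Approximate avHet thresholds for the 1% and 5% MAF cutoffs in the question.
for maf in (0.01, 0.05):
    print(maf, round(expected_het(maf), 4))
```

Under this approximation, MAF > 1% corresponds to avHet above about 0.0198 and MAF > 5% to avHet above about 0.095.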
9.8 years ago

I have never used this tool but it seems useful for what you want to achieve.

http://genomecomb.sourceforge.net/docs/cg_genome_seq.html

From its documentation: "This command returns the sequences of the genomic regions given in the file regionfile in fasta format (to stdout or to a file outfile). Regionfile is a tab delimited file with at least following columns: chromosome begin end. Repeatmasker repeats are soft masked (lower case) in the output sequences. Optionally you can hardmask repeats, and soft or hardmask known (dbsnp) variants based on frequency."

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/snp138Mask/ (Already masked reference fasta based on dbSNP)


Thanks, Ashutosh. I'll take a look at this! I was originally using the fastas from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/snp138Mask/ but it included too many SNPs. I only want to mask the high-frequency SNPs and preferably only the SNPs which are high-frequency in Asian populations.
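For population-specific masking, one possible route is a VCF that carries per-population allele frequencies in its INFO column, converted to BED and then passed to maskfasta. The sketch below assumes such a file exists for your population; the INFO key name (`EAS_AF` in the example) and the example records are assumptions, so check the header of whatever VCF you actually use. Also note that chromosome names must match the reference FASTA (for example `1` versus `chr1`).

```python
def common_sites_to_bed(vcf_lines, info_key, min_maf):
    """Return (chrom, start, end, id) tuples for sites whose minor allele
    frequency under info_key is >= min_maf. Coordinates are BED-style (0-based)."""
    bed = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, rsid, ref = fields[0], int(fields[1]), fields[2], fields[3]
        info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
        if info_key not in info:
            continue
        # Multi-allelic sites list one frequency per ALT allele; '.' means missing.
        freqs = [float(x) for x in info[info_key].split(",") if x != "."]
        if any(min(f, 1.0 - f) >= min_maf for f in freqs):
            bed.append((chrom, pos - 1, pos - 1 + len(ref), rsid))
    return bed


# Made-up records, only to show the expected input shape.
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\trs1\tA\tG\t.\tPASS\tAF=0.2;EAS_AF=0.30",
    "1\t200\trs2\tC\tT\t.\tPASS\tAF=0.2;EAS_AF=0.001",
]
for chrom, start, end, name in common_sites_to_bed(example_vcf, "EAS_AF", 0.05):
    print(chrom, start, end, name, sep="\t")
```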
