Calculating variant frequency in R
6.0 years ago
nkabo ▴ 60

Hi,

I have approximately 1200 variants for 390 genes in an Excel document, and I want to find the frequency of each variant and add it to the Excel file. Is there a way to do this in R? Thanks in advance.

SNP R • 3.0k views
6.0 years ago
Fabio Marroni ★ 2.9k

I assume you know how to compute variant frequency and just need to learn how to read and write Excel files in R. You might take a look at this post on Stack Overflow.

6.0 years ago

If you are not familiar with R, you can get the 1000 Genomes allele frequency, along with other information, by pasting the list of variants into the Ensembl Variant Effect Predictor (VEP): http://www.ensembl.org/Tools/VEP


Thank you, I will also use that one.

6.0 years ago

I've been through the pain of trying to read and write spreadsheets in R, and I highly recommend openxlsx (CRAN) as the most stable package I've tried. As for your allele frequency calculation, that depends on the dataset you have at hand. If you are stuck with that, please amend your post with more detail about the dataset.
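To make the read/write part concrete, here is a minimal sketch. It assumes that "frequency" simply means how often each variant ID occurs among the rows of the sheet, which may not be what you need. The file names, the `rs_id` column name and the helper `add_variant_frequency()` are all hypothetical; `read.xlsx()` and `write.xlsx()` are the openxlsx functions for reading and writing a sheet.

```r
# Hypothetical helper: adds a count and a within-sheet proportion for each
# variant ID. This is a plain tabulation, not a population allele frequency.
add_variant_frequency <- function(variants, id_col = "rs_id") {
  ids <- as.character(variants[[id_col]])
  counts <- table(ids)                      # occurrences of each ID
  variants$count <- as.integer(counts[ids]) # look up the count for every row
  variants$frequency <- variants$count / nrow(variants)
  variants
}

# Assumed file names; adjust to your own spreadsheet.
input_file <- "variants.xlsx"
if (file.exists(input_file)) {
  variants <- openxlsx::read.xlsx(input_file, sheet = 1)
  annotated <- add_variant_frequency(variants)
  openxlsx::write.xlsx(annotated, "variants_with_frequency.xlsx")
}
```

The two new columns end up as the last columns of the output sheet. If your rows are already unique variants, this tabulation will not tell you anything, and you need an external frequency source instead.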

6.0 years ago
nkabo ▴ 60

Thank you for your help. Each line of the Excel file contains the gene name, start and end positions, type of substitution (A>G, for example) and rs_id. Is there a way in R to automatically find the expected disease allele frequency for the specified gene and write it in the last column, and also to calculate the frequency of the specified variant for that gene and compare the two? There are tools that return the variant frequency when you enter the variant and the gene one at a time, but I want to read the whole file at once and get the variant frequency for all populations. Does R have a package for this? If it cannot be done in R, could you please suggest another program? Thanks.


Please use the "Add Comment" button in future, or amend your post with more details.

The short answer is yes. You can use openxlsx to read in your spreadsheets, then use the biomaRt package (Bioconductor) to query for the 1000G MAF (for example), based on dbSNP IDs. Alternatively, there's a web interface for Ensembl's BioMart service.
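A rough sketch of the biomaRt route, under some assumptions: the rs IDs sit in a column called `rs_id` (as you describe), and the helper names `fetch_maf()` and `annotate_variants()` are made up for illustration. `useEnsembl()` and `getBM()` are biomaRt functions, and `"hsapiens_snp"` is the human short-variant dataset of the Ensembl `"snps"` mart. As far as I know, `minor_allele_freq` is a single global MAF rather than a per-population breakdown, so check that it matches what you need.

```r
# Query Ensembl BioMart for the minor allele and its frequency.
# Needs the Bioconductor package biomaRt and an internet connection.
fetch_maf <- function(rs_ids) {
  snp_mart <- biomaRt::useEnsembl(biomart = "snps", dataset = "hsapiens_snp")
  biomaRt::getBM(
    attributes = c("refsnp_id", "minor_allele", "minor_allele_freq"),
    filters    = "snp_filter",
    values     = unique(rs_ids),
    mart       = snp_mart
  )
}

# Left-join the BioMart result onto the spreadsheet rows by rs ID.
# Variants that BioMart does not return keep NA in the new columns.
annotate_variants <- function(variants, maf) {
  merge(variants, maf, by.x = "rs_id", by.y = "refsnp_id", all.x = TRUE)
}

# Typical use (not run here; file names are assumptions):
# variants  <- openxlsx::read.xlsx("variants.xlsx", sheet = 1)
# maf       <- fetch_maf(variants$rs_id)
# annotated <- annotate_variants(variants, maf)
# openxlsx::write.xlsx(annotated, "variants_with_maf.xlsx")
```

Note that `merge()` sorts the result by `rs_id`, and if BioMart returns more than one row for an ID, the corresponding spreadsheet row will be duplicated, so check the row count after merging.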


Thank you, it helped :)
