Question: Matching Input Df To Multi-Result Output?
0
gravatar for hanlonmt
5.1 years ago by
hanlonmt0
Sweden
hanlonmt0 wrote:

I'm working on some between-species post-GWAS comparisons and have reached a bottleneck in my processing. Any input would be great.

A truncated version of my output looks like this:

SUB_POP        SNP.NAME    CHROMOSOME    POSITION    pVALUE    region
A             SNP-5.3784657.    5    16288402    7.20E-07  5:3734657:3834657
A             SNP-11.26079639.    11    26551225    1.17E-06  11:266501225:266601225
A             SNP-3.30115818.    3    30122780    1.66E-06  3:30072780:30172780

dput

structure(list(SUB_POP = structure(c(1L, 1L, 1L), .Label = "A", class = "factor"), 
SNP.NAME = structure(c(3L, 1L, 2L), .Label = c("SNP-11.26079639.", 
"SNP-3.30115818.", "SNP-5.3784657."), class = "factor"), 
CHROMOSOME = c(5L, 11L, 3L), POSITION = c(16300000, 26551225, 
30122780), pVALUE = c(7.2e-07, 1.17e-06, 1.66e-06), region = structure(c(3L, 
1L, 2L), .Label = c("11:266501225:266601225", "3:30072780:30172780", 
"5:3734657:3834657"), class = "factor")), .Names = c("SUB_POP", 
"SNP.NAME", "CHROMOSOME", "POSITION", "pVALUE", "region"), class = "data.frame", row.names = c(NA, 
-3L))

The 'range' column defines a region on the chromosmoe that contains genes in LD with the SNP, thus making all viable candidates. This portion of the project was completed in rice, which has quite large LD blocks.

I can query biomaRt and get these genes, but then have to match the output back to this original file for further processing. I have written this into a ddply loop but that is both painfully slow and not how (I think) biomart resources should really be used. Any idea of how I can maintain this input code and get some sort of intelligible output that I can use for processing down the line. BiomaRt and such already loaded.

rice_mart <- useMart("plants_mart_21", dataset="osativa_eg_gene")
library(plyr)
gr<-function(x){getBM(attributes=c("ensembl_gene_id","chromosome_name","start_position","end_position","strand","description"), filters=c("chromosomal_region","with_embl","biotype"),values=list(x$region,TRUE,"protein_coding"),mart=rice_mart)}
genes_rice <- ddply (sample,.(SUB_POP,SNP.NAME,POSITION,pVALUE), .progress="text", gr)

This gives me exactly what I want, which is the input df and the biomart results merged together. If I were querying by gene ID or something else, I could easily use a join function, but the region complicates this. Does anyone have experience or ideas on how to approach this?

R biomart gwas • 1.3k views
ADD COMMENTlink written 5.1 years ago by hanlonmt0

What are you trying to do overall? ie, what are you starting with and what would you like to find out about it?

ADD REPLYlink written 5.1 years ago by Emily_Ensembl18k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1133 users visited in the last hour