Matching Input Df To Multi-Result Output?
0
0
Entering edit mode
10.1 years ago
hanlonmt • 0

I'm working on some between-species post-GWAS comparisons and have reached a bottleneck in my processing. Any input would be great.

A truncated version of my output looks like this:

SUB_POP        SNP.NAME    CHROMOSOME    POSITION    pVALUE    region
A             SNP-5.3784657.    5    16288402    7.20E-07  5:3734657:3834657
A             SNP-11.26079639.    11    26551225    1.17E-06  11:266501225:266601225
A             SNP-3.30115818.    3    30122780    1.66E-06  3:30072780:30172780

dput

structure(list(SUB_POP = structure(c(1L, 1L, 1L), .Label = "A", class = "factor"), 
SNP.NAME = structure(c(3L, 1L, 2L), .Label = c("SNP-11.26079639.", 
"SNP-3.30115818.", "SNP-5.3784657."), class = "factor"), 
CHROMOSOME = c(5L, 11L, 3L), POSITION = c(16300000, 26551225, 
30122780), pVALUE = c(7.2e-07, 1.17e-06, 1.66e-06), region = structure(c(3L, 
1L, 2L), .Label = c("11:266501225:266601225", "3:30072780:30172780", 
"5:3734657:3834657"), class = "factor")), .Names = c("SUB_POP", 
"SNP.NAME", "CHROMOSOME", "POSITION", "pVALUE", "region"), class = "data.frame", row.names = c(NA, 
-3L))

The 'range' column defines a region on the chromosmoe that contains genes in LD with the SNP, thus making all viable candidates. This portion of the project was completed in rice, which has quite large LD blocks.

I can query biomaRt and get these genes, but then have to match the output back to this original file for further processing. I have written this into a ddply loop but that is both painfully slow and not how (I think) biomart resources should really be used. Any idea of how I can maintain this input code and get some sort of intelligible output that I can use for processing down the line. BiomaRt and such already loaded.

rice_mart <- useMart("plants_mart_21", dataset="osativa_eg_gene")
library(plyr)
gr<-function(x){getBM(attributes=c("ensembl_gene_id","chromosome_name","start_position","end_position","strand","description"), filters=c("chromosomal_region","with_embl","biotype"),values=list(x$region,TRUE,"protein_coding"),mart=rice_mart)}
genes_rice <- ddply (sample,.(SUB_POP,SNP.NAME,POSITION,pVALUE), .progress="text", gr)

This gives me exactly what I want, which is the input df and the biomart results merged together. If I were querying by gene ID or something else, I could easily use a join function, but the region complicates this. Does anyone have experience or ideas on how to approach this?

biomart r gwas • 2.1k views
ADD COMMENT
0
Entering edit mode

What are you trying to do overall? ie, what are you starting with and what would you like to find out about it?

ADD REPLY

Login before adding your answer.

Traffic: 3741 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6