Question

Unable to manually create smlSet using manually created snpmatrix

0

Entering edit mode

8.0 years ago

brianscomer • 0

Hi I am attempting to create an smlSet for use in GGtools for eQTL analysis using a snpmatrix that was manually created using the following code:

library(snpStats)
snpmatrix<-new("SnpMatrix", snpmatrix)

The original snpmatrix is simply a matrix object.

I call the SnpMatrix object, I get the following output:

snpmatrix
A SnpMatrix with 109 rows and 2443179 columns
Row names: XXXX_001 ... XXXX_103
Col names: GA008510 ...VGXS35706

The code generated an object of the class SnpMatrix with dimensions of 109 x 2443179. There are 109 samples and 2443179 snps in the object.

When I attempt to make a smlSet using the following code, I get an error as follows: "Error in make_smlSet(esetforeqtl, snpmatrix, organism="Homo sapiens";: sml must belist of SnpMatrix instances from 'snpstats' package"

The code that generated the error is listed below:

library(GGtools)
eqtl_smlset<-make_smlSet(esetforeqtl, snpmatrix, organism="Homo sapiens", harmonizeSamples=FALSE)

In addition to the snpmatrix, I also manually created the supporting objects for snpmatrix (snp.support and subject.support) but I am not sure how to combine all tree into one object. If the instructions are in the documentation I must have missed them in it.

The snpmatrix object and supporting snpmatrix objects were created using the code block at the end of the post (minus the lines that intially scanned the file line by line due to import issues) in case anyone wants to browse through the code.

Does anyone have any insight on how I can properly create a snpmatrix object coupled with the support objects for incorporation into an smlSet along with my expression data (ExpressionSet) using make_smlSet()?

    #SNP data preprocessing and coversion to appropriate formats fot GGtools


test<-rbind(test_line2, test_line3, test_line4, test_line5, test_line6, test_line7, test_line8, test_line9, test_line10, test_line11, test_line12, test_line13, test_line14, test_line15, test_line16, test_line17, test_line18, test_line19, test_line20, test_line21, test_line22, test_line23, test_line24, test_line25, test_line26, test_line27, test_line28, test_line29, test_line30, test_line31, test_line32, test_line33, test_line34, test_line35, test_line36, test_line37, test_line38, test_line39, test_line40, test_line41, test_line42, test_line43, test_line44, test_line45, test_line46, test_line47, test_line48, test_line49, test_line50, test_line51, test_line52, test_line53, test_line54, test_line55, test_line56, test_line57, test_line58, test_line59, test_line60, test_line61, test_line62, test_line63, test_line64, test_line65, test_line66, test_line67, test_line68, test_line69, test_line70, test_line71, test_line72, test_line73, test_line74, test_line75, test_line76, test_line77, test_line78, test_line79, test_line80, test_line81)
test<-test[,-22]
test<-rbind(test, test_line82, test_line83, test_line84, test_line85, test_line86, test_line87, test_line88, test_line89, test_line90, test_line91, test_line92, test_line93, test_line94, test_line95, test_line96, test_line97, test_line98, test_line99, test_line100, test_line101, test_line102, test_line103, test_line104, test_line105, test_line106, test_line107, test_line108, test_line109, test_line110)

indexNC<-test=="NC"
indexAA<-test=="AA"
indexAB<-test=="AB"
indexBB<-test=="BB"

test[indexNC]<-"00"
test[indexAA]<-"01"
test[indexAB]<-"02"
test[indexBB]<-"03"

#Remove extra files post rbind
rm(test_line2, test_line3, test_line4, test_line5, test_line6, test_line7, test_line8, test_line9, test_line10, test_line11, test_line12, test_line13, test_line14, test_line15, test_line16, test_line17, test_line18, test_line19, test_line20, test_line21, test_line22, test_line23, test_line24, test_line25, test_line26, test_line27, test_line28, test_line29, test_line30, test_line31, test_line32, test_line33, test_line34, test_line35, test_line36, test_line37, test_line38, test_line39, test_line40, test_line41, test_line42, test_line43, test_line44, test_line45, test_line46, test_line47, test_line48, test_line49, test_line50, test_line51, test_line52, test_line53, test_line54, test_line55, test_line56, test_line57, test_line58, test_line59, test_line60, test_line61, test_line62, test_line63, test_line64, test_line65, test_line66, test_line67, test_line68, test_line69, test_line70, test_line71, test_line72, test_line73, test_line74, test_line75, test_line76, test_line77, test_line78, test_line79, test_line80, test_line81)
rm(test_line82, test_line83, test_line84, test_line85, test_line86, test_line87, test_line88, test_line89, test_line90, test_line91, test_line92, test_line93, test_line94, test_line95, test_line96, test_line97, test_line98, test_line99, test_line100, test_line101, test_line102, test_line103, test_line104, test_line105, test_line106, test_line107, test_line108, test_line109, test_line110)
rm(indexAA, indexAB, indexBB, indexNC)

information<-test[,-1:-]

samplename<-test[,1]
snpnames<-test_line1[,89:2443267]
gender<-test[,2]
genderest<-test[,7]

snpmatrix<-test[1:109,33:2443211]

rownames(snpmatrix) <- samplename
colnames(snpmatrix) <- snpnames

extradata<-test[1:109, 1:32]

    #Illumino info from annotation file
annotation <- read.table(pipe("cut -f1,3,4,18 /N/u/bscomer/Karst/XXXX_82_thru_110_omni2_5_Custom_human_omni_2_5.annotation_edit.txt"))

library(splitstackshape)

annotation<-cSplit(indt=annotationtest, splitCols="V4", sep="/", stripWhite=FALSE)

annotation$V4_1<-as.character(annotation$V4_1)
annotation$V4_2<-as.character(annotation$V4_2)

annotation$V4_1<-gsub("\\[", "", annotation$V4_1)
annotation$V4_2<-gsub("\\]", "", annotation$V4_2)

annotation<-annotation[-c(1)]
annotation_snpnames<-annotation$V1
annotation<-data.frame(annotation)

rownames(annotation)<-annotation[,1]
annotation[,1]<-NULL
colnames(annotation)<-c("chromosome", "position", "A1", "A2")
annotation$position<-as.integer(annotation$position)
annotation$A1<-as.factor(annotation$A1)
annotation$A2<-as.factor(annotation$A2)

snp.support<-annotation

dfsamplename<-data.frame(samplename)

#export samples for annotation for case or control in spreadsheet software
write.table(dfsamplename, file="dfsamplenames")

#reimport sample list with control or case annotated in table as dfsamplenames
dfsamplenames <- read.csv("/gpfs/home/b/s/bscomer/Karst/Desktop/dfsamplenames.csv")

subject.support<-dfsamplenames
#stopped here
subject.support<-data.frame(subject.support)
subject.support$X.samplename.<-gsub("'", "", subject.support$X.samplename.)
row.names(subject.support)<-subject.support$X.samplename.
subject.support$X.samplename.<-NULL
colnames(subject.support)<-c("cc", "cc_text", "sex", "race")
subject.support$cc_text<-NULL
subject.support$sex<-NULL
subject.support$race<-NULL

#remove extra files to keep r from crashing
rm(test)

library(snpStats)
snpmatrix<-new("snpMatrix", snpmatrix)

Session information is below: (Note working on computing cluster so I cannot update R version without submitting a request to system administrators for next monthly update)

R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
[9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
  [1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
  [1] BiocInstaller_1.16.5       GGtools_5.2.0              data.table_1.9.6           GGBase_3.28.0             
[5] sva_3.12.0                 mgcv_1.8-12                nlme_3.1-126               EMA_1.4.4                 
[9] genefilter_1.48.1          BiocParallel_1.0.3         pd.hugene.1.0.st.v1_3.10.0 RSQLite_1.0.0             
[13] DBI_0.3.1                  oligo_1.30.0               Biostrings_2.34.1          XVector_0.6.0             
[17] snpStats_1.16.0            Matrix_1.2-4               survival_2.38-3            limma_3.22.7              
[21] annotate_1.44.0            XML_3.98-1.3               AnnotationDbi_1.28.2       Biobase_2.26.0            
[25] rtracklayer_1.26.3         GenomicRanges_1.18.4       GenomeInfoDb_1.2.5         IRanges_2.0.1             
[29] S4Vectors_0.4.0            BiocGenerics_0.12.1        oligoClasses_1.28.0       

loaded via a namespace (and not attached):
  [1] acepack_1.3-3.3          affxparser_1.38.0        affy_1.44.0              affyio_1.34.0           
[5] assertthat_0.1           base64enc_0.1-3          BatchJobs_1.6            BBmisc_1.9              
[9] biglm_0.9-1              biomaRt_2.22.0           biovizBase_1.14.1        bit_1.1-12              
[13] bitops_1.0-6             brew_1.0-6               BSgenome_1.34.1          caTools_1.17.1          
[17] checkmate_1.7.1          chron_2.3-47             cluster_2.0.3            codetools_0.2-14        
[21] colorspace_1.2-6         dichromat_2.0-0          digest_0.6.9             dplyr_0.4.1             
[25] FactoMineR_1.32          fail_1.3                 ff_2.2-13                flashClust_1.01-2       
[29] foreach_1.4.3            foreign_0.8-66           Formula_1.2-1            gcrma_2.38.0            
[33] gdata_2.17.0             GenomicAlignments_1.2.2  GenomicFeatures_1.18.7   ggplot2_2.1.0           
[37] gplots_3.0.1             grid_3.1.1               gridExtra_2.2.1          GSA_1.03                
[41] gtable_0.2.0             gtools_3.5.0             Gviz_1.10.11             heatmap.plus_1.3        
[45] hexbin_1.27.1            Hmisc_3.17-3             iterators_1.0.8          KernSmooth_2.23-15      
[49] knitr_1.12.3             lattice_0.20-33          latticeExtra_0.6-28      leaps_2.9               
[53] magrittr_1.5             MASS_7.3-45              matrixStats_0.50.1       multtest_2.22.0         
[57] munsell_0.4.3            nnet_7.3-12              plyr_1.8.3               preprocessCore_1.28.0   
[61] RColorBrewer_1.1-2       Rcpp_0.12.4              RCurl_1.95-4.7           reshape2_1.4.1          
[65] ROCR_1.0-7               rpart_4.1-10             Rsamtools_1.18.3         scales_0.4.0            
[69] scatterplot3d_0.3-36     sendmailR_1.2-1          siggenes_1.40.0          splines_3.1.1           
[73] stringi_1.0-1            stringr_1.0.0            tools_3.1.1              VariantAnnotation_1.12.9
[77] xtable_1.8-2             zlibbioc_1.12.0

smlSet snpStats snpMatrix GGtools snp • 2.2k views

ADD COMMENT • link updated 8.0 years ago by Vince Carey ▴ 80 • written 8.0 years ago by brianscomer • 0

score 2 · Accepted Answer · 2016-04-14

2

Entering edit mode

8.0 years ago

Vince Carey ▴ 80

Thanks for the detailed information. Note that the code

eqtl_smlset<-make_smlSet(esetforeqtl, snpmatrix, ...

does not supply the necessary element in the second position, which must be a list of SnpMatrix objects. If you used list(snpmatrix) you would make more progress.

I can't quite follow the questions about snp.support or subject.support. subject.support appears to be a structure for sample-level data, and that would be part of the phenoData component of the ExpressionSet. Feel free to contact me with more details on additional difficulties encountered.

ADD COMMENT • link 8.0 years ago by Vince Carey ▴ 80

0

Entering edit mode

Thanks: The following code corrected the issue.

eqtl_smlset<-make_smlSet(esetforeqtl, list(all=snpmatrix), organism="Homo sapiens", harmonizeSamples=TRUE)

It will take a few hours to run the eQTL analysis in R before I will have confirmation that the manually constructed smlset works downstream.

ADD REPLY • link 8.0 years ago by brianscomer • 0

0

Entering edit mode

I would also comment that the gQTLstats package in Bioconductor includes new facilities for eQTL analysis for genomic quantifications in SummarizedExperiment containers, and variant data in VCF files. cisAssoc and AllAssoc functions are available. This is intended to reduce complexity in handling variant data in R.

ADD REPLY • link 8.0 years ago by Vince Carey ▴ 80