Question: Subset Function in Ballgown
msobol wrote (4 months ago):

Hello,

I am using the HISAT, StringTie, and Ballgown pipeline to do transcriptome expression analysis. So far I have run these steps:

bg=ballgown(dataDir=data_directory, samplePattern='1and2', meas='all')

bg_ifungis = ballgown(dataDir = data_directory, samplePattern = '1and2', pData=pheno_data)

bg_ifungis_filt = subset(bg_ifungis,"rowVars(texpr(bg_ifungis)) >1",genomesubset=TRUE)

results_transcripts = stattest(bg_ifungis_filt, feature="transcript",covariate="timepoint",adjustvars = c("location"), getFC=TRUE, meas="FPKM")

Here I set adjustvars to the location column in my pData, which is either Earth or ISS. However, this compares all of the Earth samples with all of the ISS samples, and now I would like to analyze each location separately, on its own.

I first tried changing the results_transcripts call to this:

results_transcripts_E = stattest(bg_ifungis_filt, feature="transcript",covariate="timepoint",adjustvars = c("location=Earth"), getFC=TRUE, meas="FPKM")

This did not work; it said Earth is not a valid covariate. Then I tried to subset the data using Ballgown's subset command:

bg_ifungis_Earth = subset(pheno_data,"pheno_data$location == Earth",genomesubset=FALSE)

> Error in subset.data.frame(pheno_data, "pheno_data$location == Earth",  : 
  'subset' must be logical

I tried variations of the above but kept getting a similar error. Is there any way I can subset my data by location in Ballgown? Or will I have to redo the StringTie assemblies and everything else so that Earth and ISS are processed separately?

I hope that makes sense!

Thanks in advance, Morgan

Amar wrote (4 months ago):

My apologies: I haven't used Ballgown, so I'm mostly reading the documentation and guessing.

I skimmed the manual/tutorial and noticed that you need to specify timecourse=TRUE when analyzing a time series. I also noticed that covariate essentially tells Ballgown which grouping to test (e.g., control vs. case). So with covariate="timepoint" you're telling Ballgown to treat the time points as groups, and with adjustvars=c("location") to adjust for location.
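To illustrate the covariate/adjustvars distinction, here's a base-R sketch for a single made-up transcript. It is not Ballgown's code: it assumes, as I read the manual, that stattest compares a model containing the covariate plus the adjustvars against a model containing only the adjustvars (Ballgown also adjusts for library size, which is left out here). The sample layout, column names and expression values are all hypothetical.

```r
# Hypothetical layout: 3 time points x 2 locations x 2 replicates = 12 samples.
set.seed(1)
pheno <- expand.grid(rep = 1:2,
                     timepoint = c(1, 2, 3),
                     location = c("Earth", "ISS"))

# Simulated log2(FPKM + 1) for one transcript: rises with time,
# plus a constant offset for ISS samples.
pheno$y <- 2 + 0.8 * pheno$timepoint +
  ifelse(pheno$location == "ISS", 1, 0) + rnorm(nrow(pheno), sd = 0.3)

# covariate = "timepoint" (categorical here, i.e. a multi-group test);
# adjustvars = "location" appears in BOTH models, so it is adjusted for,
# not tested.
full    <- lm(y ~ factor(timepoint) + location, data = pheno)
reduced <- lm(y ~ location, data = pheno)

# F-test: does timepoint explain expression beyond what location explains?
ftest <- anova(reduced, full)
print(ftest)
p_timepoint <- ftest[["Pr(>F)"]][2]
```

If location were the thing you wanted to test, it would have to be the covariate, i.e. the term that differs between the two models.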

From what I understand, adjustvars is for handling confounding factors, NOT for defining the groups to test. That's why you get an error when specifying c("location=Earth"): Ballgown looks for a variable called "location=Earth", which doesn't exist. So in your first analysis I don't think you're comparing Earth vs. ISS; you're actually testing for differences across time points over all your data, while telling Ballgown that location is a confounding factor. And because you haven't set timecourse=TRUE, I think it treats the time points as a multi-group comparison rather than the time-series analysis you're expecting. Again, I haven't used Ballgown, so I'm not sure this interpretation is correct; double-check it. The tutorial is very good and describes how to perform a time-series analysis.
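A similar base-R sketch of what timecourse=TRUE changes. My understanding from the manual is that time is then treated as numeric and modeled with a natural cubic spline (the df argument, default 4) instead of one group per time point. The data below are invented, and df = 2 is an arbitrary choice for this small example; this is an illustration, not Ballgown's implementation.

```r
library(splines)  # ships with R; provides ns() for natural cubic splines

# Hypothetical layout: 5 numeric time points x 2 locations x 2 replicates.
set.seed(2)
pheno <- expand.grid(rep = 1:2,
                     timepoint = 1:5,
                     location = c("Earth", "ISS"))

# Simulated log2(FPKM + 1) for one transcript with a smooth time trend.
pheno$y <- 3 + sin(pheno$timepoint / 2) +
  ifelse(pheno$location == "ISS", 0.5, 0) + rnorm(nrow(pheno), sd = 0.2)

# Multi-group view (like timecourse = FALSE): one parameter per extra time point.
multigroup <- lm(y ~ factor(timepoint) + location, data = pheno)

# Time-course view (like timecourse = TRUE): a smooth spline in time with 2 df.
timecourse <- lm(y ~ ns(timepoint, df = 2) + location, data = pheno)

# Null model: location only, so both tests ask "does time matter?"
reduced <- lm(y ~ location, data = pheno)

p_multigroup <- anova(reduced, multigroup)[["Pr(>F)"]][2]
p_timecourse <- anova(reduced, timecourse)[["Pr(>F)"]][2]

# The spline model spends fewer parameters on time than the multi-group model.
print(c(multigroup = length(coef(multigroup)),
        timecourse = length(coef(timecourse))))
```

With many time points the spline version keeps the test focused on a smooth trend instead of estimating every time point separately.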

For subsetting, I think your command should be subset(bg_ifungis_filt, "location == 'Earth'", genomesubset=FALSE), i.e. called on the ballgown object rather than on pheno_data, with Earth quoted inside the condition string. I think you need to repeat your analysis on the subsetted datasets (one for Earth and one for ISS), running stattest with covariate="timepoint" and timecourse=TRUE (check the manual for this). Leave out adjustvars unless there's a confounding factor to adjust for. You'll then have two time-series DE result sets to compare.
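The error in the question comes from calling subset() on pheno_data, which is a plain data.frame: subset.data.frame wants an unquoted logical expression, not a string, so it stops with "'subset' must be logical". Ballgown's own subset() for ballgown objects takes the condition as a string, which means a character value like Earth needs its own quotes inside that string. Below is a base-R sketch with a hypothetical pheno_data, followed by a hypothetical wrapper (timecourse_for_location) for the per-location time-course tests; the wrapper assumes library(ballgown) is loaded and that timepoint is numeric in pData.

```r
# Hypothetical phenotype table shaped like the one in the question.
pheno_data <- data.frame(
  ids       = c("E1", "E2", "E3", "I1", "I2", "I3"),
  location  = c("Earth", "Earth", "Earth", "ISS", "ISS", "ISS"),
  timepoint = c(1, 2, 3, 1, 2, 3),
  stringsAsFactors = FALSE
)

# 1) Reproduce the error: subset() on a data.frame gets a character string
#    instead of a logical expression.
err <- tryCatch(subset(pheno_data, "location == Earth"),
                error = conditionMessage)
print(err)  # the "'subset' must be logical" message from the question

# 2) Base-R subsetting of a data.frame: unquoted expression, quoted value.
earth_pheno <- subset(pheno_data, location == "Earth")

# 3) A string condition needs inner quotes around 'Earth'. Roughly how such
#    a string is evaluated against a phenotype table (a simplification,
#    not Ballgown's source):
cond <- "location == 'Earth'"
keep <- eval(parse(text = cond), envir = pheno_data)
print(pheno_data$ids[keep])

# 4) Hypothetical wrapper for per-location time-course tests.
#    Requires library(ballgown); bg must have 'location' and a numeric
#    'timepoint' column in pData.
timecourse_for_location <- function(bg, loc) {
  bg_loc <- subset(bg, sprintf("location == '%s'", loc), genomesubset = FALSE)
  stattest(bg_loc, feature = "transcript", covariate = "timepoint",
           timecourse = TRUE, meas = "FPKM")
}

# Usage (not run here):
# results_earth <- timecourse_for_location(bg_ifungis_filt, "Earth")
# results_iss   <- timecourse_for_location(bg_ifungis_filt, "ISS")
```

Subsetting the ballgown object by sample this way should avoid redoing the StringTie assemblies, since the subset keeps the same transcript structure.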

Then I would recommend a separate analysis comparing all Earth data directly with all ISS data. Without seeing your data structure, I think you need a variable (for example, a column called group; your existing location column may already serve) whose values identify which samples are Earth and which are ISS. Then run this two-group DE analysis to see whether Earth and ISS transcripts differ overall.
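A sketch of the direct Earth vs. ISS test, assuming the existing location column is used as the two-level covariate. Since this is a two-group comparison, getFC=TRUE should be allowed here. The wrapper name is hypothetical, it assumes library(ballgown) is loaded, and whether to also pass adjustvars = c("timepoint") depends on your design.

```r
# Hypothetical wrapper: Earth vs. ISS as a two-group DE test.
# Assumes library(ballgown) has been loaded and pData(bg) has a
# 'location' column with exactly two values (Earth, ISS).
earth_vs_iss <- function(bg) {
  stattest(bg, feature = "transcript", covariate = "location",
           getFC = TRUE, meas = "FPKM")
}

# Usage (not run here):
# results_location <- earth_vs_iss(bg_ifungis_filt)
# head(results_location[order(results_location$qval), ])
```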

It's also possible that you want to run a time-series analysis over all Earth and ISS samples while adjusting for location, but to be honest I'm not sure what question that setup would answer. I should have mentioned at the start that I don't know your aims or hypotheses, so I'm guessing at your objective.

P.S. Check the getFC=TRUE option in the manual; I don't think it's available for time-series analysis (as far as I can tell, fold changes are only returned for two-group comparisons).
