Working in R with HTS cluster analysis data
3.5 years ago by Kinoppy

Good morning, I'm working with a large data set: a table with cluster analysis data from an HTS sequencing run. I have a problem with RStudio constantly crashing, especially when I try to make plots or merge two data frames. Sometimes I get the crash error (session aborted) even when I just try to load the data in the console, and I am basically unable to proceed with my analyses.

Is there a way to work with such a data set using the tidyverse and vegan packages without RStudio crashing?

I'm using the latest version of R (v4.0.2) and RStudio (v1.3.1093) on Ubuntu 18.04.5.

Thanks for the help.

R HTS vegan tidyverse metabarcoding

It is impossible to answer without further details. You are essentially telling us that you have a problem, but not what it is in terms of the size of the data, the code you are running, etc.


Try your workflow on a toy sample of your data. For instance, if you took 20% of your data, would it work? Load the data, take a sub-sample, remove the original object, and call gc() to clear memory. If it doesn't work, take a smaller sample. If it does work, examine how many resources were consumed (e.g. use top in a terminal to follow memory use) and try a larger sample. See if there is something obvious happening, like running out of memory with your full data set. Try to rule out the obvious things.
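
For instance, a minimal sketch of that workflow (the file name and the 20% fraction are placeholders for your own data):

# Load the full data, keep a random 20% sub-sample, then free the original
my_data <- read.csv("my_counts.csv")
keep <- sample(nrow(my_data), size = floor(0.2 * nrow(my_data)))
my_sample <- my_data[keep, ]
rm(my_data)   # destroy the original object ...
gc()          # ... and let R return the freed memory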


This is a small example of how my data is arranged:

library(tidyverse)

# Example layout: one row per sample, metadata columns plus one column per OTU
my_data <- tibble(tag = c("tag_1", "tag_2", "tag_3", "tag_4", "tag_5"),
                  sampler = c("sampler_1", "sampler_2", "sampler_1", "sampler_3", "sampler_2"),
                  site = c("site_1", "site_2", "site_2", "site_2", "site_1"),
                  otu_1 = c(0, 25, 0, 124, 35),
                  otu_2 = c(195, 24, 0, 0, 0),
                  otu_3 = c(35, 14, 0, 16, 0),
                  otu_4 = c(0, 0, 0, 123, 24))

In my original data there are about 12,000 columns (tag, sampler, site, and all the OTUs) and 1,000 rows. It is not a large dataset in terms of memory, but when I use pivot_longer() the result is more than 2.5 GB when I write it to a .csv file (1,000 rows × roughly 12,000 OTU columns gives about 12 million rows in long format).

# Reshape to long format: one row per (tag, sampler, site, otu) combination
my_data_2 <- my_data %>%
  pivot_longer(cols = contains("otu_"), names_to = "otus", values_to = "val")
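
The in-memory size before and after the pivot can be compared with base R's object.size() (a quick check, not part of my original workflow):

# In-memory footprint of the wide and long tables
format(object.size(my_data), units = "MB")
format(object.size(my_data_2), units = "MB")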

I use this to put all the OTUs in one column because I want to merge the data with another data set that contains the taxonomic units corresponding to my OTUs. Once I have the data in this form, the problems begin. If I try to make a plot, R takes a long time to process the code and finally (after at least 5 minutes) I get a crash error.
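
The merge itself looks something like this (a sketch; my_taxonomy and its otus/taxon columns are made-up stand-ins for the real taxonomy table):

# Hypothetical taxonomy table: one row per OTU
my_taxonomy <- tibble(otus = c("otu_1", "otu_2", "otu_3", "otu_4"),
                      taxon = c("taxon_a", "taxon_b", "taxon_c", "taxon_d"))

# Attach the taxonomic unit to every long-format row, matching on "otus"
my_data_annotated <- my_data_2 %>%
  left_join(my_taxonomy, by = "otus")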

This is an example of one of the plots that gives me an error:

my_data_2 %>%
  ggplot() +
  geom_bar(mapping = aes(x = site, y = val, fill = sampler), stat = "identity", position = "fill")
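
A possible workaround (a sketch, assuming my_data_2 as built above): summarise before plotting, so ggplot2 only has to draw one row per site/sampler pair instead of millions; stacking sums the values anyway, so the proportions are unchanged.

# Pre-aggregate so ggplot2 receives a handful of rows instead of millions
plot_data <- my_data_2 %>%
  group_by(site, sampler) %>%
  summarise(val = sum(val), .groups = "drop")

plot_data %>%
  ggplot() +
  geom_col(mapping = aes(x = site, y = val, fill = sampler), position = "fill")

(geom_col() is just shorthand for geom_bar(stat = "identity").)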

Another crash occurs when I try to plot with the rarecurve() function from the vegan package.

For that I prepare the data in this way:

library(vegan)

# Counts matrix for vegan: samples as rownames, OTUs as columns
my_data_3 <- my_data %>%
  select(tag, contains("otu_")) %>%
  column_to_rownames(var = "tag") %>%
  as.matrix()

rarecurve(my_data_3)

I get the same error as for ggplot.
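
One thing that might reduce the load here is rarecurve()'s step argument (default 1, i.e. the curve is evaluated at every single sample size up to each row total); a coarser step cuts the computation considerably, for example:

# Evaluate the rarefaction curve every 100 reads instead of every read
rarecurve(my_data_3, step = 100)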

Thank you for the help.


Use ADD REPLY/ADD COMMENT when responding to existing threads. This should ideally be added to the original post by editing it.

SUBMIT ANSWER is only for new answers to the original question.
