Working in R with HTS cluster analysis data

Good morning, I'm working with a large data set: a table of cluster analysis data from an HTS sequencing run. I have a problem with RStudio constantly crashing, especially when I try to make plots or merge two data frames. Sometimes I also get the crash error (session aborted) when I just load the data in the console, and I am basically unable to proceed with my analyses.

Is there a way to work with such a data set using the tidyverse and vegan packages without RStudio crashing?

I'm using the latest versions of R (v4.0.2) and RStudio (v1.3.1093) on Ubuntu 18.04.5.

Thanks for the help.

R HTS vegan tidyverse metabarcoding

Impossible to answer without further details. You are essentially telling us that you have a problem, but not what it is in terms of the size of the data, the code you are running, etc.


Try your workflow on a toy sample of your data. For instance, if you take 20% of your data, does it work? Load the data, take a sub-sample, then remove the original object and call gc() to clear memory. If it doesn't work, take a smaller sample. If it does work, examine how many resources were consumed (e.g. use top in your console to follow memory usage) and try a larger sample. See if something obvious is happening, like running out of memory with the full data set. Try to rule out the obvious things first.
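
A minimal sketch of that subsampling workflow, assuming the table is read from a CSV (the file name my_data.csv is a placeholder):

library(tidyverse)

my_data <- read_csv("my_data.csv")                 # placeholder file name

my_sample <- my_data %>% slice_sample(prop = 0.2)  # keep a random 20% of the rows

rm(my_data)                                        # drop the full table
gc()                                               # reclaim the memory

Note that slice_sample() needs dplyr >= 1.0.0; with older versions, sample_frac(0.2) does the same.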


This is an example of how my data is arranged:

library(tidyverse)

my_data <- tibble(
  tag     = c("tag_1", "tag_2", "tag_3", "tag_4", "tag_5"),
  sampler = c("sampler_1", "sampler_2", "sampler_1", "sampler_3", "sampler_2"),
  site    = c("site_1", "site_2", "site_2", "site_2", "site_1"),
  otu_1   = c(0, 25, 0, 124, 35),
  otu_2   = c(195, 24, 0, 0, 0),
  otu_3   = c(35, 14, 0, 16, 0),
  otu_4   = c(0, 0, 0, 123, 24)
)


In the original data there are about 12,000 columns (tag, sampler, site, and all the OTUs) and 1,000 rows. It is not a large dataset in terms of memory, but when I use pivot_longer() the result is more than 2.5 GB when I write it to a .csv file.

my_data_2 <- my_data %>%
  pivot_longer(cols = contains("otu_"), names_to = "otus", values_to = "val")


I use this function to put all the OTUs in one column because I want to merge this data with another data set that contains the taxonomic units corresponding to my OTUs. Once I have the data in this form, the problems start. If I try to make a plot, R takes a long time to process the code and finally (after at least 5 minutes) I get a crash error.
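
For reference, the merge I have in mind looks roughly like this; the taxonomy tibble below is a made-up stand-in for my real taxonomy table, and the column names otus and taxon are placeholders. Dropping the zero counts before joining can also keep the long table much smaller:

taxonomy <- tibble(
  otus  = c("otu_1", "otu_2", "otu_3", "otu_4"),         # placeholder taxonomy table
  taxon = c("taxon_a", "taxon_b", "taxon_c", "taxon_d")
)

my_data_merged <- my_data_2 %>%
  filter(val > 0) %>%               # drop absent OTUs; a sparse table shrinks a lot
  left_join(taxonomy, by = "otus")  # attach the taxonomic assignments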

This is an example of one of the plots that gives me the error:

my_data_2 %>%
  ggplot() +
  geom_bar(mapping = aes(x = site, y = val, fill = sampler), stat = "identity", position = "fill")
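
One way to make this plot much cheaper would be to summarise first, so ggplot() only sees one row per site/sampler pair instead of the full long table; a sketch, assuming my_data_2 from above:

plot_data <- my_data_2 %>%
  group_by(site, sampler) %>%
  summarise(val = sum(val), .groups = "drop")  # collapse to one row per site/sampler pair

ggplot(plot_data) +
  geom_col(aes(x = site, y = val, fill = sampler), position = "fill")

geom_col() is equivalent to geom_bar(stat = "identity"), and position = "fill" still shows the per-site proportions.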


Another crash occurs when I try to plot with the rarecurve() function from the vegan package.

For that I prepare the data in this way:

library(vegan)

my_data_3 <- my_data %>%
  select(tag, contains("otu_")) %>%    # keep the tag and all OTU count columns
  column_to_rownames(var = "tag") %>%  # move the tags into the row names
  as.matrix()                          # rarecurve() expects a numeric matrix

rarecurve(my_data_3)


I get the same error as for ggplot.
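
If rarecurve() itself is the bottleneck, thinning the computation might help: its step argument controls how many rarefaction points are evaluated per sample, and trying a subset of rows first shows whether the full run is feasible. A sketch (the 50 rows and step of 100 are arbitrary choices):

rarecurve(my_data_3[1:50, ], step = 100, label = FALSE)  # fewer samples, coarser curves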

Thank you for the help.


Use ADD REPLY/ADD COMMENT when responding to existing threads. This should ideally be added to the original post by editing it.

SUBMIT ANSWER is only for new answers to the original question.