I have a dataset of correlations between genes and OTUs. I want to plot these correlations with the igraph library in R to see which genes are correlated with which OTUs. Then I will extract the connected components of the graph (each component should represent a genome).
My dataset is huge: I can't keep all the correlations (in the range [-1, 1]), because the full matrix would contain 817,000 x 817,000 correlations. So I want to choose a threshold: is there a principled way to set it? For example, is it meaningful to keep only the correlations > 0.9? Doing that still leaves more than 58 million correlations and creates 9152 components.
Another question is whether I should keep only the OTU-gene correlations. Is it still meaningful to keep the OTU-OTU and gene-gene correlations? If I keep only the OTU-gene correlations > 0.8, I am left with more than 1.1 million correlations, which creates only 89 components.
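
To make the two filtering options concrete, here is a minimal sketch of the thresholding and component counting on a toy edge list. The data frame `edges`, its column names, the `OTU` name prefix used to tell OTUs from genes, and the helper `count_components` are all hypothetical and only for illustration; the real data would need its own way of distinguishing the two node types.

```r
library(igraph)

# Toy edge list (made-up values, for illustration only).
edges <- data.frame(
  from = c("OTU1", "OTU1", "OTU2", "gene1", "OTU3", "OTU1"),
  to   = c("gene1", "gene2", "gene3", "gene2", "gene4", "OTU2"),
  cor  = c(0.95, 0.92, 0.85, 0.97, 0.91, 0.60),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep edges above a threshold, optionally keep only
# OTU-gene edges, then count the connected components of the graph.
# Nodes that lose all their edges are dropped, so they are not counted
# as singleton components.
count_components <- function(edges, threshold, otu_gene_only = FALSE) {
  kept <- edges[edges$cor > threshold, ]
  if (otu_gene_only) {
    # Assumption: OTU names start with "OTU"; everything else is a gene.
    from_is_otu <- grepl("^OTU", kept$from)
    to_is_otu   <- grepl("^OTU", kept$to)
    # Keep an edge only if exactly one endpoint is an OTU.
    kept <- kept[xor(from_is_otu, to_is_otu), ]
  }
  g <- graph_from_data_frame(kept, directed = FALSE)
  components(g)$no
}

# All edge types, correlation > 0.9
count_components(edges, 0.9)

# OTU-gene edges only, correlation > 0.8
count_components(edges, 0.8, otu_gene_only = TRUE)
```

Running the helper over a range of thresholds, with and without the OTU-gene filter, would show how quickly the number of components changes with each choice.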