Hi there, I have 16 assemblies of a diploid plant species for which I want to make my own Nx plots. I think I got to a good starting point by extracting all contig lengths into a vector and wrangling the data in R.
The problem is that I didn't really know what other information I needed to generate such a plot, so I searched online and Google's Search Labs presented me with this piece of code:
library(Biostrings)
# Assuming your contig lengths are in a vector called 'contig_lengths'
contig_lengths <- c(1000, 800, 600, 400, 200, 100) # Example contig lengths
# Calculate cumulative length
cumulative_length <- cumsum(sort(contig_lengths, decreasing = TRUE))
# Calculate Nx values
total_length <- sum(contig_lengths)
nx_values <- sapply(seq(0, 100, by = 1), function(x) {
  threshold <- (x / 100) * total_length
  index <- which(cumulative_length >= threshold)[1]
  if (is.na(index)) {
    return(0)
  }
  return(sort(contig_lengths, decreasing = TRUE)[index])
})
# Create the plot
plot(seq(0, 100, by = 1), nx_values, type = "l",
     xlab = "Percentage of Assembly (%)", ylab = "Contig Length",
     main = "Nx Plot")
I plugged in my own contig_lengths and ran it to generate a plot. However, it doesn't seem to be working as expected (see plot below)...
Now, I believe I can manage to tweak things in R to get the intended result, but at first glance this script appears to be doing what is, by definition, an Nx plot; hence, I cannot really see the problem.
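For reference, here is a minimal sketch of the definition as I understand it: Nx is the length of the contig at which the cumulative sum of lengths, sorted from longest to shortest, first reaches x% of the total assembly length. The helper name `nx_value` is my own and does not come from any package.

```r
# Hypothetical helper (not from any package): Nx for a single percentage x.
nx_value <- function(lengths, x) {
  sorted <- sort(lengths, decreasing = TRUE)
  threshold <- (x / 100) * sum(sorted)
  # Length of the first contig at which the running total reaches the threshold
  sorted[which(cumsum(sorted) >= threshold)[1]]
}

contig_lengths <- c(1000, 800, 600, 400, 200, 100)  # same toy example as above

# Total is 3100. Running totals: 1000, 1800, 2400, 2800, 3000, 3100.
nx_value(contig_lengths, 50)  # 800 (1800 is the first total >= 1550)
nx_value(contig_lengths, 90)  # 400 (2800 is the first total >= 2790)
```

If this is the right definition, the toy vector should give N50 = 800 and N90 = 400, which is an easy way to check any script by hand.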
Note that this is my first time doing this kind of analysis, and it might not be clear to me what exactly I need to plot on each axis. That said, any help is much appreciated. Thanks in advance!
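To make the question about the axes concrete, this is a rough sketch of what I think the final figure should look like for several assemblies: x is the cumulative percentage of the assembly covered by the longest contigs, y is the contig length at that point, drawn as a step curve. The list `assemblies`, its names, and the helper `nx_curve` are made-up placeholders, and the log-scaled y axis is just my assumption about what reads best.

```r
# Hypothetical input: a named list with one vector of contig lengths per assembly.
assemblies <- list(
  asm_A = c(1000, 800, 600, 400, 200, 100),
  asm_B = c(1500, 500, 400, 300, 200, 200)
)

# Hypothetical helper: step-curve coordinates for one assembly.
# x = cumulative % of total length, y = contig length (Nx) at that percentage.
nx_curve <- function(lengths) {
  sorted <- sort(lengths, decreasing = TRUE)
  data.frame(
    x = c(0, 100 * cumsum(sorted) / sum(sorted)),
    y = c(sorted[1], sorted)
  )
}

curves <- lapply(assemblies, nx_curve)
y_range <- range(unlist(lapply(curves, function(d) d$y)))

# Empty canvas first, then one step line per assembly.
plot(1, 1, type = "n", xlim = c(0, 100), ylim = y_range, log = "y",
     xlab = "x (% of total assembly length)", ylab = "Nx (contig length, bp)",
     main = "Nx plot")
for (i in seq_along(curves)) {
  # type = "S" drops vertically first, then runs horizontally to the next point
  lines(curves[[i]]$x, curves[[i]]$y, type = "S", col = i)
}
legend("topright", legend = names(curves), col = seq_along(curves), lty = 1)
```

Building the curve from the contigs themselves, rather than from a fixed 0 to 100 grid, keeps every step at its exact position, which might matter for assemblies with only a few very long contigs.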
What is an Nx plot and what is it you're trying to visualise?