I have DNA sequencing data enriched for a region of DNA, after mapping and doing further processing downstream analysis (including de-duplication of reads) I end up with a subset of informative reads.
I have performed sampling (from 10% of the input sequenced reads to 100%), after each sampling I perform the full analysis pipeline and obtain the number of informative reads.
I am trying to find out what is the saturation point after which, no more additional input reads will yield informative reads in the downstream analysis.
I end up with a table such as the following
sample % 10 20 30 ... 100 Informative reads 1234 1421 1700 ... 3000
I have seen this paper where they are trying to do something similar:
However, I don't think I can use this method because of how the data is processed. In the case of the paper, the data is directly assigned to one of multiple groups (for example, bins of a certain genome length), the downstream analysis I am performing just differentiates between "informative" and "non-informative" reads which no subgrouping within the informative reads and more complex processing.
When I plot the table described above, I end up with a similar curve as Michaelis-Menten described by a mathematical function :
y = V x / ( K + x )
Where V and K are parameters.
I have been trying to fit curves to the data to extrapolate the saturation point in Python but I can't seem to get it to work:
import numpy as np import pandas as pd import lmfit as lm import matplotlib.pyplot as plt if __name__ == "__main__": y = # Array of data with corresponding informative reads x = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100] # Define an objective function to minimise def obj(params, x, y) : model = params["V"].value * x / ( params["K"].value + x) return model - y # Create parameters object params = lm.Parameters() params.add("V", value = 20, min = 15, max = 25) # looks about right from plt.plot(x, y) params.add("K", value = 40, min = 35, max = 55) # based on our guesses for V result = lm.minimize(obj, params, args = (x, y) ) lm.report_fit(params) plt.plot(x, y) plt.plot(x, result.residual + y)
I am getting an error:
model = params["V"].value * x / ( params["K"].value + x) ValueError: operands could not be broadcast together with shapes (200,) (10,)