Question: What is the Trinity setting max_pct_stdev?
4.9 years ago by
Ekarl2 • 90 wrote:

In the Trinity manual, I read the following:

--max_pct_stdev <int> :maximum pct of mean for stdev of kmer coverage across read (default: 200)

What does this mean in more detail? I found an older discussion:

where they explain it as:

Here, the per-read pct_dev is defined as the deviation in k-mer coverage divided by the average k-mer coverage, times 100 (to make it a percent). If the deviation is high, that indicates that the read is likely to contain many errors, since high-coverage reads with low-coverage k-mers shouldn't happen. Trinity sets a cutoff of 100: if the deviation is as big as the average, the read should go away

So is max_pct_stdev just std kmer coverage / avg kmer coverage * 100?

Does a high value for this statistic mean that some k-mers have really low coverage compared with the average (thus increasing the standard deviation a lot), and that, assuming generally high sequence coverage, these reads are probably bad? Or where do we get the "high-coverage read" from?

max_pct_stdev rna-seq trinity • 1.2k views
modified 4.9 years ago by RamRS • 25k • written 4.9 years ago by Ekarl2 • 90
4.9 years ago by
Houston, TX
RamRS • 25k wrote:

From what I understand:

For a 100-base read with coverage = 30, all 25-mers from it should ideally be covered at around 30X (this is just an example with arbitrary values).

I am not entirely sure why that assumption is made, but it seems rare to encounter k-mers with coverage orders of magnitude higher than that of the read they come from. However, it is entirely possible, owing to sequencing errors, for a k-mer to have much lower coverage than the read. If a read has several such erroneous k-mers distributed along its length, they increase the standard deviation of its k-mer coverage values, yet the read may still pass QC. Such a read can be considered suboptimal and discarded without significant loss to the assembly process.
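
To make the effect concrete, here is a minimal Python sketch of the statistic as described in this thread (standard deviation of k-mer coverage divided by mean k-mer coverage, times 100). The function name `pct_stdev`, the coverage values, and the use of the population standard deviation are my assumptions for illustration; I have not checked how Trinity's own code computes it.

```python
from statistics import mean, pstdev


def pct_stdev(kmer_coverages):
    """Stdev of a read's k-mer coverages as a percent of their mean.

    Assumption: population standard deviation; Trinity may differ.
    """
    avg = mean(kmer_coverages)
    if avg == 0:
        return 0.0  # assumption: treat an all-zero read as having no spread
    return pstdev(kmer_coverages) / avg * 100


# A 100-base read contains 100 - 25 + 1 = 76 overlapping 25-mers.
# Made-up read where every 25-mer is covered 30X.
clean_read = [30] * 76

# Made-up read where 25 of the 76 k-mers overlap a sequencing error
# and are therefore seen only once in the data set.
error_read = [30] * 51 + [1] * 25

print(pct_stdev(clean_read))  # no spread at all
print(pct_stdev(error_read))  # noticeably higher
```

The uniformly covered read has no deviation, while the read with a block of low-coverage k-mers gets a clearly higher value. The more such k-mers a read has, and the further they are from the mean, the closer the read gets to the cutoff.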


Thank you for your detailed explanation. Have I understood the equation ("std kmer coverage / avg kmer coverage * 100") correctly?

written 4.9 years ago by Ekarl2 • 90

That seems right. Think of it as "what percent of the mean k-mer coverage can the k-mer coverage standard deviation be, at most?"

written 4.9 years ago by RamRS • 25k
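
Read that way, the setting is a per-read cutoff. A rough Python sketch of the decision follows, using the default of 200 quoted from the manual in the question. The helper name `passes_filter`, the `<=` comparison, the handling of zero coverage, and the example coverage values are assumptions for illustration, not Trinity's actual implementation.

```python
from statistics import mean, pstdev

MAX_PCT_STDEV = 200  # default quoted from the Trinity manual in the question


def passes_filter(kmer_coverages, max_pct_stdev=MAX_PCT_STDEV):
    """Return True if the read's k-mer coverage stdev, as a percent of the
    mean, is within the cutoff.

    Assumptions: population stdev, inclusive cutoff, zero-coverage reads fail.
    """
    avg = mean(kmer_coverages)
    if avg == 0:
        return False
    return pstdev(kmer_coverages) / avg * 100 <= max_pct_stdev


# Made-up k-mer coverages for two 100-base reads (76 25-mers each).
even_read = [30] * 76                  # uniform coverage
uneven_read = [1] * 70 + [1000] * 6    # a few k-mers far above the rest

print(passes_filter(even_read))    # kept
print(passes_filter(uneven_read))  # stdev is several times the mean, so discarded
```

Lowering `max_pct_stdev` makes the filter stricter, since a smaller deviation relative to the mean is then enough to discard a read.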