Question: What is the Trinity setting max_pct_stdev?
Ekarl2 wrote (5.8 years ago):

In the Trinity manual, I read the following:

--max_pct_stdev <int> :maximum pct of mean for stdev of kmer coverage across read (default: 200)

What does this mean in more detail? I found an older discussion:

http://ivory.idyll.org/blog/trinity-in-silico-normalize.html

where they explain it as:

Here, the per-read pct_dev is defined as the deviation in k-mer coverage divided by the average k-mer coverage, times 100 (to make it a percent). If the deviation is high, that indicates that the read is likely to contain many errors, since high-coverage reads with low-coverage k-mers shouldn't happen. Trinity sets a cutoff of 100: if the deviation is as big as the average, the read should go away
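As a rough illustration of the definition quoted above, here is a minimal Python sketch of the per-read statistic and the cutoff test. The helper names `pct_dev` and `passes` are hypothetical. The sketch assumes a population standard deviation and an inclusive `<=` comparison; neither detail is confirmed by the manual text, and this is not Trinity's own code. Note also that the blog describes a cutoff of 100, while the manual lists a default of 200.

```python
import statistics

def pct_dev(kmer_coverages):
    """Per-read stdev of k-mer coverage as a percentage of the mean.

    Assumption: population standard deviation (pstdev); Trinity's exact
    choice is not stated in the manual text quoted above.
    """
    mean = statistics.fmean(kmer_coverages)
    if mean == 0:
        return float("inf")  # hypothetical handling of an all-zero read
    return statistics.pstdev(kmer_coverages) / mean * 100

def passes(kmer_coverages, max_pct_stdev=200):
    """Keep the read only if its pct_dev does not exceed the cutoff (assumed <=)."""
    return pct_dev(kmer_coverages) <= max_pct_stdev
```

For example, a read whose k-mers are all covered equally scores 0, and coverages of 10 and 30 (mean 20, standard deviation 10) score 50.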

So is max_pct_stdev just a cutoff on (stdev of k-mer coverage / mean k-mer coverage) * 100?

Does this mean that a high value for this statistic indicates that some k-mers have much lower coverage than the average (which inflates the standard deviation), and that, assuming generally high sequencing coverage, such reads are probably bad? Or where does the "high-coverage read" part come from?

_r_am (Baylor College of Medicine, Houston, TX) wrote (5.8 years ago):

From what I understand:

For a 100 bp read at 30X coverage, all of its 25-mers should ideally be covered at around 30X. (These are just example values.)
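To make "all 25-mers from it" concrete, the sketch below counts k-mers across a set of reads and then looks up the coverage of every k-mer in one read. The helpers `count_kmers` and `kmer_coverage_profile` are hypothetical, and the sketch simplifies heavily: it uses an in-memory `Counter`, counts forward-strand k-mers only, and ignores reverse complements, so it is not how Trinity actually counts k-mers.

```python
from collections import Counter

def count_kmers(reads, k=25):
    """Count every k-mer across a collection of reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def kmer_coverage_profile(read, counts, k=25):
    """Coverage of each k-mer in the read, in read order."""
    return [counts[read[i:i + k]] for i in range(len(read) - k + 1)]
```

A 100 bp read yields 100 - 25 + 1 = 76 k-mers, so its coverage profile has 76 values; the per-read mean and standard deviation are taken over that list.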

I am not entirely sure why that assumption is made, but it seems rare to encounter k-mers with orders of magnitude higher coverage than the read they come from. However, sequencing errors can easily produce a k-mer whose coverage is very low compared to the read. If a read has multiple such erroneous k-mers distributed across it, they increase the standard deviation of its k-mer coverage values, even though the read itself may not have been filtered out at QC. Such a read can be considered suboptimal and discarded without significant loss to the assembly.
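The effect described above can be illustrated with made-up numbers. The sketch assumes a 100 bp read at 30X (76 k-mers of length 25), two substitution errors at arbitrary positions, and that every k-mer overlapping an error is seen only once in the data set. It also assumes a population standard deviation; none of these values come from Trinity.

```python
import statistics

def pct_dev(kmer_coverages):
    """Stdev of k-mer coverage as a percent of the mean (population stdev assumed)."""
    return statistics.pstdev(kmer_coverages) / statistics.fmean(kmer_coverages) * 100

K = 25
N_KMERS = 100 - K + 1  # 76 k-mers in a hypothetical 100 bp read

# Error-free read: every k-mer covered ~30X.
clean = [30] * N_KMERS

# Same read with two substitution errors (positions 20 and 60 are arbitrary).
# Each error corrupts every k-mer that overlaps it; we assume those
# corrupted k-mers occur only once in the whole data set (coverage 1).
errors = list(clean)
for pos in (20, 60):
    for start in range(max(0, pos - K + 1), min(N_KMERS, pos + 1)):
        errors[start] = 1

print(f"clean read pct_dev: {pct_dev(clean):.1f}")
print(f"read with errors pct_dev: {pct_dev(errors):.1f}")
```

With these made-up numbers the clean read scores 0 and the error-containing read scores about 114, so it would be dropped at the blog's cutoff of 100 but kept at the manual's default of 200.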


Thank you for your detailed explanation. Have I understood the equation ("std kmer coverage / avg kmer coverage * 100") correctly?

(Reply by Ekarl2, 5.8 years ago.)

That seems right. Think of it as: "At most, what percentage of the mean k-mer coverage may the standard deviation of the k-mer coverage be?"
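Written out as formulas, the interpretation above is roughly the following. This assumes a population standard deviation over the read's k-mers and an inclusive cutoff; the notation is introduced here for illustration and is not taken from Trinity.

```latex
% c_1, \dots, c_n: k-mer coverages along one read of length L, with n = L - k + 1
\bar{c} = \frac{1}{n}\sum_{i=1}^{n} c_i, \qquad
s = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(c_i - \bar{c}\right)^2}, \qquad
\mathrm{pct\_dev} = 100 \cdot \frac{s}{\bar{c}}

\mathrm{pct\_dev} \le \texttt{max\_pct\_stdev}
\;\Longleftrightarrow\;
s \le \frac{\texttt{max\_pct\_stdev}}{100}\,\bar{c}
\qquad (\text{default } 200:\; s \le 2\bar{c})
```

So with the default of 200, a read is kept as long as the standard deviation of its k-mer coverage is no more than twice its mean k-mer coverage.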

(Reply by _r_am, 5.8 years ago.)