What is the Trinity setting max_pct_stdev?
9.0 years ago
Ekarl2 ▴ 120

In the Trinity manual, I read the following:

--max_pct_stdev <int> :maximum pct of mean for stdev of kmer coverage across read (default: 200)

What does this mean in more detail? I found an older discussion:

http://ivory.idyll.org/blog/trinity-in-silico-normalize.html

where they explain it as:

Here, the per-read pct_dev is defined as the deviation in k-mer coverage divided by the average k-mer coverage, times 100 (to make it a percent). If the deviation is high, that indicates that the read is likely to contain many errors, since high-coverage reads with low-coverage k-mers shouldn't happen. Trinity sets a cutoff of 100: if the deviation is as big as the average, the read should go away

So is the per-read value that max_pct_stdev caps just (stdev of kmer coverage / mean kmer coverage) * 100?

Does a high value for this statistic mean that some kmers in the read have really low coverage compared with the average (thus increasing the stdev by a lot), and that, assuming generally high sequence coverage, such reads are probably bad? Or where do we get the "high-coverage read" from?

trinity RNA-Seq max_pct_stdev • 2.1k views
9.0 years ago
Ram 43k

From what I understand:

For a 100 bp read with coverage = 30, all 25-mers from it should ideally be covered at around 30X. (This is just an example with arbitrary values.)

Though I am not entirely sure why that assumption is made, it seems to be rare to encounter k-mers with coverage orders of magnitude higher than that of the read they come from. However, it is entirely possible, owing to sequencing errors, that a k-mer has really low coverage compared to the read. If a read has multiple such erroneous k-mers distributed across it, they would increase the standard deviation of its k-mer coverage values, yet the read itself may still pass QC. Such a read can be considered suboptimal and discarded without significant loss to the assembly process.
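To make this concrete, here is a minimal Python sketch of the statistic as it is described in this thread (stdev of k-mer coverage divided by mean k-mer coverage, times 100). It is not Trinity's actual code: the function names `pct_stdev` and `keep_read` are made up for illustration, the use of the population standard deviation and of a `<=` comparison are assumptions, and the coverage values are invented.

```python
from statistics import mean, pstdev


def pct_stdev(kmer_coverages):
    """Stdev of a read's k-mer coverage as a percent of its mean.

    Assumption: population stdev (pstdev). Whether Trinity uses the
    population or sample form is not stated in this thread.
    """
    avg = mean(kmer_coverages)
    if avg == 0:
        return float("inf")  # no coverage at all: treat as maximally noisy
    return pstdev(kmer_coverages) / avg * 100


def keep_read(kmer_coverages, max_pct_stdev=200):
    """Hypothetical filter: keep the read if the statistic is within the cap."""
    return pct_stdev(kmer_coverages) <= max_pct_stdev


# A 100 bp read yields 100 - 25 + 1 = 76 25-mers.
clean = [30] * 76                # every k-mer covered ~30X
noisy = [30] * 26 + [1] * 50     # 50 k-mers overlap sequencing errors

for name, cov in (("clean", clean), ("noisy", noisy)):
    print(name, round(pct_stdev(cov), 1),
          "kept at 200:", keep_read(cov, 200),
          "kept at 100:", keep_read(cov, 100))
```

With these made-up numbers the clean read scores 0, while the noisy read lands between 100 and 200, so it would pass the default cutoff of 200 but fail a stricter cutoff of 100.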


Thank you for your detailed explanation. Have I understood the equation ("std kmer coverage / avg kmer coverage * 100") correctly?


That seems right. Think of it as "what percent of the mean k-mer coverage can the k-mer coverage sd be, at max?"
