Question: kmers in rna-seq
sam (United States) wrote, 4.8 years ago:

I have a set of kmer counts from two groups of 25 RNA-seq samples each. I'm interested in identifying kmers whose counts differ between the two groups. For example, I have the counts of the 3mer AAT for each sample in both groups, and I want to test whether the number of occurrences of this 3mer is significantly different between the two groups. Note that I normalize my data to account for the different library sizes of the samples. Would it be correct to treat this as testing whether two distributions are significantly different (e.g., whether the distribution of the 3mer AAT in the first group differs significantly from its distribution in the second group)? In that case I could use a statistical test such as the Kolmogorov–Smirnov test. Or is there a better approach to this problem?

thanks

rna-seq kmer • 1.8k views
modified 4.7 years ago by Biostar • written 4.8 years ago by sam
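
To make the setup in the question concrete, here is a minimal sketch of counting kmers in one sample and scaling the counts by library size. The function names, the toy reads, and the choice of counts per million as the normalization are assumptions for illustration only; the question does not say which normalization was actually used.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every overlapping k-mer across a list of read sequences."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing N or any other non-ACGT symbol.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def counts_per_million(counts):
    """Scale raw counts by the sample's total k-mer count (a simple library-size proxy)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: 1e6 * n / total for kmer, n in counts.items()}

# Toy sample (hypothetical reads, not real data).
sample_reads = ["AATGAAT", "CAATN"]
raw = count_kmers(sample_reads, k=3)
cpm = counts_per_million(raw)
```

Running this once per sample gives one normalized value per kmer per sample, which is the kind of table the question describes.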

Are you expecting a different answer than when you posed a similar question (k-mer analysis in RNA-seq) yesterday?

written 4.8 years ago by Devon Ryan

Yes, because I don't think we could use DESeq for this problem, given that we are not trying to detect differentially expressed genes here...

written 4.8 years ago by sam

In essence it is the same, though. It doesn't matter what your names are (gene names or kmer names). You should go with one of the prominent tools, since you will most likely get a distribution that can be modelled by a negative binomial (NB), so using DESeq2, edgeR, etc. is the best choice...

modified 4.8 years ago • written 4.8 years ago by Phil S.

The question boils down to asking whether counts that are likely well described by a negative binomial distribution are changed by a treatment. DESeq2, edgeR, etc. are just implementations of such a GLM-based testing procedure, so they can still be used.

written 4.8 years ago by Devon Ryan
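
One way to see why a negative binomial is the natural model here is to check whether the counts of a kmer are overdispersed relative to a Poisson. The sketch below uses a simple method-of-moments estimate of the dispersion alpha in var = mu + alpha * mu^2. It is an illustration only and is not how DESeq2 or edgeR estimate dispersion (they use more careful estimators that share information across features); the function name and the toy counts are made up.

```python
from statistics import mean, variance

def nb_dispersion_mom(counts):
    """Method-of-moments estimate of alpha in var = mu + alpha * mu**2.

    Returns 0.0 when the sample variance does not exceed the mean,
    i.e. when there is no evidence of overdispersion.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    if mu == 0:
        return 0.0
    return max((var - mu) / mu ** 2, 0.0)

# Hypothetical raw counts of one kmer in two groups (not real data).
group_a = [12, 30, 7, 45, 18]
group_b = [60, 110, 35, 150, 80]

alpha_a = nb_dispersion_mom(group_a)
alpha_b = nb_dispersion_mom(group_b)
```

An alpha clearly above zero means the variance grows faster than the mean, which is the situation the NB-based tools are built for. Note that DESeq2 and edgeR expect raw, un-normalized counts as input and handle library-size differences themselves.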

:) almost simultaneously

written 4.8 years ago by Phil S.

I guess the internet latency to Bonn is a bit longer than to Stuttgart :P

written 4.8 years ago by Devon Ryan

It depends on the test, but it may be a good idea to get rid of infrequent kmers: kmers with frequency 1 may account for a large portion of your kmer set and are a product of sequencing errors (as opposed to true biological signal).

modified 4.7 years ago • written 4.7 years ago by Lynxoid
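
The filtering step suggested above can be sketched as follows. The table layout (kmer mapped to a list of per-sample raw counts), the function name, and the default threshold of 2 are assumptions chosen for illustration; a sensible cutoff depends on sequencing depth and on the test used afterwards.

```python
def filter_rare_kmers(count_table, min_total=2):
    """Keep kmers whose raw count summed over all samples is at least min_total."""
    return {
        kmer: per_sample
        for kmer, per_sample in count_table.items()
        if sum(per_sample) >= min_total
    }

# Hypothetical table: kmer -> raw counts in three samples (not real data).
table = {
    "AAT": [5, 3, 8],
    "CGT": [0, 1, 0],  # seen once in total, so dropped by the default threshold
    "GGA": [0, 0, 2],
}
kept = filter_rare_kmers(table)
```

Filtering on raw counts before testing also reduces the number of hypotheses tested.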