k-mer analysis in RNA-seq
9.7 years ago
sam ▴ 130

Hello,

I have an RNA-seq time-series experiment (4 time points) in which the sample is infected with a virus at t=0. I'd like to check whether there is any correlation between the k-mers and the increase in viral load across the 4 time points. I have computed the 2-mers up to 7-mers for each FASTQ file (one per RNA-seq sample). Is there a way to go about this analysis, i.e. to test whether a relationship exists between k-mers and viral load? Any help would be greatly appreciated.
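For anyone following along, here is a minimal sketch of the k-mer counting step described above. It is not the poster's actual pipeline: the function names (read_sequences, count_kmers, to_frequencies) and the toy reads are made up for illustration, and it assumes plain 4-line FASTQ records. K-mers containing N or other non-ACGT characters are skipped.

```python
from collections import Counter
from itertools import islice


def read_sequences(fastq_lines):
    """Yield the sequence line (2nd of every 4) from 4-line FASTQ records."""
    for seq in islice(fastq_lines, 1, None, 4):
        yield seq.strip().upper()


def count_kmers(sequences, k_min=2, k_max=7):
    """Return {k: Counter} with k-mer counts for every k in [k_min, k_max]."""
    valid = set("ACGT")
    counts = {k: Counter() for k in range(k_min, k_max + 1)}
    for seq in sequences:
        for k in counts:
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                # Skip k-mers with ambiguous bases such as N.
                if set(kmer) <= valid:
                    counts[k][kmer] += 1
    return counts


def to_frequencies(counter):
    """Convert raw counts to within-sample frequencies."""
    total = sum(counter.values())
    return {kmer: n / total for kmer, n in counter.items()} if total else {}


# Toy input standing in for one FASTQ file (hypothetical reads).
toy_fastq = ["@r1", "ACGTAC", "+", "IIIIII",
             "@r2", "ACNTAC", "+", "IIIIII"]
kmer_counts = count_kmers(read_sequences(toy_fastq))
two_mer_freqs = to_frequencies(kmer_counts[2])
print(kmer_counts[2].most_common(3))
```

For a real file you would pass an open file handle instead of the toy list. Converting to frequencies per sample is one simple way to make samples with different sequencing depth comparable.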

sequence RNA-Seq • 3.1k views
9.7 years ago

I'm assuming you have a file with a k-mer ID (or sequence) in one column and a count in another (either with multiple samples in one file or in separate files). If that's correct, loading that into R and fitting a simple glm.nb(count~time) (you'll need to library(MASS) first) would be a good place to start. You'd need to apply that to each row of the matrix, of course, fitting one model per k-mer. You could also just try DESeq2/edgeR/etc. with that data, since it ends up being count data anyway.
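To illustrate the "apply it to each row" idea, here is a rough Python sketch. Note that it is a deliberately simpler stand-in and not the negative binomial model from glm.nb: it fits an ordinary least-squares slope of log2(CPM + 1) against time for each k-mer, with no dispersion estimate and no p-values. The count matrix, k-mer names, and time values are all hypothetical.

```python
import math


def counts_per_million(matrix):
    """Scale each sample (column) of {kmer: [counts]} to counts per million."""
    n_samples = len(next(iter(matrix.values())))
    totals = [sum(row[j] for row in matrix.values()) for j in range(n_samples)]
    return {kmer: [1e6 * c / t for c, t in zip(row, totals)]
            for kmer, row in matrix.items()}


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Hypothetical data: one row per k-mer, one column per time point.
time_points = [0, 1, 2, 3]
counts = {
    "ACGTA": [10, 20, 40, 80],
    "TTTTT": [90, 80, 60, 20],
}

cpm = counts_per_million(counts)
# One fit per row: slope is the change in log2 abundance per unit of time.
slopes = {kmer: ols_slope(time_points, [math.log2(v + 1) for v in row])
          for kmer, row in cpm.items()}
for kmer, slope in slopes.items():
    print(kmer, round(slope, 3))
```

A positive slope means the k-mer becomes relatively more abundant over time. For actual inference, the count-based models mentioned above are the better choice, since a slope on four points says nothing about significance.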

One thing to be careful about is that these statistical tools rely on the count data having certain statistical properties, including assumptions about random chance and even sampling.

When counting other quantities such as k-mers, it is not clear that the space is evenly covered or, considering the tiny viral genomes, that it is even remotely well covered. Even though the outcome may be counts, the way these counts were produced matters.

I don't know the answer to the above; I am just thinking out loud.
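To put rough numbers on this concern, a small sketch: for each k it prints how many distinct k-mers are possible (4^k) and how many overlapping k-mers a single read contributes (read length - k + 1). The 100 bp read length is an assumption for illustration only. Since one read feeds many overlapping k-mer counts, those counts are not independent draws in the way read counts per gene are usually treated.

```python
def possible_kmers(k):
    """Number of distinct k-mers over the alphabet A, C, G, T."""
    return 4 ** k


def kmers_per_read(read_length, k):
    """Number of overlapping k-mers contributed by one read."""
    return max(read_length - k + 1, 0)


READ_LENGTH = 100  # assumed read length, for illustration only

for k in range(2, 8):
    print(f"k={k}: {possible_kmers(k)} possible k-mers, "
          f"{kmers_per_read(READ_LENGTH, k)} k-mers per {READ_LENGTH} bp read")

# Total number of features if all k from 2 to 7 are tested together.
total_features = sum(possible_kmers(k) for k in range(2, 8))
print("total features:", total_features)
```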

9.7 years ago

If your virus and host of interest are well annotated, you could align the reads to a hybrid genome (host + virus). After that, you can count the number of reads for each viral transcript at each of the four time points. Then, as Devon proposed, use DESeq or edgeR to normalize the read counts for the viral transcripts (don't forget to also include the host transcript counts in your input count matrix), and compute a correlation.
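A minimal sketch of the final correlation step, under simplifying assumptions: viral load is approximated as the fraction of aligned reads that map to the virus (a crude stand-in for DESeq/edgeR normalization), and it is compared with one k-mer's per-sample frequency using a Spearman rank correlation. All numbers and names here are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            result[order[m]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical read counts from a host+virus alignment, one per time point.
host_reads = [990_000, 950_000, 800_000, 500_000]
viral_reads = [10_000, 50_000, 200_000, 500_000]
viral_load = [v / (h + v) for h, v in zip(host_reads, viral_reads)]

# Hypothetical frequency of a single k-mer at the same four time points.
kmer_freq = [0.010, 0.012, 0.011, 0.015]

rho = spearman(viral_load, kmer_freq)
print("viral load:", viral_load)
print("Spearman rho:", rho)
```

Keep in mind that with a single sample per time point there are only four observations, so even a perfect rank correlation cannot reach p < 0.05 in an exact two-sided test (2 of the 24 possible orderings give |rho| = 1). Replicates would help a lot here.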
