How to simulate RNA-seq raw read counts matrix
6.4 years ago
statfa ▴ 760

Hi,

I previously asked how to simulate RNA-seq data online, because I can't download the large FASTA files needed to run the Polyester package on my laptop, but I haven't received a response yet, so let me ask the question differently. Since I can't download the FASTA files, if I obtain a raw read counts matrix using Galaxy, can I then simulate from this matrix on my laptop? Is it possible, and is it correct, to simulate data from a raw read counts matrix? I need to simulate time-course read counts to compare some statistical methods for detecting DE genes.

Thanks

RNA-Seq simulate read counts • 4.3k views

Closed as duplicate of "Where to simulate RNA-seq data online".

Please follow the discussion there.


@Kevin I think this question is actually different and should be re-opened. Here @natalia is asking if a count matrix can be simulated (independent of reads/alignments). This would be handy if @natalia is not able to find an online source to be able to simulate the reads.


Okay, left it open and will monitor the responses.


It might make more sense to close the other thread; this question makes more sense imho.

6.4 years ago
h.mon 35k

I found a couple of alternatives; maybe one of them will fit your needs. There is SimSeq, or you could modify DESeq::makeExampleCountDataSet to suit your needs.

P.S.: also compcodeR::generateSyntheticData


Thank you very much. I will take a look at them. Do they simulate data from a count matrix, or do they need FASTA and GTF annotation files? I think DESeq isn't appropriate for simulation, is it? I mean, I don't think DESeq uses any real data to generate a new dataset. Wouldn't it be more sensible to simulate data using parameters estimated from a real dataset? Sorry, I'm very new to simulation.
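
To make the idea of "simulating from parameters obtained from a dataset" concrete, here is a minimal sketch in plain Python. It assumes a negative binomial model per gene, with the mean and dispersion estimated by the method of moments from a real count matrix, and it ignores library-size normalization. None of this is taken from DESeq, SimSeq or compcodeR; the function names and the toy matrix are hypothetical.

```python
import math
import random
from statistics import mean, variance


def estimate_nb_params(counts):
    """Method-of-moments estimate of (mean, dispersion) for one gene.

    Assumes Var = mu + phi * mu^2, so phi = (var - mu) / mu^2, floored at 0.
    """
    mu = mean(counts)
    if mu == 0 or len(counts) < 2:
        return mu, 0.0
    var = variance(counts)
    phi = max((var - mu) / (mu * mu), 0.0)
    return mu, phi


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method, in chunks of at most 500.

    Chunking keeps exp(-chunk) representable; the sum of independent
    Poisson draws is Poisson with the summed rate.
    """
    total = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        lam -= chunk
        threshold = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
    return total


def sample_nb(mu, phi, rng):
    """Draw a negative binomial count as a gamma-Poisson mixture."""
    if mu <= 0:
        return 0
    if phi <= 0:
        return sample_poisson(mu, rng)  # no overdispersion: plain Poisson
    # Gamma with shape 1/phi and scale mu*phi has mean mu, variance phi*mu^2
    lam = rng.gammavariate(1.0 / phi, mu * phi)
    return sample_poisson(lam, rng)


def simulate_counts(source, n_samples, seed=0):
    """Simulate a genes x n_samples matrix from a genes x samples source."""
    rng = random.Random(seed)
    simulated = []
    for gene_counts in source:
        mu, phi = estimate_nb_params(gene_counts)
        simulated.append([sample_nb(mu, phi, rng) for _ in range(n_samples)])
    return simulated


# Toy source matrix (hypothetical values): 3 genes x 4 samples
source = [
    [10, 12, 9, 15],
    [200, 340, 150, 410],
    [0, 0, 0, 0],
]
sim = simulate_counts(source, n_samples=6, seed=1)
```

With only a handful of samples the dispersion estimates are very noisy, so a sketch like this is mainly useful for seeing the mechanics. Data simulated from an assumed parametric model will also tend to favour methods built on that same model.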


I think you want to look at SimSeq first. From its manual:

Description

RNA sequencing analysis methods are often derived by relying on hypothetical parametric models for read counts that are not likely to be precisely satisfied in practice. Methods are often tested by analyzing data that have been simulated according to the assumed model. This testing strategy can result in an overly optimistic view of the performance of an RNA-seq analysis method. We develop a data-based simulation algorithm for RNA-seq data. The vector of read counts simulated for a given experimental unit has a joint distribution that closely matches the distribution of a source RNA-seq dataset provided by the user. Users control the proportion of genes simulated to be differentially expressed (DE) and can provide a vector of weights to control the distribution of effect sizes. The algorithm requires a matrix of RNA-seq read counts with large sample sizes in at least two treatment groups. Many datasets are available that fit this standard.
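
The description above suggests a resampling scheme rather than a parametric one. The following is a simplified sketch of that general idea in plain Python, not SimSeq's actual implementation: it assumes two groups, skips the normalization and effect-size weighting the package describes, and uses hypothetical names throughout. Non-DE genes get both simulated groups from real group-0 columns, while DE genes get their second simulated group from real group-1 columns, which is why the source dataset needs many samples per group.

```python
import random


def resample_counts(source, groups, n_per_group, de_genes, seed=0):
    """Build a two-group simulated matrix by resampling columns of a real one.

    source      : genes x samples list of lists of raw counts
    groups      : group label (0 or 1) for each source column
    n_per_group : number of simulated samples per group
    de_genes    : set of gene row indices simulated as differentially expressed
    """
    rng = random.Random(seed)
    cols0 = [j for j, g in enumerate(groups) if g == 0]
    cols1 = [j for j, g in enumerate(groups) if g == 1]
    # Needs 2 * n_per_group distinct group-0 columns and n_per_group
    # group-1 columns; rng.sample raises ValueError if too few exist.
    picked0 = rng.sample(cols0, 2 * n_per_group)
    picked1 = rng.sample(cols1, n_per_group)
    sim_a = picked0[:n_per_group]        # simulated group A: real group-0 columns
    sim_b_null = picked0[n_per_group:]   # simulated group B for non-DE genes
    sim_b_de = picked1                   # simulated group B for DE genes
    simulated = []
    for i, row in enumerate(source):
        b_cols = sim_b_de if i in de_genes else sim_b_null
        simulated.append([row[j] for j in sim_a] + [row[j] for j in b_cols])
    return simulated


# Toy source (hypothetical values): 2 genes, 6 group-0 and 3 group-1 samples
source = [
    [1, 1, 1, 1, 1, 1, 100, 100, 100],  # gene 0
    [5, 5, 5, 5, 5, 5, 50, 50, 50],     # gene 1
]
groups = [0] * 6 + [1] * 3
sim = resample_counts(source, groups, n_per_group=2, de_genes={0})
# gene 0 (DE) differs between the simulated groups; gene 1 (non-DE) does not
```

Because every simulated count is a real observed count, no distribution is assumed for the reads, only that the chosen DE genes really differ between the source groups.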


Wow, if that works the way they say, that would be great. Let me check it out. I'll get back to you.


I checked it and it seems to work. However, I want to simulate a time course study, and SimSeq doesn't accommodate dependence over time. Will that cause problems in my study?

Example 5: Simulate three treatment groups:

3 Different types of Differential Expression Allowed

First Group Diff, Second and Third group Equal

Second Group Diff, First and Third group Equal

Third Group Diff, First and Second group Equal

As you can see, each group is treated independently of the others, which is meaningless in a time course study. Also, the simulated genes are differentially expressed in only one treatment group. Having genes that are DE in at least one treatment group might be more sensible, for example a path of Down-Up-EE-Down. Do you have any comments or suggestions? From the Polyester manual, I know that it can simulate time course studies. Sorry for asking so many questions. Thanks a lot.
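
One way to picture a path such as Down-Up-EE-Down is as a sequence of fold changes applied to a gene's expected count from one time point to the next. The sketch below is plain Python and rests on assumptions that are not from Polyester or SimSeq: each label is read as a change between consecutive time points, and "Up" and "Down" use a single fixed fold change. The function name is hypothetical.

```python
def mean_trajectory(baseline, pattern, fold=2.0):
    """Expected count at each time point for one gene.

    baseline : expected count at the first time point
    pattern  : one label per transition between consecutive time points,
               each "Up", "Down" or "EE" (equally expressed)
    fold     : fold change applied for "Up" (multiply) or "Down" (divide)
    """
    factors = {"Up": fold, "Down": 1.0 / fold, "EE": 1.0}
    means = [float(baseline)]
    for step in pattern:
        # Each time point's mean is built from the previous one
        means.append(means[-1] * factors[step])
    return means


# Four transitions give five time points: 100 -> 50 -> 100 -> 100 -> 50
path = mean_trajectory(100, ["Down", "Up", "EE", "Down"])
```

The resulting means could then be passed to whatever count sampler is being used. Note that this only ties the expected values together over time; correlation between repeated measurements of the same subject would need additional modelling.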
