The SimSeq package can take a matrix as input to simulate RNA-seq data, which is what I'm looking for. The problem is that it treats the treatments as independent conditions, so in time-course studies it doesn't account for the dependence over time, and I can't use it. The Polyester package can take a count matrix as input and simulate time-course studies, which is ideal for me.
However, there are some things about Polyester that I don't understand, which is why I'm asking here. I have searched a lot, but I still have trouble understanding these points, so please bear with me if the questions are simple. Thanks a lot.
Please take a look at this excerpt from Polyester's manual:
"You'll need to provide transcript annotation from which reads should be simulated. There are several public data repositories where you can download this annotation. You can simulate reads from any organism for which annotation is available.
Annotation must be provided in one of two formats:
- FASTA: text file containing names and sequences of transcripts from which reads should be simulated. Known transcripts from human chromosome 22 (hg19 build) are available in extdata/chr22.fa.
- GTF format + FASTA sequence files. The GTF file should denote the transcript structures, and you'll need a FASTA file of the full DNA sequence for each chromosome in the GTF file. All the chromosome-specific FASTA files should be in the same directory."
1- For the second format, is it correct to download the GTF file and the FASTA files from here? Which ones should I download: the first GTF and FASTA files in each section?
Now, take a look at this:
"If you're an experienced user requiring more flexibility, you can use the simulate_experiment_countmat function to directly specify the number of reads you'd like to simulate for each transcript and each replicate in the data set. This function takes a count matrix as an argument.
This function creates FASTA files containing RNA-seq reads simulated from provided transcripts, with optional differential expression between two groups (designated via read count matrix)"
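To make sure I am reading this correctly, here is a minimal sketch of how I think the call would look. The toy counts, the column layout (3 time points x 2 replicates), the wrapper name simulate_from_counts, and all file paths are my own placeholders, not anything from the manual. As far as I understand, each row of the matrix has to correspond to one transcript in the annotation, and each column to one sample.

```r
# Toy count matrix: rows = transcripts (in the same order as the FASTA records),
# columns = samples. The values are made up for illustration only.
n_transcripts <- 4
n_samples <- 6   # assumed layout: 3 time points x 2 replicates
set.seed(1)
readmat <- matrix(rpois(n_transcripts * n_samples, lambda = 50),
                  nrow = n_transcripts, ncol = n_samples)
colnames(readmat) <- paste0("t", rep(1:3, each = 2), "_rep", rep(1:2, times = 3))

# Hypothetical wrapper around the Polyester call (needs the polyester package).
simulate_from_counts <- function(fasta_file, readmat, outdir) {
  polyester::simulate_experiment_countmat(
    fasta   = fasta_file,  # transcript FASTA; alternatively pass gtf = and seqpath =
    readmat = readmat,     # entry [i, j] = reads to simulate for transcript i, sample j
    outdir  = outdir,
    paired  = TRUE,
    seed    = 42
  )
}

# Not run here; "transcripts.fa" is a placeholder that would need 4 records:
# simulate_from_counts("transcripts.fa", readmat, "simulated_reads")
```

If I understand the manual, the GTF variant would use gtf = "annotation.gtf" together with seqpath = "directory/with/chromosome/fastas" instead of the fasta argument.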
2- If I give this function the GTF file, the FASTA files, and the count matrix from a real experiment, it produces FASTA files of simulated reads. How should I use these FASTA files? Should I align them to the reference genome with HISAT and then obtain read counts with HTSeq?
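To make question 2 concrete, this is roughly the downstream step I have in mind for one sample, written as a sketch that only prints the commands rather than running them. The index prefix, the annotation file, and the output names are placeholders; the read file names assume Polyester's paired-end output is called sample_01_1.fasta and sample_01_2.fasta, and I am assuming the reads are unstranded.

```r
# Placeholder paths; nothing is executed, the two commands are only assembled.
index  <- "genome_index"                       # assumed prefix of a HISAT2 index
reads1 <- "simulated_reads/sample_01_1.fasta"  # assumed Polyester output names
reads2 <- "simulated_reads/sample_01_2.fasta"
gtf    <- "annotation.gtf"

hisat2_args <- c("-f",                 # input reads are FASTA, not FASTQ
                 "-x", index,
                 "-1", reads1, "-2", reads2,
                 "-S", "sample_01.sam")

htseq_args <- c("-f", "sam",           # alignment file format
                "-s", "no",            # assuming unstranded simulated reads
                "sample_01.sam", gtf)

cat("hisat2", hisat2_args, "\n")
cat("htseq-count", htseq_args, "> sample_01_counts.txt\n")

# To actually run them (requires both tools on the PATH):
# system2("hisat2", hisat2_args)
# system2("htseq-count", htseq_args, stdout = "sample_01_counts.txt")
```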
Do I understand everything correctly? Thank you.