I am a beginner in transcriptomics and need help using RSEM for differential gene expression. My samples were sequenced on the SOLiD 5500 platform. I used the SATRAP de novo assembler to assemble the color-space FASTA reads and generated contigs in FASTA format.

My question about RSEM is this: the contigs come from a single-end (fragment) library and vary in length, so I do not know which values to assign to the fragment length distribution parameters (--fragment-length-mean and --fragment-length-sd).

The CD-HIT-EST stats file for the redundant contigs reports the following sequence length distribution:

Sequence type: DNA
No. sequences: 12876
Longest sequence: 932
Shortest sequence: 100
Average length: 334
Total letters: 4305560
Total N letters: 6844
Total non N: 4298716
Sequences with N: 583
I would be very grateful for your advice and support. Thank you.
SOLiD is an obsolete platform, and for good reason: it was ill-conceived, poorly executed, and generated inferior data. In my opinion, analyzing SOLiD data is a waste of time. The alternatives are so vastly superior that, even if you already have SOLiD data, it is cheaper to buy new data from another platform (Illumina, Ion Torrent, PacBio, etc.) and generate quality output than to spend far more time massaging SOLiD data into inferior output, once you factor in the time wasted chasing down false paths.
I'm afraid not everyone has the luxury of choosing which data/platform to use...
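On the original question, the length summary quoted from the CD-HIT-EST stats file can be reproduced, together with a standard deviation, directly from the contig FASTA. The snippet below is only a minimal sketch: the function names `read_fasta_lengths` and `length_summary` are made up for illustration, and it assumes a plain nucleotide FASTA with `>` header lines. Keep in mind that these are contig lengths. RSEM's --fragment-length-mean and --fragment-length-sd describe the sequenced fragments of a single-end library, so the numbers computed here are not a direct substitute for them; by default RSEM leaves the fragment length distribution disabled for single-end data.

```python
import statistics
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in a FASTA given as an iterable of lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequence line (may be wrapped over several lines).
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_summary(lengths: List[int]) -> Dict[str, float]:
    """Summarise sequence lengths: count, shortest, longest, mean, sample SD."""
    return {
        "n_sequences": len(lengths),
        "shortest": min(lengths),
        "longest": max(lengths),
        "mean": statistics.mean(lengths),
        # Sample standard deviation needs at least two values.
        "sd": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
    }
```

To use it on a file, open the FASTA and pass the file handle in, for example `length_summary(read_fasta_lengths(open("contigs.fasta")))`, where `contigs.fasta` is a placeholder name for your own file.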