I need to work with SOLiD small RNA-seq libraries from SRA, and I have some questions about quality control and the dos and don'ts for this kind of data.
1- What is the best approach for quality-checking my reads? I normally use FastQC, but I'm not sure whether it can handle colourspace reads (or handle them properly).
2- How should I filter/trim my reads? I may need to trim adaptors, but I don't want to quality-trim my reads: they represent entire RNAs (sRNAs), and trimming them could introduce artifacts. I would rather filter out reads that contain low-quality bases. What is a good tool for adaptor trimming and quality filtering of these libraries, and what range of quality values is considered acceptable (we usually apply a Q30 filter to Illumina libraries)?
3- As far as I know, SOLiD sequences a small portion of the 5' adaptor, in addition to possibly reading into the 3' adaptor. Should I take this into account during adaptor trimming, or has it usually already been removed?
4- This data set contains both SOLiD and Illumina data (different libraries, of course), but for the sake of standardization we would like to use STAR to align all libraries. I don't know whether STAR can handle colourspace reads, so I would like to know whether it is OK to convert my colourspace reads to base-space FASTQ, and at which point (before quality checking, before filtering, or just before alignment) I could do this. If converting is not advisable, which tools are best for aligning SOLiD reads?
I know I've asked a lot of questions, but I have never worked with this kind of data. If someone could at least point me in the right direction, or to some papers that could help with any of these questions, it would be great.