I have recently started working on RNA-seq analysis. Before running the TopHat pipeline, I need to carry out the following two analyses. I have already performed the demultiplexing step and generated the FASTQ files from the HiSeq base calls.
Could you explain why it is important to do these analyses first, and how I should proceed?
A. Technical analysis of the sequencing reads: I have to perform a genome-wide alignment of the RNA-seq data sets from lanes 1 to 6 and report technical metrics on the sequencing reads, such as:
1. Read duplication analysis;
2. Contamination analysis for Illumina adapter sequences;
3. GC content analysis.
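The three read-level checks in part A can already be computed on the raw FASTQ files, before alignment (FastQC reports all three). Below is a minimal Python sketch of the same metrics. It assumes plain or gzipped 4-line FASTQ records, and it treats the 13-bp `AGATCGGAAGAGC` prefix shared by Illumina TruSeq adapters as the adapter. Swap in your kit's adapter if it differs. "Duplication" here means exact whole-read copies, which is a simplification of what FastQC does.

```python
import gzip
import sys
from collections import Counter
from typing import Iterable, Iterator, TextIO

# Common 13-bp prefix of Illumina TruSeq adapters (assumption: TruSeq library prep).
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def read_fastq(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield seq


def read_qc_stats(seqs: Iterable[str], adapter: str = ADAPTER_PREFIX) -> dict:
    """Exact-duplicate rate, adapter-containing fraction and overall GC% of reads."""
    counts: Counter = Counter()
    n_reads = n_adapter = gc_bases = total_bases = 0
    for seq in seqs:
        seq = seq.upper()
        n_reads += 1
        counts[seq] += 1
        if adapter in seq:
            n_adapter += 1
        gc_bases += seq.count("G") + seq.count("C")
        total_bases += len(seq)
    if n_reads == 0:
        return {"reads": 0, "duplication_rate": 0.0,
                "adapter_fraction": 0.0, "gc_percent": 0.0}
    return {
        "reads": n_reads,
        # fraction of reads that repeat a sequence already seen
        "duplication_rate": 1 - len(counts) / n_reads,
        "adapter_fraction": n_adapter / n_reads,
        "gc_percent": 100.0 * gc_bases / total_bases if total_bases else 0.0,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        print(read_qc_stats(read_fastq(handle)))
```

Run it once per lane, e.g. `python read_qc.py lane1_R1.fastq.gz` (script and file names are hypothetical). Keep in mind that some duplication is expected in RNA-seq, because reads from highly expressed genes pile up on the same positions.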
B. Biological quality analysis: using the mapping results above, I also need to report the biological quality of the data sets, such as:
1. The percentage of sequencing reads derived from rRNA genes;
2. The percentage of sequencing reads derived from globin genes;
3. Sense and antisense read information for the corresponding genes, because this is a strand-specific RNA-seq library.
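For part B, each aligned read has to be assigned to an annotated gene and compared against that gene's strand. The sketch below uses plain Python records instead of a BAM file so that it stays self-contained. It rests on several assumptions:

* the `Gene` and `Alignment` classes and their fields are hypothetical;
* the `rRNA`/`globin` category labels come from your own annotation;
* reads are single-end, or read 1 of a pair;
* the library follows a dUTP-type reverse-stranded protocol, in which read 1 maps to the strand opposite the transcript (`reverse_stranded=True`).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record; coordinates are 0-based, half-open."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str    # "+" or "-"
    category: str  # e.g. "rRNA", "globin", "other" (labels from your own annotation)


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one mapped read (single-end or read 1 only)."""
    chrom: str
    start: int
    end: int
    strand: str    # genomic strand the read aligned to


def find_gene(aln: Alignment, genes: List[Gene]) -> Optional[Gene]:
    """Return the first gene overlapping the alignment (linear scan, illustration only)."""
    for gene in genes:
        if gene.chrom == aln.chrom and aln.start < gene.end and gene.start < aln.end:
            return gene
    return None


def biological_qc(alignments: Iterable[Alignment], genes: List[Gene],
                  reverse_stranded: bool = True) -> dict:
    """Percent of mapped reads in rRNA/globin genes plus per-gene sense/antisense counts."""
    total = 0
    per_category: Counter = Counter()
    sense: Counter = Counter()
    antisense: Counter = Counter()
    for aln in alignments:
        total += 1
        gene = find_gene(aln, genes)
        if gene is None:
            continue  # intergenic or unannotated read
        per_category[gene.category] += 1
        same_strand = aln.strand == gene.strand
        # In dUTP-type (reverse-stranded) libraries read 1 lies on the strand
        # opposite the transcript, so an opposite-strand hit counts as sense.
        is_sense = (not same_strand) if reverse_stranded else same_strand
        if is_sense:
            sense[gene.name] += 1
        else:
            antisense[gene.name] += 1

    def pct(n: int) -> float:
        return 100.0 * n / total if total else 0.0

    return {
        "mapped_reads": total,
        "rRNA_percent": pct(per_category["rRNA"]),
        "globin_percent": pct(per_category["globin"]),
        "sense": dict(sense),
        "antisense": dict(antisense),
    }
```

The percentages here are relative to mapped reads. If you need them relative to all sequenced reads, divide by the raw read count instead. The linear scan over genes is only for illustration. On real BAM files, an interval index or an existing tool such as RSeQC is more practical. RSeQC's `infer_experiment.py` also lets you confirm which strandedness your library actually has before you fix `reverse_stranded`.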