Hi everyone, I am a newbie here.
I am starting to learn bioinformatics and decided to run some RNA-seq analysis.
I am using the protocol from this paper as a reference: RNA-seq: Basic Bioinformatics Analysis, and I am trying to analyze RNA-seq data from this paper: Therapeutic targeting of MLL degradation pathways in MLL-rearranged leukemia. I only chose the RNA-seq data of cells treated with or without an inhibitor.
The code provided in the protocol has some small mistakes, and I am sure I made the right adjustments (to the Picard QC step).
In short, I used STAR as the aligner, with the same settings as in the paper. Then I used Picard tools to run quality control after mapping, but the results seem kind of weird (shown below). I wonder what the normal range of these metrics would be (especially those not described in the paper; the paper stressed the importance of mapped read percentage and duplication rate). Do my results make sense? What would improve them?
Below are my QC results (combined metrics from RNASeqMetrics and MarkDuplicates):
or the link: https://ibb.co/ZL8g38q
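In case it helps anyone compare numbers, this is roughly how a single metric can be pulled out of a Picard output file. It is only a sketch: the function name `picard_metric` and the file name in the usage line are made up for illustration, and it assumes the usual Picard metrics layout (a `## METRICS CLASS` line, then a tab-separated header row, then the value rows up to the first blank line).

```shell
# Print the value(s) of one named column from a Picard metrics file.
# Hypothetical helper; assumes the standard layout: a "## METRICS CLASS" line,
# a tab-separated header row, then value rows up to the first blank line.
picard_metric() {
    file=$1
    column=$2
    awk -F '\t' -v col="$column" '
        /^## METRICS CLASS/ { want_header = 1; next }
        want_header {
            # Header row: remember the position of the requested column.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            want_header = 0; in_rows = 1; next
        }
        in_rows && NF == 0 { exit }      # blank line ends the metrics table
        in_rows && idx { print $idx }    # prints nothing if the column was not found
    ' "$file"
}
```

For example, `picard_metric sample.dup_metrics.txt PERCENT_DUPLICATION` would print the duplication fraction from a MarkDuplicates metrics file (the file name here is a placeholder).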
If the results are really bad, should I start again from FastQC or from the mapping step, switch to a different QC tool, or simply filter the alignments with some tool (right now I can only think of samtools; are there any other options)?
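To make the samtools option concrete, this is the kind of filter I have in mind. It is not taken from the protocol, just a sketch: it assumes STAR's default of MAPQ 255 for uniquely mapped reads, and the function name and file names are placeholders.

```shell
# Keep only uniquely mapped primary alignments from a STAR output file.
# Assumption: STAR was run with its default --outSAMmapqUnique 255,
# so "-q 255" selects unique mappers. The function name is hypothetical.
filter_unique_primary() {
    in_file=$1    # SAM or BAM produced by STAR
    out_bam=$2    # filtered BAM to write
    # -F 0x904 drops unmapped (0x4), secondary (0x100) and supplementary (0x800) records
    samtools view -b -q 255 -F 0x904 -o "$out_bam" "$in_file"
}
```

Usage would be something like `filter_unique_primary Aligned.out.sam Aligned.unique.bam`, where `Aligned.out.sam` is STAR's default output name and the second name is arbitrary.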
Another question: the protocol didn't include a FastQC step, so I ran it on the data myself, and the results all looked like this (or see the link): https://ibb.co/2NJpLKT Should I use a tool like Trimmomatic to filter the data?
I would really appreciate any guidance you can give me, many thanks!
P.S. The code for running STAR:
STAR --genomeDir $star_ref --readFilesIn $fastq_dir/$file --runThreadN 64 --sjdbGTFfile $gtf --genomeLoad NoSharedMemory --outReadsUnmapped Fastx