Hello everyone, I am struggling with some bulk RNA-seq data. I've been given the count matrix and metadata from a bulk RNA-seq experiment run on sorted T cells. The issue is that, unfortunately, we did not find what we expected. I know this happens most of the time in research, but I would like to check whether something went wrong during sequencing or even earlier. Whoever ran the analysis shared an Excel file with their results report, and I noticed that 10 out of 109 samples had RIN values < 6 (3 of them had RIN = 1 and 1 had RIN = 2.5). Nonetheless, they proceeded with sequencing for all of our samples. Do you think I should remove those samples? I ran a PCA to see whether most of the variance is associated with RIN score, but it does not seem so. Additionally, I've looked at the MultiQC report and I'd like to know whether the plot of sequence duplication levels should be a concern or not.
After reading up a bit on MultiQC and FastQC reports, it looks like a bad output, but how bad?
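On the RIN question above, here is a minimal sketch of one way to check whether the main axis of variation tracks RIN. It is not the analysis from the report: the toy counts, the RIN values, and the helper names (`log_cpm`, `pc1_scores`, `pearson`) are all made up for illustration, and it uses plain Python rather than a proper RNA-seq package, so treat it as a rough outline only.

```python
import math

def log_cpm(counts):
    """counts: one list of gene counts per sample. Returns log2(CPM + 1)."""
    out = []
    for sample in counts:
        total = sum(sample)
        out.append([math.log2(c / total * 1e6 + 1) for c in sample])
    return out

def pc1_scores(matrix, n_iter=200):
    """Values proportional to each sample's PC1 score, found by power
    iteration on the sample-by-sample Gram matrix of gene-centered data.
    Assumes the samples are not all identical (otherwise the norm is 0)."""
    n = len(matrix)
    n_genes = len(matrix[0])
    means = [sum(row[g] for row in matrix) / n for g in range(n_genes)]
    centered = [[row[g] - means[g] for g in range(n_genes)] for row in matrix]
    gram = [[sum(a * b for a, b in zip(centered[i], centered[j]))
             for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(n_iter):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical toy data: 6 samples x 4 genes; the last two samples are the
# low-RIN ones and were deliberately given a different expression profile.
counts = [
    [100, 200, 300, 400],
    [110, 190, 310, 390],
    [95, 205, 290, 410],
    [105, 195, 305, 395],
    [400, 300, 200, 100],
    [390, 310, 190, 110],
]
rin = [8.0, 9.0, 8.5, 9.2, 1.0, 2.5]

pc1 = pc1_scores(log_cpm(counts))
r = pearson(pc1, rin)
print(f"correlation between PC1 and RIN: {r:.2f}")
```

The sign of a principal component is arbitrary, so only the absolute value of the correlation matters. On this toy data it is large by construction; on real data it would be worth repeating the check for PC2, PC3 and so on, since a RIN effect does not have to sit on the first component.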
Thanks
Francesco
Have you looked at this post from the authors of FastQC?
If your data does not speak for itself, then no amount of finagling is going to give you the expected result. You could look through the steps and see if you can find something obvious (e.g. alignment % not uniform across samples, varying levels of adapter contamination, order-of-magnitude differences in the amount of read data, etc.).
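A minimal sketch of that kind of per-sample sanity check, assuming you have pulled total read counts and alignment percentages into simple dictionaries (for example from the MultiQC general statistics table). The sample names, numbers, thresholds, and function names here are hypothetical and only meant to show the idea.

```python
import math
from statistics import median

def flag_depth_outliers(library_sizes, max_log10_diff=1.0):
    """Samples whose total read count differs from the median by at least
    max_log10_diff orders of magnitude. Assumes all sizes are positive."""
    med = median(library_sizes.values())
    return sorted(
        name for name, size in library_sizes.items()
        if abs(math.log10(size / med)) >= max_log10_diff
    )

def flag_low_alignment(aligned_pct, max_drop=10.0):
    """Samples whose alignment % is more than max_drop percentage points
    below the median across samples."""
    med = median(aligned_pct.values())
    return sorted(
        name for name, pct in aligned_pct.items() if med - pct > max_drop
    )

# Hypothetical values, not taken from any real report.
library_sizes = {"S1": 30_000_000, "S2": 28_000_000,
                 "S3": 2_500_000, "S4": 31_000_000}
aligned_pct = {"S1": 92.1, "S2": 90.5, "S3": 61.0, "S4": 91.8}

depth_flags = flag_depth_outliers(library_sizes)
align_flags = flag_low_alignment(aligned_pct)
print("depth outliers:", depth_flags)
print("low alignment:", align_flags)
```

The cutoffs (one order of magnitude, 10 percentage points) are arbitrary starting points. If the flagged samples overlap with the low-RIN ones, that would be a stronger argument for dropping them than the RIN value alone.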
Thank you for the post you shared; the duplication issue is very well explained. Thanks also for your suggestions. The other QC parameters seem good; my biggest concern is the RIN values.
Anyway thank you for all the help!