Could someone provide a detailed guide on downstream preprocessing steps based on my FastQC report? I ran FastQC on a paired-end sample, and here are the results for one read of the pair; the other read showed similar results. While I am familiar with interpreting FastQC reports, I am unsure whether additional steps are necessary before running STAR. I plan to use Cutadapt to remove adapters, since they were detected and raised a yellow warning in the report. However, if the adapter content is green in the FastQC report, would it still be necessary to trim? Additionally, many of my samples show high duplication levels. Should I use Picard to deduplicate before running a differential expression analysis? Lastly, how should the red flag for Per Base Sequence Content be interpreted, and is there a recommended way to address it?
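Please read the following blog posts from the authors of FastQC. They cover technical duplication and positional sequence bias, which relate to your duplication and Per Base Sequence Content questions: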
https://sequencing.qcfail.com/articles/libraries-can-contain-technical-duplication/
https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/
Trimming the adapters would be fine, though you don't strictly need to do it if you are aligning with STAR. STAR will "soft-clip" the parts of reads that do not align, which will include adapter sequences.
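If you want to see how much of a read was soft-clipped after alignment, here is a minimal sketch. It assumes standard SAM CIGAR strings, where the `S` operation marks soft-clipped bases. The function name `soft_clipped_bases` and the example CIGAR are made up for illustration; this is not part of STAR itself.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside soft clips, so ignore them at either end.
    core = [(n, op) for n, op in ops if op != "H"]
    leading = core[0][0] if core and core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return leading, trailing

# Hypothetical 100 bp read whose last 12 bases (e.g. adapter) did not align.
print(soft_clipped_bases("88M12S"))
```

A read with a lot of trailing soft-clipping is one where the aligner dropped the end of the read, which is typically where leftover 3' adapter would sit.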
Note: If this is single cell data (based on the included tag), then FastQC is going to be of limited use.
Thank you for your help! This is single cell data! Why is FastQC of limited use here?
Perhaps I should qualify my comment: if this is 10x single cell data, then FastQC is of limited use. What kind of single cell data is this? If the kit you used includes specific data processing instructions, you should follow those.