9 months ago
Luna
Hello everyone,
I am analyzing RNA-Seq data and have encountered the following issues that I need clarification on:
- Per Base Sequence Content:
The per-base sequence content varies throughout the read, not just in the first 15 positions. Is this indicative of a problem with the sequencing or the data itself?
- Overrepresented Sequences:
There are multiple overrepresented sequences in the data. After performing BLASTn, I found that they match the genome of the organism I am studying. Is it common to have such overrepresented sequences that align with the organism's genome, or does this point to a potential bias in library preparation?
- Adapter Content:
Despite not running Trimmomatic yet, there appears to be no adapter content in the raw data. However, the paper from which I obtained the dataset mentions using the Illumina TruSeq RNA kit for library preparation, which typically adds identifiable adapters. How is it possible that no adapter sequences are detected in the data?
Any insights into these issues would be greatly appreciated!
Thank you!
The pass/fail metric definitions are editable limits in a config file. A failure in one of the FastQC metrics does not immediately indicate that the data is bad, nor does it mean that analysis needs to stop until all green check marks are obtained. Always keep the context of the experiment in mind when interpreting FastQC results. The default limits defined in the FastQC config file are for "plain" genome sequencing, so a number of other experiment types (ChIP-seq, ATAC-seq, RNA-seq) can result in some FastQC tests "failing".
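To make the "keep the context in mind" point concrete, here is a minimal sketch that reads the tab-separated `summary.txt` FastQC writes into its output folder (status, module name, file name) and sorts each module into "ok", "expected", or "review". The function name `triage_summary` and the `EXPECTED_FOR_RNASEQ` set are hypothetical and reflect my own assumption about which flags are commonly benign for RNA-seq; they are not FastQC settings, and the example text is made up for illustration.

```python
# Hypothetical triage helper, not part of FastQC.
# ASSUMPTION: these modules often warn/fail on RNA-seq libraries for benign
# reasons (priming bias, highly expressed transcripts). Adjust for your data.
EXPECTED_FOR_RNASEQ = {
    "Per base sequence content",
    "Sequence Duplication Levels",
    "Overrepresented sequences",
}


def triage_summary(summary_text, expected=EXPECTED_FOR_RNASEQ):
    """Map each module in a FastQC summary.txt to 'ok', 'expected' or 'review'."""
    result = {}
    for line in summary_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        status, module = parts[0].strip(), parts[1].strip()
        if status == "PASS":
            result[module] = "ok"
        elif module in expected:
            result[module] = "expected"  # WARN/FAIL, but plausible for RNA-seq
        else:
            result[module] = "review"  # WARN/FAIL worth a closer look
    return result


# Made-up summary.txt content, for illustration only.
example = "\n".join([
    "PASS\tBasic Statistics\tsample.fastq.gz",
    "PASS\tPer base sequence quality\tsample.fastq.gz",
    "FAIL\tPer base sequence content\tsample.fastq.gz",
    "FAIL\tPer sequence GC content\tsample.fastq.gz",
    "WARN\tOverrepresented sequences\tsample.fastq.gz",
    "PASS\tAdapter Content\tsample.fastq.gz",
])

for module, verdict in triage_summary(example).items():
    print(f"{verdict:9s} {module}")
```

Anything that lands in "review" still needs a look at the actual plot; the sketch only separates flags that are plausibly tied to the experiment type from those that are not.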
Thank you for the reply!
What is the Phred score distribution?
Above 30 throughout.