6.0 years ago
sheliostrow
I am kind of new to the RNA-seq field. We have an experiment with 18 mouse samples. I used the common pipeline: FastQC, cutadapt, STAR, HTSeq and DESeq2. For some reason, one of the 18 samples aligned poorly to the mouse reference (only 8%), even though it passed QC. All the others were around 90%. I thought the sample was contaminated, so I BLASTed 2000 unmapped reads and found no match whatsoever.
What could have happened?
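For anyone repeating the BLAST check described above, here is a minimal sketch of drawing a random subset of reads from a FASTQ file and converting it to FASTA. It assumes the unmapped reads are already in a FASTQ file (for example, one written by STAR with `--outReadsUnmapped Fastx`) and uses plain 4-line FASTQ records. The function names are hypothetical and not part of any pipeline tool.

```python
import random


def fastq_records(lines):
    """Yield (header, sequence) pairs from an iterable of 4-line FASTQ records."""
    header = None
    for i, line in enumerate(lines):
        if i % 4 == 0:
            header = line.strip()
        elif i % 4 == 1:
            yield header, line.strip()


def sample_records(records, k, seed=0):
    """Reservoir-sample up to k records uniformly, without holding all reads in memory."""
    rng = random.Random(seed)
    reservoir = []
    for n, rec in enumerate(records):
        if n < k:
            reservoir.append(rec)
        else:
            # Keep the new record with probability k / (n + 1).
            j = rng.randint(0, n)
            if j < k:
                reservoir[j] = rec
    return reservoir


def to_fasta(records):
    """Format (header, sequence) pairs as FASTA text."""
    return "".join(">%s\n%s\n" % (h.lstrip("@"), s) for h, s in records)
```

Sampling at random rather than taking the first 2000 reads avoids biasing the check toward whatever happens to sit at the start of the file.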
Can you post a couple example reads from the highly problematic sample? What sequencer were these run on? At the end of the day figuring out what went wrong is mostly to avoid that happening next time. For this dataset, you'll want to just exclude the sample with 8% alignment.
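If it helps to catch a sample like this automatically, below is a rough sketch that pulls the "Uniquely mapped reads %" value out of the text of a STAR `Log.final.out` file and flags samples below a cutoff. The 50% threshold and the function names are assumptions for illustration only; pick a cutoff that makes sense for your own data.

```python
def uniquely_mapped_percent(log_text):
    """Return the 'Uniquely mapped reads %' value from STAR Log.final.out text."""
    for line in log_text.splitlines():
        if "Uniquely mapped reads %" in line:
            # The line has the form: "Uniquely mapped reads % |<tab>8.02%"
            value = line.split("|", 1)[1].strip().rstrip("%")
            return float(value)
    raise ValueError("no 'Uniquely mapped reads %' line found")


def flag_low_mapping(percent_by_sample, threshold=50.0):
    """Return sorted names of samples whose mapping rate is below threshold (assumed cutoff)."""
    return sorted(name for name, pct in percent_by_sample.items() if pct < threshold)
```

With one log per sample, you would read each file, build a dict of sample name to percentage, and pass it to `flag_low_mapping` before running the counting and differential expression steps.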
sample reads:
The sequencer was an Illumina NextSeq 500.
You are right about moving on with the analysis without the problematic sample. But, as you said, I want to know what went wrong and at which stage.
The first read is from a mouse, the second seems to be nothing known. Note that NextSeqs produce a bit more noise than the other Illumina sequencers, but since you don't seem to have GGGG stretches I suspect that's not the culprit here.
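To check the poly-G point on more than a couple of reads, something like the following sketch could be used. It counts the fraction of reads containing a long run of G's, which is how the NextSeq's two-color chemistry reports a cluster with no signal. The minimum run length of 10 is an arbitrary assumption, and the function names are made up for this example.

```python
import re


def fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def fraction_with_polyg(reads, min_run=10):
    """Fraction of reads containing at least min_run consecutive G's."""
    pattern = re.compile("G{%d,}" % min_run)
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for seq in reads if pattern.search(seq.upper()))
    return hits / len(reads)
```

Comparing this fraction between the 8% sample and a well-aligned sample would show whether poly-G reads are unusually common in the bad one.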
Another thing: in the FastQC report I see a low GC content (36%), and this Per base sequence content graph: https://ibb.co/fjTw8c
You need to post the image somewhere and then link to it.
https://ibb.co/fjTw8c
As long as the other samples looked similar that's fine. The bit at the beginning looks like the normal "random hexamer priming" effect.
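If you want to compare samples numerically rather than by eye, here is a small sketch that computes overall GC percentage and per-position base fractions from a set of read sequences. It only loosely mirrors what FastQC plots and is not FastQC's own implementation; the function names are hypothetical.

```python
from collections import Counter


def gc_percent(reads):
    """Overall GC percentage across all reads, ignoring N and other ambiguous bases."""
    gc = at = 0
    for seq in reads:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def per_base_content(reads):
    """Per-position base fractions, one dict per read position."""
    counts = []
    for seq in reads:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    return [{b: n / sum(c.values()) for b, n in c.items()} for c in counts]
```

Running both functions on the same number of reads from the problematic sample and from a good sample gives a direct side-by-side comparison.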
The other samples look like this:
https://ibb.co/dtCdMx