Question

Sequence Duplication Levels

3.3 years ago

fmazzio1 ▴ 10

Hello everyone, I am struggling with some bulk RNA-seq data. I've been given the count matrix and metadata from a bulk RNA-seq experiment run on sorted T cells. Now the issue is that we unfortunately did not find what we would expect. I know this happens most of the time in research, but I am checking whether something went wrong during sequencing or even before it. Whoever ran the analysis shared an Excel file with their results report, and I've noticed that 10 of the 109 samples had RIN values < 6 (3 of them had RIN = 1 and 1 had RIN = 2.5). Nonetheless, they proceeded with sequencing all of our samples. Do you think I should remove those samples? I've run a PCA to see whether the majority of the variance is associated with RIN, but it does not seem to be. Additionally, I've looked at the MultiQC report and I'd like to know whether the sequence duplication levels plot should be a concern.
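One way to make the RIN check a bit more systematic than eyeballing a PCA plot is to correlate each of the first few principal components with RIN, so an association on a later PC is not missed. Below is a minimal numpy-only sketch, assuming a genes x samples count matrix and one RIN value per sample; the function name `pc_rin_correlation`, the simple log-CPM transform and the random toy data are illustrative assumptions, not what the original analysis used.

```python
import numpy as np

def pc_rin_correlation(counts, rin, n_pcs=5):
    """Correlate the top principal components of log-CPM expression with RIN.

    counts: genes x samples array of raw counts (assumed layout).
    rin:    1-D array with one RIN value per sample.
    Returns (fraction of variance explained per PC, Pearson r of each PC with RIN).
    """
    counts = np.asarray(counts, dtype=float)
    rin = np.asarray(rin, dtype=float)
    # Normalise to counts per million, then log2 with a pseudocount of 1
    lib_size = counts.sum(axis=0)
    log_cpm = np.log2(counts / lib_size * 1e6 + 1)
    # Centre each gene across samples and transpose so rows are samples
    x = (log_cpm - log_cpm.mean(axis=1, keepdims=True)).T
    # PCA via SVD: sample scores are U scaled by the singular values
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u * s
    var_explained = s**2 / np.sum(s**2)
    k = min(n_pcs, scores.shape[1])
    r = np.array([np.corrcoef(scores[:, i], rin)[0, 1] for i in range(k)])
    return var_explained[:k], r

# Toy data only: 1,000 genes x 12 samples of random counts and random RIN values
rng = np.random.default_rng(0)
counts = rng.poisson(50, size=(1000, 12))
rin = rng.uniform(1, 9, size=12)

var, r = pc_rin_correlation(counts, rin)
for i, (v, ri) in enumerate(zip(var, r), start=1):
    print(f"PC{i}: {100 * v:.1f}% of variance, r with RIN = {ri:+.2f}")

# Indices of samples below the RIN < 6 cut-off mentioned in the question
low_rin = np.flatnonzero(rin < 6)
```

If a PC that explains a meaningful share of the variance correlates strongly with RIN, that would be one argument for dropping the lowest-RIN samples or adding RIN as a covariate; rerunning the downstream comparison with and without the `low_rin` samples is another simple sensitivity check.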

From what I've read about MultiQC and FastQC reports, this is a bad result, but how bad?
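For context on what the duplication plot measures: it summarises how many times each distinct read sequence occurs, both as a share of distinct sequences and as a share of all reads. A simplified sketch of that calculation is below; the bin boundaries and the function name `duplication_levels` are assumptions for illustration, and FastQC's own implementation differs in its details. In RNA-seq, very highly expressed transcripts alone can produce many identical reads, so a high duplication level there does not by itself mean the library failed.

```python
from collections import Counter

# (lowest, highest, label) duplication-level bins; boundaries are illustrative only
BINS = [(i, i, str(i)) for i in range(1, 10)] + [
    (10, 49, "10-49"),
    (50, 99, "50-99"),
    (100, 499, "100-499"),
    (500, 999, "500-999"),
    (1000, 4999, "1k-5k"),
    (5000, 9999, "5k-10k"),
    (10000, float("inf"), ">=10k"),
]

def duplication_levels(reads):
    """Return (% of distinct sequences per bin, % of all reads per bin,
    % of reads that would remain after deduplication)."""
    seq_counts = Counter(reads)  # number of copies of each distinct sequence
    if not seq_counts:
        raise ValueError("need at least one read")
    total_reads = sum(seq_counts.values())
    n_distinct = len(seq_counts)
    pct_distinct = {label: 0.0 for _, _, label in BINS}
    pct_total = {label: 0.0 for _, _, label in BINS}
    for copies in seq_counts.values():
        for low, high, label in BINS:
            if low <= copies <= high:
                pct_distinct[label] += 100 / n_distinct
                pct_total[label] += 100 * copies / total_reads
                break
    remaining = 100 * n_distinct / total_reads
    return pct_distinct, pct_total, remaining

# Toy reads: one sequence seen 3 times, one seen once, one seen twice
reads = ["ACGT"] * 3 + ["TTTT"] + ["GGGG"] * 2
pct_distinct, pct_total, remaining = duplication_levels(reads)
print(remaining)        # → 50.0
print(pct_total["3"])   # → 50.0
```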

Thanks

Francesco

RNA-Seq

Have you looked at this post from the authors of FastQC?

Now the issue is that we unfortunately did not find what we would expect.

If your data does not speak for itself, then no amount of finagling is going to give you the expected result. You could go through the processing steps and see whether anything obvious stands out (e.g. alignment % not uniform across samples, varying levels of adapter contamination, order-of-magnitude differences in the amount of read data, etc.).
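The checks suggested above mostly come down to asking whether any sample sits far from the rest on a per-sample QC metric such as alignment rate, adapter content or library size. A minimal sketch using a median/MAD rule is below; the function name `flag_outliers`, the 3-MAD threshold and the metric values are hypothetical, and in practice the numbers would come from the MultiQC general statistics table.

```python
import numpy as np

def flag_outliers(metric, names, n_mads=3.0):
    """Return the names of samples whose metric lies more than n_mads
    robust standard deviations (scaled MAD) from the median."""
    values = np.asarray(metric, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        return []  # no spread at all, so this rule cannot call any outliers
    # 1.4826 * MAD estimates the standard deviation for normally distributed data
    robust_z = (values - median) / (1.4826 * mad)
    return [n for n, z in zip(names, robust_z) if abs(z) > n_mads]

# Hypothetical per-sample metrics for six samples
names = ["S1", "S2", "S3", "S4", "S5", "S6"]
align_rate = [0.91, 0.92, 0.90, 0.93, 0.55, 0.91]
lib_size = [20e6, 22e6, 2e6, 21e6, 19e6, 23e6]

print(flag_outliers(align_rate, names))          # → ['S5']
# Library sizes differ multiplicatively, so compare them on a log scale
print(flag_outliers(np.log10(lib_size), names))  # → ['S3']
```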

3.3 years ago by GenoMax 141k

Thank you for the post you shared; the duplication issue is very well explained there. Thanks also for your suggestions. The other QC parameters look good, so my biggest concern is the RIN values.

Anyway, thank you for all the help!

3.3 years ago by fmazzio1 ▴ 10