Trimming Adapters for RNA-SEQ (Quality Check)
2.6 years ago
Sammy ▴ 20


I am processing publicly available SRA data. I developed a pipeline but skipped the FastQC step, so I have to go back. I tried to figure out a quick way to do it. Correct me if I am wrong:

Processing publicly available data has some downsides: you often know very little about the RNA isolation protocol or the sequencing step, so it is hard to find out which adapters were used when doing the quality check. I looked into some tools:

  1. Trimmomatic has ILLUMINACLIP. Does it work? I still don't properly understand how it identifies adapters. Additionally, I am processing data from a large number of papers, and often the sequencing technology is not even mentioned.

  2. Cutadapt: for that one I need to supply the adapter sequence myself.

...and there are loads of other tools as well.

But I figured it would be much easier to do the following:

-use the Trimmomatic SLIDINGWINDOW operation (which cuts the read once the average quality within the window falls below a threshold)

-for paired-end (PE) and single-end (SE) reads, the adapters are sequenced towards one end, 3' or 5' (usually the 3' end, where the quality starts to drop)

Isn't that quality drop an indicator of the adapters? Do I really need to know more about them? It looks like an easy, uncomplicated way of rerunning my whole batch. Is this enough for the quality step (maybe with some filtering depending on the data)? Am I missing something?


RNA-Seq • 1.1k views
2.6 years ago
ATpoint 62k

Actually it is pretty trivial. Run FastQC, which will tell you whether there is adapter contamination or not. If not, you are fine. If so, you simply trim the sequence it identified. In 99% of cases for standard RNA-seq it is the normal TruSeq adapter, which you can look up by googling it (there are many posts addressing this), by checking the Illumina documents, or by checking what the trimming tools suggest. The standard adapter is AGATCGGAAGAGC for both single-end and paired-end data. Cutadapt or any other tool will do fine. There is nothing magical about trimming.
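As a rough cross-check alongside FastQC, the idea above can be sketched in a few lines of Python: count how many reads contain the adapter prefix quoted in this answer. This is only an illustration, not what FastQC does internally. The function names and the `max_reads` cutoff are made up for this example, and it assumes plain four-line FASTQ records (no wrapped sequences) and exact matches only.

```python
import gzip
from itertools import islice

# Adapter prefix quoted in the answer above (standard TruSeq).
ADAPTER = "AGATCGGAAGAGC"


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def adapter_fraction(lines, adapter=ADAPTER, max_reads=100_000):
    """Return the fraction of reads whose sequence contains `adapter`.

    `lines` is any iterable of FASTQ lines. Assumes 4 lines per record,
    so the sequence is every 4th line starting at index 1.
    Only the first `max_reads` records are inspected.
    """
    total = 0
    hits = 0
    for seq in islice(lines, 1, max_reads * 4, 4):
        total += 1
        if adapter in seq:
            hits += 1
    return hits / total if total else 0.0


# Example usage (the file name is a placeholder):
# with open_fastq("sample_1.fastq.gz") as handle:
#     print(f"{adapter_fraction(handle):.2%} of sampled reads contain the adapter")
```

A fraction close to zero suggests trimming is probably unnecessary; a noticeable fraction suggests passing that sequence to Cutadapt or a similar tool. Exact matching will miss adapters that contain sequencing errors or are cut off at the read end, so treat the number as a lower bound.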

I agree that I always prefer my own samples over published ones, but published data has a lot of value. You can confirm (or negate, or put into perspective) your results. For my project I downloaded plenty of available data and benefitted a lot from it. Typically the methods sections contain enough information to process the data adequately, but you must always be critical. Check the low-level metrics with FastQC, and make sure samples have good mapping rates and cluster reasonably in, e.g., a PCA. There are many good datasets out there, but some are problematic or even bad/unacceptable. It is on you to set up a proper pipeline to detect and decide this. Still, I prefer reanalysis over using published count tables or differential analysis results, as you have no control over what exactly the authors did. Processing the data yourself eliminates in silico batch effects between your data and the published data.

No, dropping quality is not an indication of adapters. Don't overthink something as trivial as adapter trimming. If FastQC tells you that there are adapters, then trim; otherwise don't bother. Typically it is not even necessary, as standard fragment lengths are usually longer than the read length.
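The fragment-length argument can be made concrete with a little arithmetic. The sketch below is a simplified model (the function name is made up for illustration): a read only runs into the 3' adapter when the fragment is shorter than the read, and the number of adapter bases is then the difference between the two.

```python
def adapter_bases_in_read(fragment_len, read_len):
    """Expected number of 3' adapter bases in a read (simplified model).

    A read of `read_len` bases starts at one end of a fragment of
    `fragment_len` bases. If the fragment is at least as long as the
    read, the read never reaches the adapter.
    """
    return max(0, read_len - fragment_len)


# A 300 bp fragment sequenced with 100 bp reads: no adapter read-through.
print(adapter_bases_in_read(300, 100))
# An 80 bp fragment sequenced with 100 bp reads: the last 20 bases are adapter.
print(adapter_bases_in_read(80, 100))
```

This is why adapter content shows up mainly in libraries with short inserts, independent of how the base quality behaves along the read.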

2.6 years ago
jrleary ▴ 190

The first poster gave a great response, but I'd posit that skewer is also a great option for read trimming. You can use a specific adapter or set of adapters as input, and it's wicked fast. The downside is that there's not a ton of documentation available, but if you decide to go down that path I can provide sample code.

