Question

Trimmomatic did a massive trimming!!!

0

8.8 years ago

JoeDoasi ▴ 10

Hello everyone,

I'm new to RNA-seq data analysis.

I downloaded 6 samples (3 per condition), ran QC on them, and had to trim adapters and poor-quality reads. The 3 replicates of one condition were trimmed aggressively, losing around 15% of reads on average, while the replicates of the other condition lost about 7%.

Removing 15% of the reads will affect my downstream analysis, so are there any recommendations for improving the trimming procedure, or is this normal given issues in the original raw data?

My Trimmomatic command:

java -jar ~/Desktop/Trimmomatic-0.36/trimmomatic-0.36.jar PE -phred33 SRR11771_1.fastq SRR11771_2.fastq TRMD_SRR11771_1_paired.fastq TRMD_SRR11771_1_unpaired.fastq TRMD_SRR11771_2_paired.fastq TRMD_SRR11771_2_unpaired.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

My plan is to do an exon-centric analysis using TopHat and STAR as aligners, and MISO, MATS and SUPPA for differential splicing analysis.

Your help is really appreciated.

Thanks

RNA-Seq next-gen genome sequencing alignment • 4.5k views

ADD COMMENT • link updated 5.8 years ago by MiladAD ▴ 10 • written 8.8 years ago by JoeDoasi ▴ 10

2

15% is not massive (unless you have few reads to begin with) :-)
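To check whether the absolute number of surviving reads is a concern, the counts can be compared directly. This is a minimal sketch, not part of the original reply: the function names `count_reads` and `pct_dropped` are hypothetical, and it assumes uncompressed FASTQ files with exactly four lines per record.

```shell
# Count records in an uncompressed FASTQ file.
# Assumption: every record is exactly 4 lines (no wrapped sequences).
count_reads() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# Percentage of reads dropped, given read counts before and after trimming.
pct_dropped() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", 100 * (before - after) / before }'
}
```

For example, `pct_dropped "$(count_reads SRR11771_1.fastq)" "$(count_reads TRMD_SRR11771_1_paired.fastq)"` would report the loss for the forward reads of the sample in the question. Trimmomatic also prints its own surviving/dropped summary at the end of each run, which should agree with these numbers.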

ADD REPLY • link 8.8 years ago by GenoMax 152k

1

Your command conflates adapters and quality, so it's impossible to determine why the reads are being trimmed. I suggest you first do adapter-trimming, and then possibly do quality trimming. In general, quality-trimming decreases the accuracy of alignment, though trimming to a low level like Q6 can sometimes be beneficial.
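One way to separate the two effects with Trimmomatic itself is to run it twice, as sketched below. This is an illustration under stated assumptions rather than a recommended pipeline: the wrapper name `two_pass_trim`, the `ADPT_` intermediate file names, and the `TRIMMOMATIC_JAR` variable are hypothetical, and the default threshold of 6 simply mirrors the "Q6" mentioned above. The trimming steps themselves are taken from the command in the question.

```shell
# Path to the Trimmomatic jar; defaults to the location used in the question.
TRIMMOMATIC_JAR="${TRIMMOMATIC_JAR:-$HOME/Desktop/Trimmomatic-0.36/trimmomatic-0.36.jar}"

# Hypothetical wrapper: trim adapters first, then quality-trim the survivors.
# $1 = sample prefix (e.g. SRR11771), $2 = quality threshold (assumed default: 6)
two_pass_trim() {
  local s="$1"
  local q="${2:-6}"

  # Pass 1: adapter clipping only, so read loss here is due to adapters alone.
  java -jar "$TRIMMOMATIC_JAR" PE -phred33 \
    "${s}_1.fastq" "${s}_2.fastq" \
    "ADPT_${s}_1_paired.fastq" "ADPT_${s}_1_unpaired.fastq" \
    "ADPT_${s}_2_paired.fastq" "ADPT_${s}_2_unpaired.fastq" \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 MINLEN:36 || return 1

  # Pass 2: gentle quality trimming on the adapter-trimmed pairs.
  java -jar "$TRIMMOMATIC_JAR" PE -phred33 \
    "ADPT_${s}_1_paired.fastq" "ADPT_${s}_2_paired.fastq" \
    "TRMD_${s}_1_paired.fastq" "TRMD_${s}_1_unpaired.fastq" \
    "TRMD_${s}_2_paired.fastq" "TRMD_${s}_2_unpaired.fastq" \
    LEADING:"$q" TRAILING:"$q" MINLEN:36
}
```

Calling `two_pass_trim SRR11771` and comparing the summary Trimmomatic prints after each pass shows how much of the loss comes from adapters and how much from quality.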

ADD REPLY • link 8.8 years ago by Brian Bushnell 20k

0

Thank you all for sharing the useful information with me..

I will take into account the different ideas posted here.

ADD REPLY • link 8.8 years ago by JoeDoasi ▴ 10

0

In RNA-seq data analysis, you have to strike a balance between keeping acceptable quality and minimizing the number of discarded reads. Note that all adapter contamination should be trimmed. I recommend 123Fastq, which combines FastQC and Trimmomatic in a highly interactive graphical user interface. It also adds some improvements to FastQC's QC modules, a k-mer-based approach to adapter removal during trimming, and many other features. Try it yourself: https://sourceforge.net/projects/project-123ngs/

ADD REPLY • link 5.8 years ago by MiladAD ▴ 10

1

Instead of posting this in multiple old threads, it would be best to publish a single, independent tools post. That would be the proper way of announcing your software.

ADD REPLY • link 5.8 years ago by GenoMax 152k

score 3 · Answer 1 · 2016-09-10

You may not realize this, but STAR already performs soft-clipping (or internal trimming), so pre-trimming may not make much of a difference:

STAR performs the so called local alignment of the read sequence to the genome, as opposed to the end-to-end (semi-global) alignment which is performed by many DNA aligners such as bowtie1. This means that STAR will try to maximize the alignment score by "extending" the alignment towards the end of the reads. However, it will not try to force the "full-length" read alignment from the first to the last base of the read sequence.

https://groups.google.com/forum/#!msg/rna-star/uyGEc7lPveg/yJY6hmjt7REJ

See some additional discussion on RNA-seq read-trimming here: https://github.com/chapmanb/bcbio-nextgen/issues/1140
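To see how much soft-clipping the aligner actually applied to untrimmed reads, the CIGAR strings in the output can be summarised. The following is a rough sketch and not part of the original answer: the function name `softclip_pct` is hypothetical, it expects an uncompressed SAM file (a BAM would first need converting, e.g. with `samtools view -h`), and it counts every alignment record, so multi-mapping reads contribute more than once.

```shell
# Report the percentage of read bases that are soft-clipped in a SAM file.
# Assumption: plain-text SAM input with the CIGAR string in column 6.
softclip_pct() {
  awk -F'\t' '
    !/^@/ && $6 != "*" {
      cigar = $6
      # Walk the CIGAR one operation at a time, e.g. 10S90M -> 10S, 90M.
      while (match(cigar, /[0-9]+[MIDNSHP=X]/)) {
        n  = substr(cigar, RSTART, RLENGTH - 1) + 0
        op = substr(cigar, RSTART + RLENGTH - 1, 1)
        if (op == "S") clipped += n
        # M, I, S, = and X are the operations that consume read bases.
        if (op ~ /[MIS=X]/) total += n
        cigar = substr(cigar, RSTART + RLENGTH)
      }
    }
    END {
      if (total > 0) printf "%.1f\n", 100 * clipped / total
      else print "NA"
    }' "$1"
}
```

If the soft-clipped percentage on untrimmed reads is small, that would suggest pre-trimming has little left to gain for the alignment step.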

score 1 · Answer 2 · 2016-09-10

Try different trimming tools and compare the results. Sometimes the raw data contains many reads like "NNNNNNNNNNNNNNNNNNN", or some reads that lost their mate end up in a separate (unpaired) file. You may want to use TopHat2 instead of TopHat. In my opinion, removing 15% of low-quality reads won't affect your downstream analysis because, based on your command, you are unlikely to be losing real information from your raw data.