Question: Trimmomatic did a massive trimming!!!
3.1 years ago by
joe0
joe0 wrote:

Hello everyone,

I'm new to RNA-seq data analysis.

I downloaded 6 samples (3 replicates per condition), ran QC on them, and had to trim adapters and low-quality reads. The 3 replicates of one condition were trimmed aggressively, losing about 15% of their reads on average, while about 7% of reads were discarded from the replicates of the other condition.

Losing 15% of the reads will affect my downstream analysis, so are there any recommendations for improving the trimming procedure, or is this normal given issues in the original raw data?

My Trimmomatic command:

java -jar ~/Desktop/Trimmomatic-0.36/trimmomatic-0.36.jar PE -phred33 SRR11771_1.fastq SRR11771_2.fastq TRMD_SRR11771_1_paired.fastq TRMD_SRR11771_1_unpaired.fastq TRMD_SRR11771_2_paired.fastq TRMD_SRR11771_2_unpaired.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

My plan is to do exon-centric analysis using TopHat and STAR as aligners, and MISO, MATS and SUPPA for differential splicing analysis.

Your help is really appreciated.

Thanks

modified 5 weeks ago by genetician201610 • written 3.1 years ago by joe0

15% is not massive (unless you have few reads to begin with) :-)
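
To judge whether the loss matters, it helps to look at absolute read counts before and after trimming rather than the percentage alone. A minimal sketch, assuming uncompressed FASTQ files with four lines per record; the function names count_reads and percent_dropped are made up for illustration, and the file names in the usage comment are the ones from the question:

```shell
# Count reads in an uncompressed FASTQ file (4 lines per record).
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Percentage of reads lost between a raw file and its trimmed counterpart.
# Arguments: raw FASTQ, trimmed FASTQ. Prints NA if the raw file is empty.
percent_dropped() {
    raw=$(count_reads "$1")
    kept=$(count_reads "$2")
    awk -v raw="$raw" -v kept="$kept" 'BEGIN {
        if (raw == 0) { print "NA"; exit }
        printf "%.1f\n", 100 * (raw - kept) / raw
    }'
}

# Usage (only where these files exist):
# count_reads SRR11771_1.fastq
# percent_dropped SRR11771_1.fastq TRMD_SRR11771_1_paired.fastq
```

If tens of millions of read pairs survive in every sample, a 15% loss is unlikely to be the limiting factor; with only a few million reads per sample it may be worth a closer look.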

written 3.1 years ago by genomax73k

Your command conflates adapters and quality, so it's impossible to determine why the reads are being trimmed. I suggest you first do adapter-trimming, and then possibly do quality trimming. In general, quality-trimming decreases the accuracy of alignment, though trimming to a low level like Q6 can sometimes be beneficial.
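
A rough sketch of that split, reusing only the Trimmomatic steps and parameters from the command in the question. The ADPT_ output prefix, the function names, and the jar path wrapper are assumptions made for illustration, so adjust them to your setup:

```shell
# Wrapper around the jar path used in the question; adjust to your install.
trimmomatic() {
    java -jar "$HOME/Desktop/Trimmomatic-0.36/trimmomatic-0.36.jar" "$@"
}

# Pass 1: adapter clipping only (plus the length filter), no quality steps.
adapter_trim() {
    s=$1   # sample prefix, e.g. SRR11771
    trimmomatic PE -phred33 \
        "${s}_1.fastq" "${s}_2.fastq" \
        "ADPT_${s}_1_paired.fastq" "ADPT_${s}_1_unpaired.fastq" \
        "ADPT_${s}_2_paired.fastq" "ADPT_${s}_2_unpaired.fastq" \
        ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 MINLEN:36
}

# Pass 2 (optional): quality trimming on the adapter-trimmed pairs.
quality_trim() {
    s=$1
    trimmomatic PE -phred33 \
        "ADPT_${s}_1_paired.fastq" "ADPT_${s}_2_paired.fastq" \
        "TRMD_${s}_1_paired.fastq" "TRMD_${s}_1_unpaired.fastq" \
        "TRMD_${s}_2_paired.fastq" "TRMD_${s}_2_unpaired.fastq" \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
}

# Usage:
# adapter_trim SRR11771 && quality_trim SRR11771
```

Comparing the surviving/dropped summary that Trimmomatic reports for each pass should show whether adapters or low quality account for most of the discarded reads.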

written 3.1 years ago by Brian Bushnell16k

Thank you all for sharing this useful information with me.

I will take into account the different ideas posted here.

written 3.1 years ago by joe0

In RNA-seq data analysis, you have to strike a balance between acceptable read quality and minimizing the number of discarded reads. Note that all adapter contamination should be trimmed. I recommend 123Fastq, which combines FastQC and Trimmomatic in a highly interactive graphical user interface. It also adds some improvements to the QC modules of FastQC, a k-mer-based approach to adapter removal during trimming, and many other features. Try it yourself: https://sourceforge.net/projects/project-123ngs/

written 5 weeks ago by genetician201610

Instead of posting this in multiple old threads, it would be best to write a single, independent tools post. That would be the proper way of announcing your software.

modified 5 weeks ago • written 5 weeks ago by genomax73k

3.1 years ago by
igor8.7k
United States
igor8.7k wrote:

You may not realize this, but STAR already performs soft-clipping (or internal trimming), so pre-trimming may not make much of a difference:

STAR performs the so called local alignment of the read sequence to the genome, as opposed to the end-to-end (semi-global) alignment which is performed by many DNA aligners such as bowtie1. This means that STAR will try to maximize the alignment score by "extending" the alignment towards the end of the reads. However, it will not try to force the "full-length" read alignment from the first to the last base of the read sequence.

https://groups.google.com/forum/#!msg/rna-star/uyGEc7lPveg/yJY6hmjt7REJ

See some additional discussion on RNA-seq read-trimming here: https://github.com/chapmanb/bcbio-nextgen/issues/1140
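
One way to see how much clipping the aligner is already doing is to count alignments whose CIGAR string contains a soft-clip (S) operation. A rough sketch, assuming samtools is installed; aligned.bam is a placeholder for your own STAR output and softclip_summary is a hypothetical helper name:

```shell
# Summarise soft-clipping in SAM records read from stdin.
# Column 6 of a SAM record is the CIGAR string; S operations mark soft-clipped bases.
softclip_summary() {
    awk -F '\t' '
        /^@/      { next }    # skip header lines
        $6 == "*" { next }    # skip records without an alignment
        { total++; if ($6 ~ /S/) clipped++ }
        END { printf "%d of %d aligned records are soft-clipped\n", clipped, total }
    '
}

# Usage:
# samtools view aligned.bam | softclip_summary
```

Running this on alignments of untrimmed and trimmed reads from the same sample gives a quick feel for whether pre-trimming changes much.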

written 3.1 years ago by igor8.7k

Thank you for this insight, igor.

written 5 weeks ago by steve2.3k

3.1 years ago by
Calvin60
Australia
Calvin60 wrote:

Try different trimming tools and compare the results. Sometimes the raw data contains many reads like "NNNNNNNNNNNNNNNNNNN", or some reads that are no longer paired get moved to another file. You may want to use TopHat2 instead of TopHat. In my opinion, removing 15% of low-quality reads won't affect downstream analysis because, based on your command, you are unlikely to lose real information from your raw data.
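
To check for all-N reads before blaming the trimmer, something like the following could be used. It is only a sketch, assuming an uncompressed FASTQ file with four lines per record; count_all_n_reads is a made-up helper name:

```shell
# Count reads whose sequence line consists only of N bases.
count_all_n_reads() {
    awk 'NR % 4 == 2 && /^[Nn]+$/ { n++ } END { printf "%d\n", n }' "$1"
}

# Usage (only where this file exists):
# count_all_n_reads SRR11771_1.fastq
```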

written 3.1 years ago by Calvin60