Question: Trimmomatic low PE returns
4.3 years ago, oliver.tills (European Union) wrote:

Hi,

I have a question about using Trimmomatic on PE MiSeq data. I am getting what seem to be quite low paired returns. Is this typical, and can anyone suggest what the problem might be? Input Read Pairs: 982783 Both Surviving: 346732 (35.28%) Forward Only Surviving: 635840 (64.70%) Reverse Only Surviving: 17 (0.00%) Dropped: 194 (0.02%). See the run below. The problem appears to be the adapter trimming rather than the quality trimming: if I remove the adapter trimming step, PE returns increase to > 60 %.

Can anyone offer any advice?

Thanks, Oli

TrimmomaticPE: Started with arguments: /home/otills/data/all/140331_C1CR_M01145_0120_000000000-A6UJV_1_IL-TP-014_1.sanfastq.gz \
/home/otills/data/all/140331_C1CR_M01145_0120_000000000-A6UJV_1_IL-TP-014_2.sanfastq.gz \
/home/otills/data/all/140331_C1CR_M01145_0120_000000000-A6UJV_1_IL-TP-014_1.sanfastq.fq.gz \
/home/otills/data/all/140331_C1CR_M01145_0120_000000000-A6UJV_1_IL-TP-014_1.sanfastq.bd.fq.gz \
/home/otills/data/all/140331_C1CR_M01145_0120_000000000-A6UJV_1_IL-TP-014_2.sanfastq.fq.gz \
/home/otills/data/all/140331_C1CR_M01145_0120_000000000-A6UJV_1_IL-TP-014_2.sanfastq.bd.fq.gz \
ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:2 MINLEN:20

Multiple cores found: Using 8 threads

Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'

ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences

Quality encoding detected as phred33

Input Read Pairs: 982783 Both Surviving: 346732 (35.28%) Forward Only Surviving: 635840 (64.70%) Reverse Only Surviving: 17 (0.00%) Dropped: 194 (0.02%)

TrimmomaticPE: Completed successfully
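
As a quick sanity check on the summary line in the log, the counts can be parsed and re-tallied. The sketch below is only an illustration in Python (the function name `parse_summary` is made up here, and the regular expression assumes the summary line has exactly the layout shown in this log). It shows that nearly all pairs that did not survive as pairs ended up as forward-only reads, not in the Dropped category.

```python
import re

# Summary line copied from the Trimmomatic log in this post.
SUMMARY = ("Input Read Pairs: 982783 Both Surviving: 346732 (35.28%) "
           "Forward Only Surviving: 635840 (64.70%) "
           "Reverse Only Surviving: 17 (0.00%) Dropped: 194 (0.02%)")

CATEGORIES = ("both", "forward_only", "reverse_only", "dropped")


def parse_summary(line):
    """Return the read-pair counts from a TrimmomaticPE summary line.

    Hypothetical helper: assumes the line layout shown in the log above.
    """
    pattern = (r"Input Read Pairs: (\d+) Both Surviving: (\d+) .*?"
               r"Forward Only Surviving: (\d+) .*?"
               r"Reverse Only Surviving: (\d+) .*?Dropped: (\d+)")
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("not a TrimmomaticPE summary line")
    keys = ("input",) + CATEGORIES
    return dict(zip(keys, map(int, match.groups())))


counts = parse_summary(SUMMARY)

# The four outcome categories should account for every input pair.
assert sum(counts[k] for k in CATEGORIES) == counts["input"]

for key in CATEGORIES:
    share = 100 * counts[key] / counts["input"]
    print(f"{key}: {counts[key]} ({share:.2f}%)")
```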


Run fastqc on the files and see if anything is obviously wrong with them.

written 4.2 years ago by Devon Ryan
4.2 years ago, oliver.tills (European Union) wrote:

I don't think there is anything obviously wrong with the files - https://www.dropbox.com/s/gz49ljhbxs95ont/140331_C1CR_M01145_0120_000000000-A6UJV_1_IL-TP-014_1.sanfastq_fastqc.html?dl=0.

However, I think I've figured out what is happening. The adapter trimming is running in 'palindrome' mode, and the Trimmomatic manual describes the relevant ILLUMINACLIP parameter (keepBothReads) as follows: 'After read-through has been detected by palindrome mode, and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to entirely drop the reverse read. By specifying "true" for this parameter, the reverse read will also be retained, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads.'

If I set this to true, I get ~99 % of read pairs surviving. I am not sure I fully understand what is happening, but I guess that because the MiSeq reads are quite long (250 bp), I am getting high levels of adapter read-through.
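
To make the read-through situation concrete, here is a minimal Python sketch of an error-free read pair from a fragment that is shorter than the read length. It is a simplified model, not Trimmomatic's palindrome alignment: the fragment sequence, the 60 bp read length and the function names are invented for illustration, and the adapter tails are taken as the reverse complements of the two PrefixPair sequences printed in the log above. Once the adapter tails are removed, the reverse read is just the reverse complement of the forward read, which is the redundancy the manual refers to.

```python
# PrefixPair sequences as reported in the Trimmomatic log in this thread.
PREFIX_FWD = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"
PREFIX_REV = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sequence_pair(fragment, read_len):
    """Simulate error-free paired-end reads of one adapter-flanked fragment.

    Simplified model (an assumption of this sketch): each read runs off the
    end of the fragment into the reverse complement of the opposite prefix.
    """
    read1 = (fragment + reverse_complement(PREFIX_REV))[:read_len]
    read2 = (reverse_complement(fragment)
             + reverse_complement(PREFIX_FWD))[:read_len]
    return read1, read2


# Made-up 40 bp fragment, sequenced with made-up 60 bp reads.
fragment = "ATGGCGTACGTTAGCCTAGGATCCGATTACGCATGCAAGT"
read1, read2 = sequence_pair(fragment, read_len=60)

# In this toy case the fragment length is known; Trimmomatic has to infer
# it by aligning the two reads (with their prefixes) against each other.
trimmed1 = read1[:len(fragment)]
trimmed2 = read2[:len(fragment)]

# After adapter removal the reverse read carries no new information.
print(trimmed1 == reverse_complement(trimmed2))
```

In Trimmomatic itself, keepBothReads is the sixth ILLUMINACLIP field, after minAdapterLength, so a step such as ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:8:true would retain the reverse reads (8 is the documented default minimum adapter length; the exact values used in this thread are not stated).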


Interesting observation, thanks for sharing. Your data do have plenty going on, including a lot of adapter contamination, but clearly not at a level that justifies removing so much of it.

The fact that Trimmomatic would drop palindromic reads is pretty crazy, and I had never realized it myself. Usually I don't even care about palindromes. The palindrome behavior is a nice trick for detecting very short read-through, but removing data should be explicitly specified by the user.

The reads are not incorrect, perhaps redundant, but the solution is not to destroy pairing information.

written 4.2 years ago by Istvan Albert

Wow, I never knew about that palindrome option (I don't normally use Trimmomatic). The default for it should really be true.

written 4.2 years ago by Devon Ryan

Two questions linked to this: i) is this situation normal for PE MiSeq reads, and ii) what are people's opinions on the best way to prepare these data for assembly and mapping?

written 4.2 years ago by oliver.tills

When reads overlap substantially, the best option may be to merge them into a single read (using tools like FLASH and many others). Typical paired-end data have a "gap" between the two reads of a pair. Once the reads overlap, each fragment yields regions that are measured twice, redundantly, by both reads, and regions that are covered only once, by a single read. This alters the assumptions of just about any mathematical or statistical model that downstream tools may rely on. In that case it is best to merge the reads.

On the other hand, there are tools that require paired-end reads as input.

Long story short: data analysis gets a little more complicated, as if it wasn't complicated enough already.
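
The idea of merging an overlapping pair can be sketched in a few lines of Python. This is a deliberately naive illustration, not FLASH's algorithm: it requires an exact match in the overlap, ignores base qualities, and assumes adapters have already been trimmed; the function names, the `min_overlap` parameter and the example sequences are all made up here.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10):
    """Merge an overlapping read pair into one sequence, or return None.

    Naive sketch: looks for the longest exact match between the end of
    read1 and the start of the reverse-complemented read2.
    """
    mate = reverse_complement(read2)  # put read2 on the same strand as read1
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        if read1.endswith(mate[:overlap]):
            # Keep the overlapping region once, then append the rest of mate.
            return read1 + mate[overlap:]
    return None


# Made-up 60 bp fragment sequenced with 40 bp reads: the reads overlap by 20 bp.
fragment = ("ACTTCATCCATTCACTACAT"
            "GGCGATGCGTAGGCTAGCGT"
            "TACGGATCGATTGCAGCTAA")
read1 = fragment[:40]
read2 = reverse_complement(fragment)[:40]

merged = merge_pair(read1, read2)
print(merged == fragment)

# A pair with no shared sequence is left unmerged.
print(merge_pair("ACGTACGTACGTACGTACGT", "AAAAAAAAAAAAAAAAAAAA"))
```

Real mergers tolerate mismatches in the overlap and use quality scores to resolve them, so a production workflow would rely on a dedicated tool and not on a sketch like this.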

written 4.2 years ago (modified 4.2 years ago) by Istvan Albert