Trimming RNAseq data for transcriptome assembly
7 weeks ago

Hey everyone,

I downloaded RNA-seq data for my organism of interest, and my goal is to produce a de novo transcriptome assembly. When I ran FastQC on the raw reads, I got a warning or failure on the "Per base sequence content" module (the problem is detected in the first 15 bases).

And I get a checkmark on the adapter content. I could use Trimmomatic with the HEADCROP parameter, but I don't think that's an efficient way (too much information lost). Can you suggest an efficient way of getting the checkmark on this module (without cropping all the reads)?

Thank you

Rna-seq • 367 views

Also note that the values for the first ten positions are not binned; after that, each plotted value is an average of 2 positions.


This is a common case. I would trim the 5' region, map the reads, and check whether the alignments improve after trimming.

7 weeks ago

This has been asked here multiple times before :)

In a nutshell: FastQC was historically not meant for checking RNA-seq data. It was intended for DNA-seq, so some of its checks are suboptimal for RNA-seq data, and this is exactly such a case.

Bottom line: no need to worry about the 'variability' at the beginning of the read, that's normal. (Moreover, the graph also suffers from binning: from base 10 onwards it's plotted in a binned manner; below 10 it's per base.)
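To make concrete what that plot is showing, here is a minimal Python sketch that computes the percentage of A, C, G and T at each read position, then averages positions in pairs after the tenth base, as described in this thread. The function names, the toy reads, and the `unbinned=10` / `width=2` defaults are assumptions for illustration only; this is not FastQC's own code and its exact grouping may differ.

```python
from collections import Counter

BASES = "ACGT"

def per_base_content(reads):
    """Return one dict per read position with the percentage of A, C, G and T."""
    longest = max((len(r) for r in reads), default=0)
    counts = [Counter() for _ in range(longest)]
    for read in reads:
        for pos, base in enumerate(read.upper()):
            if base in BASES:  # ignore N and other ambiguity codes
                counts[pos][base] += 1
    table = []
    for c in counts:
        total = sum(c.values())
        table.append({b: (100.0 * c[b] / total if total else 0.0) for b in BASES})
    return table

def bin_positions(table, unbinned=10, width=2):
    """Keep the first `unbinned` positions as-is, then average every `width` positions.

    Assumption: this mirrors the binning described in the thread, not FastQC's exact scheme.
    """
    binned = list(table[:unbinned])
    for start in range(unbinned, len(table), width):
        chunk = table[start:start + width]
        binned.append({b: sum(row[b] for row in chunk) / len(chunk) for b in BASES})
    return binned

# Toy reads (made up), all 14 bases long.
reads = ["ACGTACGTACGTAC", "ACGTTTTTACGTAA", "AGGTACGTACGTCC", "ACCTACGAACGTGC"]
table = per_base_content(reads)
for position, row in enumerate(bin_positions(table), start=1):
    print(position, {b: round(pct, 1) for b, pct in row.items()})
```

Running the same function on real reads (for example, the sequence lines parsed from a FASTQ file) lets you see whether the skew is confined to the first bases without relying on the pass/fail icon.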

As for adapters, you are best off trimming them, since you know for sure they are not part of the transcriptome and should thus never be present in your final assembly result.
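As a rough sketch of adapter-only trimming (no HEADCROP), the Python snippet below just assembles a Trimmomatic command line. It assumes single-end reads, a `trimmomatic` wrapper on your PATH (as installed by some package managers; otherwise you would call the jar with `java -jar`), and the `TruSeq3-SE.fa` adapter file that ships with Trimmomatic. The file names, the `2:30:10` ILLUMINACLIP settings and `MINLEN:36` are placeholder assumptions to adapt to your own library.

```python
import shlex

def trimmomatic_se_command(fastq_in, fastq_out,
                           adapters_fasta="TruSeq3-SE.fa", min_len=36):
    """Build a single-end Trimmomatic command that clips adapters only.

    All defaults are illustrative assumptions, not recommendations.
    """
    return [
        "trimmomatic", "SE",
        fastq_in, fastq_out,
        # adapter FASTA : seed mismatches : palindrome threshold : simple clip threshold
        f"ILLUMINACLIP:{adapters_fasta}:2:30:10",
        # drop reads that end up shorter than min_len after clipping
        f"MINLEN:{min_len}",
    ]

# Hypothetical input and output file names.
cmd = trimmomatic_se_command("reads.fastq.gz", "reads.trimmed.fastq.gz")
print(shlex.join(cmd))

# To actually run it (requires Trimmomatic to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

For paired-end data the `PE` mode takes two inputs and four outputs instead, so the argument list would need to change accordingly.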
