Question: Paired-end sequence assembly with Trinity (data mining doubt)
4.9 years ago by guillermo.ponz.segrelles

Hello everyone, I'm a Master's student from Spain and this is my very first work with bioinformatics and RNA-Seq, so please excuse me if my questions are too basic or not very clearly explained.

I have two files containing 100 bp paired-end reads from Illumina RNA-seq, and I want to assemble them into a de novo transcriptome using Trinity. Up to here everything is OK, but I have some doubts about how the two files (containing the F and the R reads) are combined to obtain a “consensus” sequence for the k-mer dictionary construction and the downstream processes (actually, I don’t really know whether such a “consensus” sequence is formed at all when the Inchworm step of the assembly runs).

My main doubt is whether the F and the R reads of the 100 bp fragment need to be the same length. I wonder about this because the first 9-10 bases of each read have poor per-position quality, and if I trim them I don’t know whether it will be a disaster (because I don’t know whether Trinity aligns the F and R reads or whether it just converts the R reads to their reverse complement and obtains the k-mers from the F reads and the converted R reads independently).

I know it’s a bit messy but I will be very grateful if anyone can help me.

rna-seq next-gen assembly • 1.3k views
modified 4.9 years ago by seta • written 4.9 years ago by guillermo.ponz.segrelles
4.9 years ago, seta wrote:

Hi, if your two files contain the R and F reads separately, you can give them to Trinity either combined or separately, depending on the Trinity command you use. You should first evaluate the quality of your data and trim the poor-quality bases. Having poor-quality bases at the beginning of the reads is normal, and you can trim them to get a better assembly, so don't worry about it.
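
To illustrate why reads of different lengths are not a problem for k-mer based approaches in general, here is a minimal Python sketch. It is not Trinity's code and makes no claim about how Inchworm handles read pairs internally; the toy reads, the value of `K`, and the function names are all hypothetical. It only shows that k-mers can be extracted from each read on its own, so a trimmed read simply contributes fewer k-mers.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return every overlapping k-mer in seq (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical toy reads of unequal length, as if they had been trimmed differently.
K = 5  # small k chosen only for readability (an assumption, not a Trinity setting)
forward_read = "ACGTACGTAC"  # 10 bases
reverse_read = "TTGCAGGC"    # 8 bases

# Each read is broken into k-mers independently; no alignment between mates is needed.
forward_kmers = kmers(forward_read, K)
reverse_kmers = kmers(reverse_complement(reverse_read), K)

print(len(forward_kmers), len(reverse_kmers))
```

A read of length L yields L - k + 1 k-mers, so trimming the first bases only reduces the number of k-mers that read contributes.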

written 4.9 years ago by seta

Thank you so much for answering, Seta! It's difficult to know what you are really doing and how it affects your results when working with these huge data sets.

I think I will trim the beginning of the reads in both files (F and R reads) with the FastQ trimmer per column tool implemented in Galaxy and then run Trinity. I hope it helps improve the results.
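
For readers who want to see what this kind of fixed-position trimming does, here is a rough Python sketch. It is not the Galaxy tool's implementation; the function name `trim_fastq_start` and the example records are hypothetical, and it assumes the plain four-lines-per-record FASTQ layout.

```python
def trim_fastq_start(lines, n_bases):
    """Yield FASTQ lines with the first n_bases removed from each sequence and quality string.

    Assumes the plain 4-line-per-record layout: header, sequence, '+', quality.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        # Lines 2 and 4 of each record (index 1 and 3 modulo 4) hold sequence and quality.
        if index % 4 in (1, 3):
            line = line[n_bases:]
        yield line


# Hypothetical two-record example; real input would be the lines of an open FASTQ file.
example = [
    "@read1/1", "NNACGTACGT", "+", "##IIIIIIII",
    "@read2/1", "NNTTGGCCAA", "+", "##IIIIIIII",
]
trimmed = list(trim_fastq_start(example, 2))
print("\n".join(trimmed))
```

Because this kind of trimming shortens reads but never drops a record, running it on the F file and the R file separately leaves the two files in the same record order.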

Thank you again.

written 4.9 years ago by guillermo.ponz.segrelles