Question: Paired-end sequencing with different read lengths: better to trim everything to a short length or use the long reads as single-end?
2.5 years ago by
Lalla40 wrote:

Dear all,

I am working on alternative splicing. I have paired-end sequencing data with reads of different lengths (100 bp for the forward and 66 bp for the reverse, trimmed because of low quality). My problem is that some tools for alternative splicing, such as MISO or rMATS, require reads of the same length. I could trim the reads to the same size and use both forward and reverse, but in that case all the reads would be quite short (66 bp). Alternatively, I could use only the forward reads, which are considerably longer (100 bp). I searched a bit but could not find a clear answer on which strategy would be better. To my knowledge, it is better not to use reads shorter than 70 bp for alternative splicing detection, but I find papers that publish alternative splicing data with 50 bp single-end reads.

Any advice would be highly appreciated.


modified 13 months ago by iti.gupta10 • written 2.5 years ago by Lalla40

What Q-score threshold were the reads trimmed at? You may want to go back to the original reads and see whether they still work. When you have a reference genome, you can afford to use reads of less-than-optimal quality.

written 2.5 years ago by genomax80k

Dear genomax,

Thank you for your reply. Unfortunately, I do not know. I am not a bioinformatician, and the quality trimming, mapping and alignment were done by our in-house bioinformaticians. We work on mouse, so I believe that if they decided to trim those reads it was for a good reason, and I wouldn't trust low-quality reads when it comes to alternative splicing detection.

Thank you anyway

written 2.5 years ago by Lalla40

If MISO or rMATS require reads of the same length, then you have no option but to trim R1 to the same length as R2.
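
As a rough illustration of what trimming R1 to the length of R2 could look like, here is a minimal Python sketch. It assumes the pairs are already parsed into (name, sequence, quality) tuples, that bases are cut from the 3' end, and that pairs with a mate shorter than the target are dropped so every remaining read has exactly the same length. The function name `trim_pairs` and the record layout are hypothetical, not part of MISO or rMATS; in practice a dedicated trimming tool would normally be used.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (read name, sequence, quality string)
Record = Tuple[str, str, str]


def trim_pairs(
    pairs: Iterable[Tuple[Record, Record]], length: int = 66
) -> Iterator[Tuple[Record, Record]]:
    """Hard-trim both mates of each pair to `length` bases.

    Assumption: bases are removed from the 3' end, and a pair is
    discarded when either mate is shorter than `length`, so that
    every read that is kept has exactly the same length.
    """
    for (n1, s1, q1), (n2, s2, q2) in pairs:
        if len(s1) < length or len(s2) < length:
            continue  # cannot reach the common length, drop the pair
        yield (n1, s1[:length], q1[:length]), (n2, s2[:length], q2[:length])


# Toy example: a 100 bp forward read paired with a 66 bp reverse read
example = [(("read1/1", "A" * 100, "I" * 100), ("read1/2", "C" * 66, "I" * 66))]
for r1, r2 in trim_pairs(example, length=66):
    print(len(r1[1]), len(r2[1]))
```

Dropping short pairs instead of padding them is a design choice made here only to guarantee a uniform length; whether that loss of reads is acceptable depends on how many reads fall below the cutoff.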

modified 2.5 years ago • written 2.5 years ago by genomax80k

I agree with genomax. In addition, nothing prevents you from doing both: trim read 1 to 66 bp for a paired-end analysis, use R1 as 100 bp single-end reads, and integrate the results, so that you get as much information as possible. Finally, I am pretty sure that tools such as TopHat-Cufflinks (and probably the newer HISAT as well) can align paired reads of different lengths and detect different isoforms, so you might also try working in that direction.

written 2.5 years ago by Fabio Marroni2.5k

Ok, I will try both ways then. Thanks for the suggestions!

written 2.5 years ago by Lalla40

Is it mandatory that the read lengths be the same even when working with BAM files (in the case of rMATS)?
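
This does not settle what rMATS requires, but one way to check whether the reads in an alignment actually share a single length is to tally the sequence lengths. The sketch below is a minimal example that assumes the alignments are available as SAM text lines (for instance, the output of `samtools view` on a BAM file) and relies only on the SAM convention that the sequence is the 10th tab-separated field. The function name `read_length_counts` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def read_length_counts(sam_lines: Iterable[str]) -> Counter:
    """Count how many alignment records have each sequence length.

    Assumption: input is SAM text. Header lines (starting with '@')
    and records without a stored sequence ('*') are skipped.
    """
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        seq = fields[9]  # SEQ is the 10th mandatory SAM field
        if seq == "*":
            continue  # sequence not stored for this record
        counts[len(seq)] += 1
    return counts
```

If the resulting counter has more than one key, the file contains reads of different lengths. Note that hard-clipped alignments store a shortened sequence, so a mixed tally does not necessarily mean the original reads differed in length.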

written 13 months ago by iti.gupta10