Question: Trimming single-end reads for STAR?

caggtaagtat wrote (2.4 years ago):

Hi,

I just started working with single-end reads, which have already been trimmed for adapter sequences and quality. Do I now have to trim the reads to a uniform length (e.g. 100 nt) before mapping them with STAR? Is there a negative effect if I don't?

Tags: rna-seq, star, trimming

grant.hovhannisyan wrote (2.4 years ago):

If the qualities are OK and there are no adapters, you can proceed with mapping. There is a recent paper on trimming RNA-seq data and its possible consequences for downstream analysis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4766705/

caggtaagtat replied:

Thank you! I will proceed with the mapping then.

h.mon (Brazil) wrote (2.4 years ago):

If they are already trimmed for adapters and quality, don't trim further. Trimming makes sequences shorter, and shorter sequences are more likely to map to multiple locations.

What is the length range of your reads? I generally keep only reads within a certain length range and discard the shorter ones. For example, for a 100 bp dataset, I keep reads of 70-100 bp after trimming and discard the rest.
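The keep-70-to-100-bp rule above is just a length filter on the trimmed FASTQ. Below is a minimal Python sketch, assuming uncompressed FASTQ with exactly four lines per record (no line wrapping); the function names are hypothetical, and a trimmer's own minimum-length option would normally do the same job in one pass.

```python
import io
from typing import Iterable, Iterator, Optional, TextIO, Tuple

# (header, sequence, separator, quality) for one FASTQ record
FastqRecord = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a plain 4-line-per-record FASTQ stream (assumed: no wrapping)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def filter_by_length(
    records: Iterable[FastqRecord], min_len: int = 70, max_len: Optional[int] = None
) -> Iterator[FastqRecord]:
    """Keep reads whose length lies in [min_len, max_len]; max_len=None means no upper bound."""
    for rec in records:
        n = len(rec[1])
        if n >= min_len and (max_len is None or n <= max_len):
            yield rec


def write_fastq(records: Iterable[FastqRecord], out: TextIO) -> int:
    """Write records back out in 4-line FASTQ format and return how many were written."""
    count = 0
    for header, seq, plus, qual in records:
        out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory example: one 40 nt read and one 80 nt read
    demo = io.StringIO(
        "@short\n" + "A" * 40 + "\n+\n" + "I" * 40 + "\n"
        + "@long\n" + "C" * 80 + "\n+\n" + "I" * 80 + "\n"
    )
    out = io.StringIO()
    kept = write_fastq(filter_by_length(read_fastq(demo), min_len=70, max_len=100), out)
    print(kept, "read(s) kept")
```

For reads of variable original length (e.g. 40-155 nt), leaving `max_len=None` applies only the minimum-length cutoff.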

caggtaagtat replied:

That makes sense! My reads are 40-155 nt long.

Here is a plot of the percentage of reads I would discard vs. the possible minimum read length. Would a minimum length of 80 nt be appropriate?

https://ibb.co/gLZ7q7
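A plot like the one linked above (percentage of reads discarded vs. minimum read length) can be computed from the read-length distribution. The following is a rough Python sketch, assuming plain 4-line FASTQ input; the function names and the example lengths are hypothetical, not taken from the thread's data.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def read_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in a plain 4-line FASTQ stream."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the 2nd line of every record is the sequence
            yield len(line.rstrip("\n"))


def percent_discarded(lengths: Iterable[int], thresholds: Iterable[int]) -> Dict[int, float]:
    """For each candidate minimum length, return the percentage of reads shorter than it."""
    counts = Counter(lengths)
    total = sum(counts.values())
    result: Dict[int, float] = {}
    for t in thresholds:
        too_short = sum(n for length, n in counts.items() if length < t)
        result[t] = 100.0 * too_short / total if total else 0.0
    return result


if __name__ == "__main__":
    # Hypothetical lengths spanning the 40-155 nt range mentioned in the thread
    lengths = [40, 60, 75, 80, 100, 155]
    for t, pct in percent_discarded(lengths, [50, 70, 75, 80]).items():
        print(f"min length {t:>3} nt: {pct:5.1f}% discarded")
```

With a real file, an open file handle can be passed straight to `read_lengths`, e.g. `percent_discarded(read_lengths(fh), range(40, 160, 5))`; the `Counter` means the lengths are read only once, however many thresholds are tested.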

grant.hovhannisyan replied:

80 seems reasonable. What is the organism? Also, if you used Trimmomatic for trimming, it has an option to remove trimmed reads shorter than a given length.

caggtaagtat replied:

OK, thank you. The reads were obtained from human cardiovascular endothelial cells. And yes, I was going to use Trimmomatic :)

genomax replied:

50 bp should be fine for counting applications on the human genome. You may be throwing good data away by being too strict.

caggtaagtat replied:

OK, but since I am analyzing alternative splicing, I will stick with a minimum length of 75 nt for now. I read somewhere on this forum that reads should not be shorter than 70 nt for isoform analysis.

genomax replied:

That sounds reasonable. I'm curious why you did not choose paired-end sequencing to get spatial information in that case.

caggtaagtat replied:

I was told that single-end sequencing would be better for splicing analysis, although I can't remember why. Besides, I was not included in that decision, and I would guess financial reasons also played a role ;)

genomax replied:

"Sufficient" makes more sense than "better". The financial angle is always critical :-)
