Hi everyone,
I’m analyzing a Ribo-seq breast tissue dataset (GSE210793) and noticed a major drop in read count when trimming as paired-end compared to single-end.
Here are my results for sample SRR24150276:
Before trimming (R1 raw):
Total Sequences 52,295,695
Total Bases 7.8 Gbp
Sequence length 150
%GC 66
After trimming (paired-end mode):
Total Sequences 1,524
Total Bases 40.8 kbp
Sequence length 18–32
%GC 53
After trimming (R1 only, single-end mode):
Total Sequences 50,887,136
Total Bases 1.3 Gbp
Sequence length 18–32
%GC 52
When I trim the data as paired-end, almost all reads are discarded. But if I run trimming on R1 only (single-end) using the same Cutadapt parameters, I retain most reads and the FastQC results look normal.
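To put numbers on the difference, here is a minimal Python sketch that computes the fraction of reads retained and the approximate mean read length from the figures above. The names (`RAW_READS`, `stats`, `summarize`) are just illustrative, and the mean lengths are only approximate because the base totals are rounded.

```python
# Counts copied from the summary statistics above (sample SRR24150276, R1).
RAW_READS = 52_295_695

# Reads and total bases remaining after trimming in each mode.
stats = {
    "paired-end": {"reads": 1_524, "bases": 40.8e3},
    "single-end (R1 only)": {"reads": 50_887_136, "bases": 1.3e9},
}


def summarize(reads, bases, raw_reads=RAW_READS):
    """Return (percent of raw reads retained, approximate mean read length in nt)."""
    return 100 * reads / raw_reads, bases / reads


for mode, s in stats.items():
    pct, mean_len = summarize(s["reads"], s["bases"])
    print(f"{mode}: {pct:.4f}% of reads retained, mean length ~{mean_len:.1f} nt")
```

This works out to roughly 0.003% of reads retained in paired-end mode versus about 97.3% in single-end mode, while the surviving reads have a similar mean length (about 26 nt) in both cases.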
I’ve read that in Ribo-seq, Read 2 (R2) often doesn’t carry meaningful information since Ribo-seq fragments are short (~30 nt). The following paper also suggests focusing on R1 only: https://pmc.ncbi.nlm.nih.gov/articles/PMC6066590/
Could anyone confirm whether it’s okay to proceed with R1 reads only for downstream Ribo-seq analysis (alignment, P-site estimation, etc.)? Or is there a recommended way to handle paired-end Ribo-seq data like this?
Thanks in advance!