Question

RNA-Sequencing analysis when control samples are paired-end and tumour samples are single-end

0

9 weeks ago

h • 0

Hi all,

I'm relatively new to bioinformatics and want to take FASTQ files from ENA for a study that had 5 controls and 12 tumour samples. It seems that the control runs are paired-end, with 2 FASTQ files each, whilst the 12 tumour samples are single-end only. I can't see any reference to this in the study. Would this change the analysis pipeline? I want to use FastQC and Salmon, and then read the results in to carry out differential expression of transcripts, rather than genes, between the tumour samples and controls.

ENA rna-sequencing • 691 views

9 weeks ago by h • 0

0

It seems that the control runs are paired-end, with 2 FASTQ files each, whilst the 12 tumour samples are single-end only.

Can you post the accession number for this data? It would be very unusual to see control samples being run a different way. Or are you trying to use two separate submissions?

9 weeks ago by GenoMax 152k

0

Hi, the ENA accession number is SRP392517. Thanks!

9 weeks ago by h • 0

1

You can see metadata for the entire series here: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP392517&o=acc_s%3Aa

This is rather unusual. The paired-end samples are 300 bp, whereas the single-end samples are 51 bp. Not sure why it was done this way. Assuming that there are no batch effects (e.g. paired-end samples run on a different chemistry/sequencer), your best bet may be to trim R1 from the 300 bp samples down to 51 bp and then treat all samples as single-end when analyzing the data.
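The trimming step could be sketched roughly as below, in Python. This is only an illustration, not a tested pipeline: it assumes standard four-line FASTQ records, the function names `trim_records` and `trim_fastq_gz` are made up for this example, and the file names in the comment are placeholders. A dedicated read trimmer would normally be the more robust choice.

```python
import gzip
from itertools import islice
from typing import Iterable, Iterator


def trim_records(lines: Iterable[str], length: int = 51) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality cut to at most `length` bases.

    Assumes each record is exactly four lines (header, sequence, '+', quality).
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return  # clean end of input
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        header, seq, plus, qual = (x.rstrip("\n") for x in record)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Sequence and quality must be cut at the same position.
        yield header + "\n"
        yield seq[:length] + "\n"
        yield plus + "\n"
        yield qual[:length] + "\n"


def trim_fastq_gz(src: str, dst: str, length: int = 51) -> None:
    """Trim every read in a gzipped FASTQ file and write a new gzipped file."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        fout.writelines(trim_records(fin, length))


# Hypothetical usage on the R1 file of one paired-end control run:
# trim_fastq_gz("control_1.fastq.gz", "control_1.trim51.fastq.gz", length=51)
```

Only the trimmed R1 files of the paired-end runs would then go into quantification alongside the single-end tumour files, so that all samples are handled in the same single-end mode.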

9 weeks ago by GenoMax 152k

0

Thanks to you both for your help, I will do that!

9 weeks ago by h • 0

score 0 · Answer 1 · 2025-05-09

Public datasets often lack proper annotation and methods descriptions, which hampers their reliability. You should contact the authors to confirm that these samples were processed in the same batch. If they were not, then single-end vs paired-end is the least of your problems. If only the sequencing mode differs and the libraries were prepped on the same day, in the same batch and so on, then simply discard R2 and treat the paired-end samples as single-end. If the batches differ, then you formally cannot compare the groups, because batch and biological effects are nested, i.e. completely confounded.
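To make the confounding point concrete, here is a small Python sketch that checks whether a batch-like variable is completely confounded with condition. It uses library layout as a stand-in for batch, which is an assumption: the real batch structure is only known to the authors. The function name `is_fully_confounded` and the sample list are illustrative, built from the 5 paired-end controls and 12 single-end tumours described in the question.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def is_fully_confounded(samples: Iterable[Tuple[str, str]]) -> bool:
    """Return True if no batch contains more than one condition.

    `samples` is an iterable of (condition, batch) pairs. When every batch
    holds a single condition, batch and condition cannot be separated.
    """
    conditions_per_batch = defaultdict(set)
    for condition, batch in samples:
        conditions_per_batch[batch].add(condition)
    return all(len(conds) == 1 for conds in conditions_per_batch.values())


# Design from the question, with library layout used as a proxy for batch
# (an assumption for illustration only).
design = [("control", "PAIRED")] * 5 + [("tumour", "SINGLE")] * 12
print(is_fully_confounded(design))
```

For this design the check reports complete confounding, which is why confirming with the authors that layout is the only difference matters before any tumour vs control comparison.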