Hi all,
I'm relatively new to bioinformatics and want to use FASTQ files from ENA for a study that had 5 controls and 12 tumour samples. It seems that the control runs are paired-end, with 2 FASTQ files each, whilst the 12 tumour samples are single-end only. I can't see any mention of this in the study. Would this change the analysis pipeline? I want to run FastQC, then Salmon, and then read the results in to carry out differential expression of transcripts, rather than genes, between the tumour samples and controls.
Can you post the accession number for this data? It would be very unusual for the control samples to be sequenced differently from the rest. Or are you trying to use two separate submissions?
Hi, ENA accession number is SRP392517. Thanks!
You can see metadata for the entire series here: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP392517&o=acc_s%3Aa
This is rather unusual. The paired-end samples have 300 bp reads, whereas the single-end samples have 51 bp reads. I'm not sure why it was done this way. Assuming there are no batch effects (e.g. the paired-end samples having been run with a different chemistry or on a different sequencer), your best bet may be to trim R1 of the 300 bp samples down to 51 bp and then treat all samples as single-end when analyzing the data.
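In case it helps, here is a minimal Python sketch of the trimming step. It is only an illustration, not a tested pipeline: the function name `trim_fastq` is made up for this example, and it assumes standard 4-line FASTQ records (no wrapped sequence lines), plain or gzip-compressed. Dedicated read trimmers can do the same job and are probably faster on large files.

```python
import gzip
from itertools import islice


def trim_fastq(in_path, out_path, length=51):
    """Truncate every read (sequence and quality string) to at most `length` bases.

    Assumes 4-line FASTQ records. Paths ending in .gz are read/written with gzip.
    Returns the number of records written.
    """
    open_in = gzip.open if str(in_path).endswith(".gz") else open
    open_out = gzip.open if str(out_path).endswith(".gz") else open
    n_records = 0
    with open_in(in_path, "rt") as fin, open_out(out_path, "wt") as fout:
        while True:
            record = list(islice(fin, 4))  # header, sequence, '+', quality
            if not record:
                break
            if len(record) < 4:
                raise ValueError("incomplete FASTQ record at end of file")
            header, seq, plus, qual = (line.rstrip("\n") for line in record)
            # Sequence and quality must be cut to the same length to stay valid.
            fout.write(f"{header}\n{seq[:length]}\n{plus}\n{qual[:length]}\n")
            n_records += 1
    return n_records


# Example call (file names are hypothetical placeholders):
# trim_fastq("control_1.fastq.gz", "control_1.trimmed.fastq.gz", length=51)
```

You would run this on the R1 file of each paired-end control only, leave the tumour files untouched, and then quantify every sample with Salmon in single-end mode.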
Thanks to you both for your help, I will do that!