Dear all, maybe someone can help me with this matter:
- I downloaded scRNA-seq data from here: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA984257&o=experiment_s%3Aa%253Bacc_s%253Bacc_s%253Bacc_s%3Bacc_s%3Aa in bulk with the SRA Toolkit, using the following command: cat SRR_Acc_List.txt | xargs -I{} bin/fastq-dump -I --split-files {}
- As one can see, the scRNA data have 3 runs per sample
- As each of the runs has four reads per spot, downloading the data with the SRA command of course gives me four files; so far, all good.
Now, I would like to start the nf-core scRNA pipeline (https://nf-co.re/scrnaseq/3.0.0/docs/usage/), and for this I need to write the sample sheet following a specific naming convention. The BioProject states that the library is "paired reads", and here the confusion starts. How do I write the sample sheet following the usual Illumina naming convention? My assumption is the following:
- S = is always the same for a given sample
- L001 for run 1 and L002 for run 2 and L003 for Run 3 --> so the last number changes per spot read?
- The main confusion is with R1 and R2, since I have 3 runs per experiment. I would really appreciate some help with this. Many thanks.
Thank you very much for the response. However, my confusion is that I cannot really identify or understand which file is supposed to be the forward read and which the reverse read (R1 / R2). After looking into the files with zcat | head, I just got even more confused; below is a screenshot of the files from 3 runs (same patient). I assumed that one might be the index file...
The file with 8-10 bp reads (above) is the Illumina index. It is not useful for anything downstream. (This should be the SRA*_1.fastq file, as in the example above.)
The file with 26-28 bp reads should be equivalent to the R1 read, i.e. cell barcode + UMI. (This should be the SRA*_2.fastq file.) Sometimes people sequence R1 out to 100 bp (same as R2). In that case you should be able to spot the poly-T tail that follows the cell barcode + UMI (ref https://kb.10xgenomics.com/hc/en-us/articles/360035999892-What-is-the-structure-of-the-final-Visium-for-fresh-frozen-library )
The file with 90-100 bp reads should be the actual RNA read. (This should be the SRA*_3.fastq file.)
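Rather than eyeballing the output of zcat | head, the read lengths can be checked programmatically. The following is only a minimal sketch: the function names are made up for illustration, and the length cut-offs are simply the ranges quoted above (8-10, 26-28, 90 and longer), so a dataset with R1 sequenced out to full length would need a manual look for the poly-T stretch instead.

```python
import gzip
import statistics
import sys
from itertools import islice
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def median_read_length(path, n_reads=1000):
    """Median sequence length over the first n_reads records of a FASTQ file."""
    with open_fastq(path) as handle:
        # A FASTQ record is 4 lines; the sequence is the second line of each record.
        lines = islice(handle, 4 * n_reads)
        lengths = [len(line.strip()) for i, line in enumerate(lines) if i % 4 == 1]
    if not lengths:
        raise ValueError(f"no reads found in {path}")
    return statistics.median(lengths)


def guess_read_type(length):
    """Guess the 10x read type from a read length (cut-offs are assumptions)."""
    if 8 <= length <= 10:
        return "I1"  # sample index, not needed downstream
    if 26 <= length <= 28:
        return "R1"  # cell barcode + UMI
    if length >= 90:
        return "R2"  # cDNA read (or an R1 that was sequenced to full length)
    return "unknown"


if __name__ == "__main__":
    # Usage: python check_reads.py file_1.fastq.gz file_2.fastq.gz ...
    for fastq in sys.argv[1:]:
        length = median_read_length(fastq)
        print(fastq, length, guess_read_type(length), sep="\t")
```

Running it on all files of one run should give one line per file with the median length and the guessed read type; anything reported as "unknown" deserves a manual check.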
You may need to rename the files as noted by 10x after getting the data from SRA: https://kb.10xgenomics.com/hc/en-us/articles/115003802691-How-do-I-prepare-Sequence-Read-Archive-SRA-data-from-NCBI-for-Cell-Ranger
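As a rough illustration of that renaming step, the sketch below builds the Illumina-style names (Sample_S1_L00X_R1_001.fastq.gz) for three runs of one sample and the matching sample sheet rows. Several things here are assumptions, not facts from the thread: the accessions SRR0000001 to SRR0000003 and the sample name patient1 are hypothetical, the files are assumed to be gzip-compressed and named <run>_<n>.fastq.gz, each run is treated as its own lane, the _1/_2/_3 to I1/R1/R2 mapping is the one suggested above and should be verified from the read lengths first, and the sample,fastq_1,fastq_2 columns should be checked against the pipeline's usage page.

```python
from pathlib import Path

# fastq-dump --split-files suffix -> 10x read type, as suggested in this thread.
# This is an assumption: confirm it from the read lengths of your own files.
SUFFIX_TO_READ = {"1": "I1", "2": "R1", "3": "R2"}


def illumina_name(sample, lane, read_type, sample_number=1):
    """Build a name of the form Sample_S1_L001_R1_001.fastq.gz."""
    return f"{sample}_S{sample_number}_L{lane:03d}_{read_type}_001.fastq.gz"


def plan_renames(sample, runs, suffix_to_read=SUFFIX_TO_READ):
    """Return (old_name, new_name) pairs; run i of the sample becomes lane i."""
    plan = []
    for lane, run in enumerate(runs, start=1):
        for suffix, read_type in suffix_to_read.items():
            old = f"{run}_{suffix}.fastq.gz"
            plan.append((old, illumina_name(sample, lane, read_type)))
    return plan


def apply_renames(plan, directory="."):
    """Rename the files on disk; inspect the plan before calling this."""
    directory = Path(directory)
    for old, new in plan:
        (directory / old).rename(directory / new)


def samplesheet_rows(sample, n_runs, directory="."):
    """One row per run/lane, same sample name, R1 and R2 only (no index file)."""
    rows = [("sample", "fastq_1", "fastq_2")]
    for lane in range(1, n_runs + 1):
        r1 = f"{directory}/{illumina_name(sample, lane, 'R1')}"
        r2 = f"{directory}/{illumina_name(sample, lane, 'R2')}"
        rows.append((sample, r1, r2))
    return rows


if __name__ == "__main__":
    runs = ["SRR0000001", "SRR0000002", "SRR0000003"]  # hypothetical accessions
    for old, new in plan_renames("patient1", runs):
        print(f"{old} -> {new}")
    for row in samplesheet_rows("patient1", len(runs), directory="fastq"):
        print(",".join(row))
```

Under these assumptions, R1 is always the barcode + UMI file and R2 the RNA file within every run, the S number stays the same for the sample, and only the lane number changes between the three runs. The index file is renamed for completeness but does not appear in the sample sheet.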