scRNA - Illumina naming convention unclear
7 weeks ago
kristina • 0

Dear all, maybe someone can help me with this matter:

I would like to run the nf-core scRNA pipeline (https://nf-co.re/scrnaseq/3.0.0/docs/usage/), and for this I need to write a sample sheet that follows a specific naming convention. The BioProject states that the library layout is "paired reads", and this is where my confusion starts. How do I write the sample sheet following the usual Illumina naming convention? My assumptions are the following:

  • S is always the same for a given sample
  • L001 for run 1, L002 for run 2 and L003 for run 3, so the last number changes per spot read?
  • The main confusion is with R1 and R2, since I have 3 runs per experiment.

I would really appreciate some help with this. Many thanks
scRNA • 541 views
7 weeks ago
GenoMax 149k

S refers to the location of a sample in a particular row in the samplesheet used for demultiplexing. It does not have any other significance.

L001 is not for the run. It is for the lane the sample file came from. Samples are generally run as a pool across multiple lanes, so the data would be identical for a sample even if you have lane-specific files. cellranger (and other software) should understand what this means.

For single cell (10x) R1 contains UMI and cell barcodes. R2 is the RNA read.
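
To make the naming fields concrete, here is a minimal Python sketch that splits a bcl2fastq-style file name into its parts. The file name `patientA_S1_L002_R1_001.fastq.gz` and the helper `parse_fastq_name` are made up for illustration, and the pattern assumes the usual `<sample>_S<n>_L<lane>_<read>_001.fastq.gz` layout.

```python
import re

# Assumed layout: <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq(.gz)
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])_001\.fastq(?:\.gz)?$"
)


def parse_fastq_name(filename):
    """Return the name fields as a dict, or None if the name does not match."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    # S<n> is the row in the demultiplexing samplesheet, L<nnn> is the lane.
    fields["sample_number"] = int(fields["sample_number"])
    fields["lane"] = int(fields["lane"])
    return fields


# Hypothetical file name, used only to show the fields.
print(parse_fastq_name("patientA_S1_L002_R1_001.fastq.gz"))
```

Files that share the same sample name but differ in the lane field belong to the same sample and can be passed to the pipeline together.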


Thank you very much for the response. However, my confusion is that I cannot really identify or understand which file is supposed to be the forward read and which one the reverse read (R1 / R2). After looking into the files with zcat | head, I got even more confused. Below is a screenshot of the 3 files of one run set (same patient). I assumed that one might be the index file. [screenshot of zcat | head output]


The file with 8-10 bp reads (above) is the Illumina index read. It is not useful for anything downstream. (This should be the SRA*_1.fastq file, as in the example above.)

The file with 26-28 bp reads should be equivalent to the R1 read, i.e. cell barcodes + UMI (this should be the SRA*_2.fastq file). Sometimes people sequence R1 out to 100 bp (same as R2). In that case you should be able to spot the poly-T tail that follows the UMI + cell barcodes (ref https://kb.10xgenomics.com/hc/en-us/articles/360035999892-What-is-the-structure-of-the-final-Visium-for-fresh-frozen-library )

The read that is 90-100 bp should be the actual RNA read (this should be the SRA*_3.fastq file).
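
If you want to check this without eyeballing `zcat | head`, a rough Python sketch along these lines could report the typical read length per file and map it to a likely role. The function names are made up for illustration, and the length ranges are simply the ones quoted above, so treat anything outside them (for example an R1 sequenced longer than 28 bp) as something to inspect by hand.

```python
import gzip
import statistics
from itertools import islice


def median_read_length(path, n_reads=1000):
    """Median sequence length over the first n_reads records of a FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # In a 4-line FASTQ record the sequence is the 2nd line of each record.
        sequences = islice(handle, 1, 4 * n_reads, 4)
        lengths = [len(line.strip()) for line in sequences]
    if not lengths:
        raise ValueError(f"no reads found in {path}")
    return statistics.median(lengths)


def guess_read_role(length):
    """Map a read length to a likely role, using the ranges quoted above."""
    if 8 <= length <= 10:
        return "I1"       # Illumina sample index, not needed downstream
    if 26 <= length <= 28:
        return "R1"       # cell barcode + UMI
    if 90 <= length <= 100:
        return "R2"       # the RNA (cDNA) read
    return "unclear"      # outside the assumed ranges; inspect by hand
```

For each downloaded file you would call `guess_read_role(median_read_length(path))` and compare the result with the `_1` / `_2` / `_3` suffix.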

You may need to rename the files as noted by 10x after getting the data from SRA: https://kb.10xgenomics.com/hc/en-us/articles/115003802691-How-do-I-prepare-Sequence-Read-Archive-SRA-data-from-NCBI-for-Cell-Ranger
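
As a rough illustration of the renaming, the Python sketch below only builds an old-name to new-name mapping and does not touch any files. Everything specific in it is an assumption: the run accessions and sample name are placeholders, the files are assumed to be gzipped, the suffix-to-read mapping follows the read lengths discussed in this thread, and giving each SRA run its own lane number is just one way to keep the three runs of a sample apart.

```python
# Assumed mapping of SRA file suffixes to read types, based on the read
# lengths discussed in this thread; confirm it against your own files.
SUFFIX_TO_READ = {"_1": "I1", "_2": "R1", "_3": "R2"}


def illumina_name(sample, lane, read, sample_number=1):
    """Build a bcl2fastq-style name such as patientA_S1_L001_R1_001.fastq.gz."""
    return f"{sample}_S{sample_number}_L{lane:03d}_{read}_001.fastq.gz"


def rename_plan(sample, run_accessions, suffix_to_read=SUFFIX_TO_READ):
    """Map each SRA file name to a new name; run i is given lane number i."""
    plan = {}
    for lane, run in enumerate(run_accessions, start=1):
        for suffix, read in suffix_to_read.items():
            plan[f"{run}{suffix}.fastq.gz"] = illumina_name(sample, lane, read)
    return plan


# Placeholder accessions and sample name, for illustration only.
runs = ["SRR0000001", "SRR0000002", "SRR0000003"]
for old, new in rename_plan("patientA", runs).items():
    print(old, "->", new)
```

Once the mapping looks right, the actual renaming can be done with `mv` or `os.rename`.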
