I downloaded scMultiomics data from here.
Specifically, I downloaded the snATAC-seq data from here.
I made an ATAC folder and downloaded all snATAC-seq files into it using the wget commands below.
cd ATAC/
wget -O W71_LUNGrep2_S6_L001_R1_001.fastq.gz https://www.encodeproject.org/files/ENCFF872EFS/@@download/ENCFF872EFS.fastq.gz
wget -O W71_LUNGrep2_S6_L001_R2_001.fastq.gz https://www.encodeproject.org/files/ENCFF320GWZ/@@download/ENCFF320GWZ.fastq.gz
wget -O W71_LUNGrep2_S6_L001_R3_001.fastq.gz https://www.encodeproject.org/files/ENCFF260JLZ/@@download/ENCFF260JLZ.fastq.gz
wget -O W71_LUNGrep2_S6_L002_R1_001.fastq.gz https://www.encodeproject.org/files/ENCFF591VEX/@@download/ENCFF591VEX.fastq.gz
wget -O W71_LUNGrep2_S6_L002_R2_001.fastq.gz https://www.encodeproject.org/files/ENCFF979SWK/@@download/ENCFF979SWK.fastq.gz
wget -O W71_LUNGrep2_S6_L002_R3_001.fastq.gz https://www.encodeproject.org/files/ENCFF213UEY/@@download/ENCFF213UEY.fastq.gz
Then I downloaded the scRNA-seq data from here.
I made an RNA folder and downloaded all scRNA-seq files into it using the wget commands below.
cd RNA/
wget -O W71_LUNGrep2_S6_L002_R1_001.fastq.gz https://www.encodeproject.org/files/ENCFF094PRI/@@download/ENCFF094PRI.fastq.gz
wget -O W71_LUNGrep2_S6_L002_R2_001.fastq.gz https://www.encodeproject.org/files/ENCFF639HEH/@@download/ENCFF639HEH.fastq.gz
wget -O W71_LUNGrep2_S6_L001_R1_001.fastq.gz https://www.encodeproject.org/files/ENCFF135JSP/@@download/ENCFF135JSP.fastq.gz
wget -O W71_LUNGrep2_S6_L001_R2_001.fastq.gz https://www.encodeproject.org/files/ENCFF318BVV/@@download/ENCFF318BVV.fastq.gz
Note that each filename comes from the “original filename” field in the attribution table. For example, for ENCFF872EFS, one of the snATAC-seq sequencing files, I navigated to https://www.encodeproject.org/files/ENCFF872EFS/ and found W71_LUNGrep2_S6_L001_R1_001.fastq.gz listed as the filename of this sequencing file. See the screenshot below.
One tricky point: on the page, ENCFF320GWZ is listed as R2 and ENCFF260JLZ as the index file, but on their own file pages (ENCFF320GWZ and ENCFF260JLZ), ENCFF320GWZ is R3 and ENCFF260JLZ is R2. I tried both orders (ENCFF320GWZ as R2 with ENCFF260JLZ as R3, and the reverse), and cellranger-arc returned the same error each time.
Then I built a libraries.csv file as follows:
fastqs,sample,library_type
${root_dir}/ENCSR128ZLB/RNA,W71_LUNGrep2,Gene Expression
${root_dir}/ENCSR128ZLB/ATAC,W71_LUNGrep2,Chromatin Accessibility
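As far as I know, cellranger-arc reads the fastqs column literally and does not expand shell variables, so the ${root_dir} placeholder has to be replaced by a real absolute path in the file itself. A minimal sketch of writing the file from the shell, where write_libraries_csv is a hypothetical helper name and root_dir defaults to the current directory purely for illustration:

```shell
# root_dir is the placeholder used in the post; fall back to the current
# directory here only so that the sketch runs as-is (an assumption).
root_dir="${root_dir:-$PWD}"

# Hypothetical helper: write a two-library CSV with expanded paths.
#   $1 = experiment folder containing RNA/ and ATAC/
#   $2 = sample prefix (the part of the FASTQ name before _S<n>)
#   $3 = output file
write_libraries_csv() {
    {
        echo "fastqs,sample,library_type"
        echo "$1/RNA,$2,Gene Expression"
        echo "$1/ATAC,$2,Chromatin Accessibility"
    } > "$3"
}

write_libraries_csv "$root_dir/ENCSR128ZLB" W71_LUNGrep2 libraries.csv
```

Printing the result with cat libraries.csv is a quick way to confirm that both paths are absolute and point at the intended folders.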
For both the scRNA-seq and snATAC-seq files, I took the string before the S index in the original sequencing filename as the sample name. The two are identical; could this trigger an error?
So my folder and file structure is now:
.
|-- ATAC
|   |-- W71_LUNGrep2_S6_L001_R1_001.fastq.gz
|   |-- W71_LUNGrep2_S6_L001_R2_001.fastq.gz
|   |-- W71_LUNGrep2_S6_L001_R3_001.fastq.gz
|   |-- W71_LUNGrep2_S6_L002_R1_001.fastq.gz
|   |-- W71_LUNGrep2_S6_L002_R2_001.fastq.gz
|   |-- W71_LUNGrep2_S6_L002_R3_001.fastq.gz
|-- libraries.csv
|-- RNA
|   |-- W71_LUNGrep2_S6_L001_R1_001.fastq.gz
|   |-- W71_LUNGrep2_S6_L001_R2_001.fastq.gz
|   |-- W71_LUNGrep2_S6_L002_R1_001.fastq.gz
|   |-- W71_LUNGrep2_S6_L002_R2_001.fastq.gz
I used the following command to run cellranger-arc on these data.
cellranger-arc count --id=2024_A \
--reference=${reference_dir}/refdata-cellranger-arc-GRCh38-2024-A \
--libraries=${work_root_dir}/libraries.csv \
--localcores=24 \
--localmem=180
After running for about 1 h 50 min, cellranger-arc returned the following error:
2025-11-25 01:59:31 [runtime] (update) ID.2024-A.SC_ATAC_GEX_COUNTER_CS.SC_ATAC_GEX_COUNTER._GEX_MATRIX_COMPUTER.ALIGN_AND_COUNT.fork0 chunks running (7/21 completed)
2025-11-25 02:04:46 [runtime] (update) ID.2024-A.SC_ATAC_GEX_COUNTER_CS.SC_ATAC_GEX_COUNTER._GEX_MATRIX_COMPUTER.ALIGN_AND_COUNT.fork0 chunks running (11/21 completed)
2025-11-25 02:10:06 [runtime] (update) ID.2024-A.SC_ATAC_GEX_COUNTER_CS.SC_ATAC_GEX_COUNTER._GEX_MATRIX_COMPUTER.ALIGN_AND_COUNT.fork0 chunks running (13/21 completed)
2025-11-25 02:11:22 [runtime] (failed) ID.2024-A.SC_ATAC_GEX_COUNTER_CS.SC_ATAC_GEX_COUNTER._ATAC_MATRIX_COMPUTER.ALIGN_ATAC_READS
[error] Pipestance failed. Error log at:
2024-A/SC_ATAC_GEX_COUNTER_CS/SC_ATAC_GEX_COUNTER/_ATAC_MATRIX_COMPUTER/ALIGN_ATAC_READS/fork0/join-u6dec253e8d/_errors
Log message:
0.5% (< 10%) of read pairs have a valid 10x barcode. This could be a result of poor sequencing quality, a sample mixup, or running the wrong pipeline, for example, running `cellranger-atac` on Multiome ATAC + GEX data, or vice versa.
Waiting 6 seconds for UI to do final refresh.
Pipestance failed. Use --noexit option to keep UI running after failure.
2025-11-25 02:11:28 Shutting down.
Do I need to upload the full output file? Does this error mean the data themselves are low quality and therefore unusable, or did I do something wrong? Any suggestions would be appreciated. Thank you very much!
Check the read lengths to make sure the files are correct.
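One way to run this check from the shell is sketched below. It tabulates the read lengths in the first reads of each gzipped FASTQ, assuming standard four-line FASTQ records; read_length_hist is a hypothetical helper name and the 10000-read sample size is arbitrary.

```shell
# Hypothetical helper: print "length count" pairs for the first N reads.
#   $1 = gzipped FASTQ path
#   $2 = number of reads to sample (default 10000, an arbitrary choice)
read_length_hist() {
    gzip -dc "$1" 2>/dev/null \
        | head -n "$(( ${2:-10000} * 4 ))" \
        | awk 'NR % 4 == 2 { count[length($0)]++ }
               END { for (len in count) print len, count[len] }' \
        | sort -n
}

# Report every FASTQ in the ATAC and RNA folders from the question.
for fastq in ATAC/*.fastq.gz RNA/*.fastq.gz; do
    [ -e "$fastq" ] || continue   # skip a glob that matched nothing
    echo "== $fastq"
    read_length_hist "$fastq"
done
```

In the ATAC folder, the file holding the short barcode read should stand out from the two longer genomic reads, which shows whether the R2 and R3 labels match the file contents.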
Hi Arup Ghosh, thank you very much for your suggestion. Here are the results; they seem correct. Do you have any further suggestions? Thank you very much!
The 24 nt barcode files, W71_LUNGrep2_S6_L001_R3_001.fastq.gz and W71_LUNGrep2_S6_L002_R3_001.fastq.gz, should be named R2 instead.
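Following that suggestion, the R2 and R3 names in the ATAC folder can be swapped in place rather than downloading the files again. This is a minimal sketch, assuming every R2 file has an R3 partner with the same prefix; swap_r2_r3 is a hypothetical helper name.

```shell
# Hypothetical helper: exchange the _R2_ and _R3_ file names in a folder.
#   $1 = folder containing *_R2_001.fastq.gz and *_R3_001.fastq.gz
swap_r2_r3() {
    for r2 in "$1"/*_R2_001.fastq.gz; do
        [ -e "$r2" ] || continue   # skip a glob that matched nothing
        r3="${r2%_R2_001.fastq.gz}_R3_001.fastq.gz"
        if [ ! -e "$r3" ]; then
            echo "no R3 partner for $r2, skipping" >&2
            continue
        fi
        # Three-step rename through a temporary name.
        mv "$r2" "$r2.swap_tmp"
        mv "$r3" "$r2"
        mv "$r2.swap_tmp" "$r3"
    done
}

# Swap the names in the ATAC folder from the question, if it exists.
if [ -d ATAC ]; then
    swap_r2_r3 ATAC
fi
```

Running the swap a second time restores the original names, so it is worth rechecking the read lengths afterwards to confirm that the 24 nt files now carry the R2 label.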