Question: Kallisto | bustools workflow for fastq with R1 and R2 (forward and reverse)?
9 months ago by
F. Golestan60


I want to use the Kallisto | bustools workflow for single-cell RNA-seq analysis. However, I have paired-end FASTQ files, where each sample has two files, R1 and R2 (forward and reverse), and it seems that Kallisto | bustools accepts only a single sequence file per sample.

Is there any way to run the Kallisto | bustools workflow on samples that have separate R1 and R2 FASTQ files?

I would highly appreciate any advice or help in this regard.

Best wishes,


modified 9 months ago by h.mon30k • written 9 months ago by F. Golestan60

The Kallisto | bustools site uses R1/R2 reads in its example:

kallisto bus -i Mus_musculus.GRCm38.cdna.all.idx -o bus_output/ -x 10xv2 -t 4 SRR8599150_S1_L001_R1_001.fastq.gz SRR8599150_S1_L001_R2_001.fastq.gz
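
For 10x v2 data, R1 holds the 16 bp cell barcode plus the 10 bp UMI and R2 holds the cDNA read, which is why kallisto bus takes both files, with R1 first. If a sample is split over several lanes, the pairs can be listed one after another (R1 R2 R1 R2 ...). Here is a minimal sketch of that pairing, assuming Illumina-style file names that contain _R1_ and _R2_. The helper name paired_fastqs and the directory fastq_dir are hypothetical placeholders, not part of kallisto.

```shell
# paired_fastqs is a hypothetical helper, not a kallisto command.
# It prints FASTQ paths in the order R1 R2 R1 R2 ..., one per line,
# assuming Illumina-style names that contain _R1_ and _R2_.
paired_fastqs() (
    dir=$1
    for r1 in "$dir"/*_R1_*.fastq.gz; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        # Derive the mate's name by swapping _R1_ for _R2_ in the file name.
        r2=$(printf '%s\n' "$r1" | sed 's/_R1_\([^/]*\)$/_R2_\1/')
        if [ ! -e "$r2" ]; then
            echo "missing R2 mate for $r1" >&2
            exit 1                  # leaves only the function's subshell
        fi
        printf '%s\n%s\n' "$r1" "$r2"
    done
)

# Example use (fastq_dir is a placeholder; paths must not contain spaces):
# kallisto bus -i Mus_musculus.GRCm38.cdna.all.idx -o bus_output/ -x 10xv2 -t 4 $(paired_fastqs fastq_dir)
```

The helper stops with an error if an R1 file has no matching R2, so a missing mate is caught before kallisto is run.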
written 9 months ago by genomax85k

Thank you very much for your guidance. Yes, you are right: in their example, they used files named SRR8599150_S1_L001_R1_001.fastq.gz and SRR8599150_S1_L001_R2_001.fastq.gz. They also mention that they used the "mouse retinal cells SRR8599150 dataset from Koren et al., 2019". However, when I downloaded this example dataset from the ENA database, I got the files SRR8599150.fastq.gz and SRR8599151.fastq.gz. I do not know why the names of the files they used differ from the ones I got from ENA.

Also, the ENA database does not list R1 and R2 for the same run accession number. Instead, it shows two different accession numbers (SRR8599150 and SRR8599151), each with only one file (File 1) under FASTQ files (FTP) for download. In the Kallisto | bustools example, however, the files are named SRR8599150_S1_L001_R1_001.fastq.gz and SRR8599150_S1_L001_R2_001.fastq.gz, which I cannot find in ENA.

I would be grateful if you could clarify this for me.

Best wishes, Farah

written 9 months ago by F. Golestan60

Using the run selector, it looks like there are two separate accession numbers listed for SRR8599150. If you use the two GEO accession numbers, two separate records show up in ENA (Record 1 and Record 2). One is marked as ctrl and the other as LD.

Lior Pachter (author of kallisto) participates on Biostars, so he may come across this thread and explain the mismatch between the example shown on the kallisto site and the data in SRA/ENA.

written 9 months ago by genomax85k

OK, many thanks for your reply and guidance. I have also tagged Lior Pachter. Thanks.

written 9 months ago by F. Golestan60

I am not able to dump the data from SRA (it looks like this data may have become cloud-only?), so your best bet is ENA. I hope NCBI is not jumping the gun here, with the cloud move locking people out of datasets. There was supposed to be a formal announcement before the rollout. If that happened, then I must have missed it.

written 9 months ago by genomax85k

Oh, OK. Thank you for letting me know. I will use ENA to download the data.

written 9 months ago by F. Golestan60

Tagging: Lior Pachter

written 9 months ago by genomax85k
