Kallisto | bustools workflow for fastq with R1 and R2 (forward and reverse)?
4.6 years ago
Farah

Hello,

I want to use the Kallisto | bustools workflow for single-cell RNA-seq analysis. However, my data are paired-end: each sample has two fastq files, R1 and R2 (forward and reverse). It seems, though, that Kallisto | bustools accepts only a single sequence file per sample.

Is there any way to run the Kallisto | bustools workflow on samples that have two fastq files, R1 and R2?

I would greatly appreciate any advice or help on this.

Best wishes,

Farah

scRNA-seq Kallisto bustools

Comment:

The Kallisto | bustools site uses both R1 and R2 reads in its example; the two files are simply passed one after the other to kallisto bus, R1 first:

kallisto bus -i Mus_musculus.GRCm38.cdna.all.idx -o bus_output/ -x 10xv2 -t 4 SRR8599150_S1_L001_R1_001.fastq.gz SRR8599150_S1_L001_R2_001.fastq.gz
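If a sample was sequenced on several lanes, kallisto bus can take several file pairs in a single call, listed as R1, R2, R1, R2 and so on. Below is a minimal dry-run sketch that collects the pairs for one sample and prints the command instead of running it. It assumes Illumina-style names like the ones in the example command (SAMPLE_S1_L001_R1_001.fastq.gz); the function name build_bus_cmd is hypothetical, and the chemistry (-x 10xv2) and thread count (-t 4) are copied from the example, so adjust them to your data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect the R1/R2 pairs of one sample and print
# (dry run) a kallisto bus command. Assumes Illumina-style file names
# such as SAMPLE_S1_L001_R1_001.fastq.gz / SAMPLE_S1_L001_R2_001.fastq.gz.
build_bus_cmd() {
  local index=$1 outdir=$2 fastq_dir=$3 sample=$4
  local args=() r1 r2

  # Bash sorts glob matches, so lanes L001, L002, ... come out in order.
  for r1 in "$fastq_dir"/"$sample"_*_R1_001.fastq.gz; do
    # With no match, bash leaves the pattern unexpanded, so check it exists.
    [ -e "$r1" ] || { echo "no R1 files found for $sample" >&2; return 1; }
    # The mate differs only in the read tag: strip the R1 suffix, add R2.
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    [ -e "$r2" ] || { echo "missing R2 mate for $r1" >&2; return 1; }
    # kallisto bus expects each pair as R1 followed by R2.
    args+=("$r1" "$r2")
  done

  # Same options as the example command; echo instead of running it.
  echo kallisto bus -i "$index" -o "$outdir" -x 10xv2 -t 4 "${args[@]}"
}
```

For example, build_bus_cmd Mus_musculus.GRCm38.cdna.all.idx bus_output/ fastqs SRR8599150 prints one command covering every lane of SRR8599150 found in fastqs/. Printing first makes it easy to check the R1/R2 ordering before launching a long run; once it looks right, the echo can be dropped so the command actually executes.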

Reply:

Thank you very much for your guidance. Yes, you are right: in their example, they used the files SRR8599150_S1_L001_R1_001.fastq.gz and SRR8599150_S1_L001_R2_001.fastq.gz. They also mention that they used the "mouse retinal cells SRR8599150 dataset from Koren et al., 2019". However, when I downloaded this example dataset from the ENA database (https://www.ebi.ac.uk/ena/data/view/PRJNA523252), I got SRR8599150.fastq.gz and SRR8599151.fastq.gz. I do not understand why the names of the files they used differ from those I got from ENA.

Also, ENA does not list R1 and R2 under the same run accession. Instead, it shows two different run accessions (SRR8599150 and SRR8599151), each with only a single FASTQ file ("File 1", via FTP) available for download. In the Kallisto | bustools example, however, the files are named SRR8599150_S1_L001_R1_001.fastq.gz and SRR8599150_S1_L001_R2_001.fastq.gz, which I cannot find in ENA.

I would be grateful if you could clarify this for me.

Best wishes, Farah
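One way to see what a single downloaded file such as SRR8599150.fastq.gz actually contains is to look at its read lengths. With 10x Genomics v2 chemistry (the -x 10xv2 setting in the example command), R1 holds the 16 bp cell barcode plus the 10 bp UMI, i.e. typically 26 bp, while R2 holds the cDNA sequence, which is usually much longer. The sketch below assumes the file is gzipped FASTQ; read_length_summary is a hypothetical name, and sampling the first 1,000 reads is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tabulate read lengths over the first N reads
# of a gzipped FASTQ file (default N = 1000).
read_length_summary() {
  local fastq_gz=$1 n=${2:-1000}
  # A FASTQ record is 4 lines; the sequence is line 2 of each record.
  gzip -dc "$fastq_gz" | head -n "$((4 * n))" \
    | awk 'NR % 4 == 2 { print length($0) }' \
    | sort -n | uniq -c   # prints "<count> <length>" per distinct length
}
```

For example, read_length_summary SRR8599150.fastq.gz prints one line per distinct read length. If nearly every read is about 26 bp, the file probably holds only the barcode/UMI read; if all reads are long, it probably holds only the cDNA read. Either way, kallisto bus with -x 10xv2 needs both reads, so a single file of either kind would not be enough on its own.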

Reply:

Using the SRA Run Selector, it looks like two separate accession numbers are listed for SRR8599150. If you search with the two GEO accession numbers, two separate records show up in ENA (Record 1 and Record 2): one is marked as ctrl and the other as LD.

Lior Pachter (author of kallisto) participates on Biostars, so he may come across this thread and explain the mismatch between the example shown on the kallisto site and the data in SRA/ENA.

Reply:

OK. Many thanks for your reply and guidance. I have also tagged Lior Pachter. Thanks.

Reply:

I am not able to dump the data from SRA (it looks like this dataset may have become cloud-only?), so your best bet is ENA. I hope NCBI is not jumping the gun with the cloud migration and locking people out of datasets. There was supposed to be a formal announcement before the rollout; if that happened, I must have missed it.

Reply:

Oh, OK. Thank you for letting me know. I will use ENA to download the data.

Reply:

Tagging: Lior Pachter
