Question: Kallisto | bustools workflow for fastq with R1 and R2 (forward and reverse)?
F. Golestan wrote:

Hello,

I want to use the kallisto | bustools workflow for single-cell RNA-seq analysis. However, I have paired-end FASTQ files where each sample has two files, R1 and R2 (forward and reverse), and it seems that kallisto | bustools accepts only a single sequence file per sample.

Is there any way to run the kallisto | bustools workflow on FASTQ files that come as separate R1 and R2 files?

I would highly appreciate any advice and help in this regard.

Best wishes,

Farah

modified 6 weeks ago by h.mon • written 6 weeks ago by F. Golestan

The kallisto | bustools site shows R1/R2 reads being used in its example:

kallisto bus -i Mus_musculus.GRCm38.cdna.all.idx -o bus_output/ -x 10xv2 -t 4 SRR8599150_S1_L001_R1_001.fastq.gz SRR8599150_S1_L001_R2_001.fastq.gz
written 6 weeks ago by genomax
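
The command above passes both files of a pair to kallisto bus, R1 first and R2 second; with 10x v2 chemistry, R1 holds the cell barcode and UMI and R2 holds the cDNA read. A minimal sketch of assembling such a command for one sample follows, assuming Illumina-style file names like those in the example and file paths without spaces. The function name build_bus_cmd and its arguments are hypothetical, and the sketch only prints the command rather than running it. When a sample is split across lanes, each additional R1/R2 pair is appended in the same order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a kallisto bus command for one sample.
# Assumes files are named <sample>_<...>_R1_001.fastq.gz with a matching
# <sample>_<...>_R2_001.fastq.gz, and that paths contain no spaces.
build_bus_cmd() {
  local index="$1" outdir="$2" fastq_dir="$3" sample="$4"
  local cmd="kallisto bus -i $index -o $outdir -x 10xv2 -t 4"
  local r1 r2 pairs=0

  # The shell expands the glob in sorted order, so lanes come out as L001, L002, ...
  for r1 in "$fastq_dir"/"$sample"_*_R1_001.fastq.gz; do
    [ -e "$r1" ] || continue            # glob matched nothing
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "missing R2 mate for $r1" >&2
      return 1
    fi
    cmd="$cmd $r1 $r2"                  # R1 (barcode/UMI) first, then R2 (cDNA)
    pairs=$((pairs + 1))
  done

  if [ "$pairs" -eq 0 ]; then
    echo "no R1 files found for sample $sample in $fastq_dir" >&2
    return 1
  fi
  echo "$cmd"
}
```

For example, `build_bus_cmd Mus_musculus.GRCm38.cdna.all.idx bus_output/ fastq SRR8599150` would print a single kallisto bus command listing every R1/R2 pair found for that sample in the (assumed) directory `fastq`.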

Thank you very much for the pointer. You are right: their example uses files named SRR8599150_S1_L001_R1_001.fastq.gz and SRR8599150_S1_L001_R2_001.fastq.gz, and they mention using the "mouse retinal cells SRR8599150 dataset from Koren et al., 2019". However, when I downloaded this example dataset from the ENA database (https://www.ebi.ac.uk/ena/data/view/PRJNA523252), I got SRR8599150.fastq.gz and SRR8599151.fastq.gz. I do not know why their file names differ from the ones I got from ENA.

Also, ENA does not list R1 and R2 files for the same run accession. Instead, it shows two different run accessions (SRR8599150 and SRR8599151), each with only one FASTQ file (File 1, FTP) available for download. I cannot find the SRR8599150_S1_L001_R1_001.fastq.gz and SRR8599150_S1_L001_R2_001.fastq.gz files named in the kallisto | bustools example anywhere in ENA.

I would be grateful if you could clarify this for me.

Best wishes, Farah

written 6 weeks ago by F. Golestan

Using the Run Selector, it looks like there are two separate accession numbers listed for SRR8599150. If you use the two GEO accession numbers, two separate records show up in ENA: Record 1 and Record 2. One is marked as ctrl and the other as LD.

Lior Pachter (author of kallisto) participates on Biostars, so he may come across this thread and explain the mismatch between the example shown on the kallisto site and the data in SRA/ENA.

written 6 weeks ago by genomax

OK. Many thanks for your reply and guidance. I have also tagged Lior Pachter. Thanks.

written 6 weeks ago by F. Golestan

I am not able to dump the data from SRA (it looks like this data may have become cloud-only?), so your best bet is ENA. I hope NCBI is not jumping the gun with the cloud move and locking people out of datasets. There was supposed to be a formal announcement before the rollout; if that happened, I must have missed it.

written 6 weeks ago by genomax

Oh, OK. Thank you for letting me know. I will use ENA to download the data.

written 6 weeks ago by F. Golestan

Tagging: Lior Pachter

written 6 weeks ago by genomax