Question: scRNA-seq: Kallisto processing of bioproject data (fastq-dump)
bsmith030465 wrote:

Hi,

I am trying to get started with scRNA-seq analysis. I downloaded a test dataset from an NCBI BioProject. However, each sample comes as three fastq files (e.g. SRR123_1.fastq.gz, SRR123_2.fastq.gz, SRR123_3.fastq.gz).

How do I process these in Kallisto? Do I need to combine all of these files (how?), or split each file into forward and reverse reads before combining?

Otherwise, what fastq-dump command do I need to run to download the forward and reverse reads as separate files? My current command is:

fastq-dump -I --split-files SRR123

thanks!

modified 5 weeks ago • written 5 weeks ago by bsmith030465

Could you post the output of these commands:

gzip -dc SRR123_1.fastq.gz | head -n 4
gzip -dc SRR123_2.fastq.gz | head -n 4
gzip -dc SRR123_3.fastq.gz | head -n 4
written 5 weeks ago by Nicolas Rosewick
gzip -dc SRR123_1.fastq.gz | head -n 4
@NB501328:163:HK2GVBGX5:2:11101:14937:1054
ACGAGCCANTGTACCTGTGATGGAAC
+NB501328:163:HK2GVBGX5:2:11101:14937:1054
AAAAAEEE#EEEEEEEEEEEEEEEEE

gzip -dc SRR123_2.fastq.gz | head -n 4
@NB501328:163:HK2GVBGX5:2:11101:14937:1054
TATCTAAAATNAANGTNGTNAAAAGTTATNTNNCTGTGTTNTTACNNTNNTTAANANTGTNNNATTNNNNTCCNNCANTNNTNANNNNTNNNNNNNAT
+NB501328:163:HK2GVBGX5:2:11101:14937:1054
AAAAAAAEEE#6E#EE#EE#EEEEEEEEE#E##EEEEEEE#EEEA##E##EEAE#6#EEE###/EE####EE<##/A#/##/#<####/#######/<

gzip -dc SRR123_3.fastq.gz | head -n 4
@NB501328:163:HK2GVBGX5:2:11101:14937:1054
GAAACCCT
+NB501328:163:HK2GVBGX5:2:11101:14937:1054
AAAAAEEE
modified 5 weeks ago • written 5 weeks ago by bsmith030465
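
The read lengths in the output above are a quick way to tell the three files apart: the first read is 26 bases, the second is much longer, and the third is only 8 bases. A minimal sketch for checking this on any set of split files could look like the following. `first_read_length` is a hypothetical helper name, and the file names are the placeholders used in this thread.

```shell
# Print the length of the first read in a gzipped FASTQ file.
# first_read_length is a hypothetical helper, not part of any toolkit.
first_read_length() {
    # Line 2 of the first FASTQ record holds the sequence.
    gzip -dc "$1" | awk 'NR == 2 { print length($0); exit }'
}

# Report one length per file; files that are not present are skipped.
for f in SRR123_1.fastq.gz SRR123_2.fastq.gz SRR123_3.fastq.gz; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(first_read_length "$f")"
    fi
done
```

This only looks at the first record, so it is a sanity check rather than a full scan of the file.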

OK, so I guess that SRR123_3.fastq.gz is the sequencing barcode (used to multiplex multiple samples). Could you post the link to the BioProject, please?

written 4 weeks ago by Nicolas Rosewick

I got the fastq files by executing: fastq-dump --split-files --gzip SRR8611970

written 4 weeks ago by bsmith030465

Looking at the SRA page for this run, https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8611970, it seems that _1 = R1, _2 = R2 and _3 = sample index.

See here for more details: https://bioinformatics.stackexchange.com/questions/5178/what-is-the-index-fastq-file-sample-i-fastq-gz-generated-when-demultiplexing

modified 4 weeks ago • written 4 weeks ago by Nicolas Rosewick
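
If _1 carries the cell barcode and UMI and _2 the cDNA, as the read lengths suggest, then one possible way to process the run is kallisto's `bus` subcommand with only those two files, leaving out the _3 sample index. The sketch below rests on two assumptions that the thread does not confirm: that the library is 10x Chromium v2 (a 26-base R1 is consistent with a 16-base barcode plus a 10-base UMI), and that a kallisto index has already been built at the hypothetical path transcriptome.idx.

```shell
# Assumptions, not confirmed in the thread: 10x Chromium v2 chemistry and
# a prebuilt kallisto index at the hypothetical path transcriptome.idx.
run=SRR8611970
index=transcriptome.idx        # hypothetical path
outdir="${run}_bus"            # hypothetical output directory name
r1="${run}_1.fastq.gz"         # barcode + UMI read
r2="${run}_2.fastq.gz"         # cDNA read

# The barcode/UMI file is listed first, then the cDNA file.
# The _3 sample-index file is not passed to kallisto.
if command -v kallisto >/dev/null 2>&1 && [ -f "$index" ] && [ -f "$r1" ] && [ -f "$r2" ]; then
    kallisto bus -i "$index" -o "$outdir" -x 10xv2 -t 4 "$r1" "$r2"
else
    echo "Skipping: kallisto, $index, $r1 or $r2 not found" >&2
fi
```

The output directory should then hold a BUS file, which still needs a tool such as bustools to be sorted and counted into a gene-by-cell matrix. If the chemistry turns out to be something else, the `-x` value has to change accordingly.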