How to analyze scRNA-seq FASTQ files from NCBI
6 months ago
aimanbarki ▴ 20

Hello everyone, is there a tutorial for the following?

• How to download FASTQ files from NCBI
• How to check the file quality (what should the files look like?)
• How to use cellranger count on FASTQ files
• How to understand the output of cellranger count

I want to work with the healthy data set from the following website: BioProject_NCBI

I downloaded the FASTQ files using the following command:

fastq-dump --split-files --gzip SRR10134390

I downloaded the reference from GENCODE and built a reference for cellranger count using the following command:

cellranger mkref --genome=GRCh38.p13 --fasta=GRCh38.primary_assembly.genome.fa --genes=gencode.v39.primary_assembly.annotation.gtf


I ran the cellranger count using the following command:

cellranger count --id=Healthy_aortic_valve2 --fastqs=/healthy1 --transcriptome=GRCh38.p13 --chemistry SC3Pv2


These commands ran and created several folders, but the output does not seem right because I cannot find the matrix files or the BAM files.

Can someone tell me how I can track down the problem?

Thanks

SRAtool Cellranger NCBI Count • 1.1k views
1

Posting an error message or such is probably a good start. Beyond that, you'd probably get a lot out of the OSCA book in terms of understanding and performing scRNA-seq analysis.

1

Are you looking in the correct path for the output files? If cellranger count ran successfully, it should write the output to Healthy_aortic_valve2/outs according to the documentation.
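A minimal sketch of how one might check a run folder from the shell. The function name check_run is hypothetical, and the sketch assumes that cellranger keeps its top-level log in a file called _log inside the run folder and that failed stages leave files named _errors somewhere below it; verify both against your own run before relying on it.

```shell
#!/usr/bin/env bash
# check_run is a hypothetical helper, not part of cellranger.
# Usage: check_run Healthy_aortic_valve2
check_run() {
    local run_dir="$1"

    # A finished run has an outs/ directory inside the folder named after --id.
    if [ -d "$run_dir/outs" ]; then
        echo "finished: $run_dir/outs exists"
        return 0
    fi

    echo "unfinished: no outs/ directory in $run_dir"

    # Assumption: the pipeline log is stored as _log in the run folder.
    if [ -f "$run_dir/_log" ]; then
        tail -n 20 "$run_dir/_log"
    fi

    # Assumption: failed stages write their error text to files named _errors.
    find "$run_dir" -type f -name _errors 2>/dev/null | while read -r f; do
        echo "--- $f"
        cat "$f"
    done
    return 1
}
```

If the function reports an unfinished run, the printed log tail and any _errors contents are the text worth posting when asking for help.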

0

@Jv The run created the run folder, but it does not include the outs/ directory. Where should I start in order to find the issue?

1

Input files for cellranger need to be in a specific format with the index sequences in separate files. You can find more information about the types and names of files here: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/fastq-input

Simply splitting the SRA data may not give you the correct input files. Unfortunately, these submitters appear not to have submitted the original cellranger BAM file, which would have allowed you to recreate the FASTQ files easily.

1

The index sequence doesn't have to be present anymore. It's just a legacy thing that cellranger mkfastq still produces it. (What does matter is that the FASTQ files are named exactly according to the Illumina standard.)
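As a rough illustration of the naming point, here is a sketch that builds a file name of the form Sample_S1_L001_R1_001.fastq.gz. The helper illumina_name is hypothetical, the S1 and L001 parts are placeholder values, and which fastq-dump output (_1, _2, ...) corresponds to R1 or R2 is an assumption that has to be checked for each dataset.

```shell
#!/usr/bin/env bash
# illumina_name is a hypothetical helper that prints an Illumina-style
# FASTQ file name: <sample>_S1_L001_<read type>_001.fastq.gz
illumina_name() {
    local sample="$1" read_type="$2"
    printf '%s_S1_L001_%s_001.fastq.gz\n' "$sample" "$read_type"
}

# Example (commented out so nothing is renamed by accident).
# Assumption: _1 holds the barcode/UMI read (R1) and _2 the cDNA read (R2).
# mv SRR10134390_1.fastq.gz "$(illumina_name SRR10134390 R1)"
# mv SRR10134390_2.fastq.gz "$(illumina_name SRR10134390 R2)"
```

Note that there is a single underscore before S1; the sample name passed in should not end with one.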

1

Good to know. We generally demux using cellranger, so we have the files.

0

GenoMax and @swbarnes2 I changed the names of the files, and I am attaching a picture of how they look. I think the "+" line is not supposed to look like that, or is it fine?

0

That should be fine. If you had used -F (the original-format option) when dumping the reads, they might look like normal Illumina FASTQ headers (depending on how the submitters sent the data in). cellranger is supposed to use only 26 or 28 bp of read 1, depending on the chemistry.
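One way to see which of the dumped files is read 1 is to tabulate read lengths. The sketch below uses a hypothetical helper, read_lengths, that looks only at the first 1000 reads of a gzipped FASTQ file; a file whose reads are all 26 or 28 bp long is a likely candidate for the barcode/UMI read, though that interpretation is an assumption to confirm for your data.

```shell
#!/usr/bin/env bash
# read_lengths is a hypothetical helper.
# It prints "<length> <count>" for the first 1000 reads of a gzipped FASTQ.
read_lengths() {
    gzip -dc "$1" 2>/dev/null \
        | head -n 4000 \
        | awk 'NR % 4 == 2 { n[length($0)]++ }        # sequence lines only
               END { for (l in n) print l, n[l] }' \
        | sort -n
}

# Example (file name taken from the fastq-dump command in the question):
# read_lengths SRR10134390_1.fastq.gz
```

Running it on each of the split files and comparing the tables should make it clear which file to name R1 and which R2.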

Do you have an extra _ in the file names before S1? You should remove that.

1

That's how the folders are named when cellranger has yet to finish running properly.