I am new to data analysis, so please bear that in mind.
I can load 10x data into R and can follow the Seurat tutorial.
But now I need to work with Smart-seq3 data, which is in a "RawFASTQ" folder.
(I guess this is the list of raw FASTQ files from the experiment.)
How can I load these files into R so that I can do what I did with the 10x data?
Do I need a special program? I tried to find a tutorial or something similar, but failed.
Sorry for this baby-step question, but right now I am really stuck.
Please help.
Oh yes, you only have the raw reads, which means you have to map them to a reference and obtain a cell-by-gene count matrix that can be loaded into Seurat. (For 10x data this step is typically already done by Cell Ranger, but that only works for 10x data.)
Thank you so much for your reply.
As I mentioned earlier, I am a newborn in analysis, so my question will be dumb,
but here we go.
It took some time to understand what was going on in the link that you shared, and it seems like I need to do it in a conda environment.
Is there any other way that I can do everything in R?
Do I need to learn Python in order to handle Smart-seq3 data?
I thought that once I could import the data, the rest would be the same as analyzing 10x data. Am I wrong?
It would be really helpful if I only needed to deal with R, but if Python is the only way, that's fine. I will learn it. (sigh)
You need to learn how to use a command-line terminal.
First run:
pip install kb_python
Then run the "kb ref" command and the "kb count" command detailed in that previous link.
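To give a rough idea of what those two commands look like, here is a small sketch in Python that only assembles and prints them (nothing is executed). The file names (genome.fa, annotation.gtf, the FASTQ names, the kb_out folder) are placeholders I made up, and I am assuming your kb version lists SMARTSEQ3 as a technology. Run "kb --list" to confirm the name, and check "kb count --help" for how many FASTQ files that technology expects and in what order.

```python
import shlex

# Placeholder file names; replace with your own reference and FASTQ files.
GENOME_FASTA = "genome.fa"
ANNOTATION_GTF = "annotation.gtf"
FASTQS = ["sample_R1.fastq.gz", "sample_R2.fastq.gz"]  # hypothetical names


def kb_ref_command():
    """Build the `kb ref` call that creates the index and transcript-to-gene map."""
    return [
        "kb", "ref",
        "-i", "index.idx",   # output: kallisto index
        "-g", "t2g.txt",     # output: transcript-to-gene mapping
        "-f1", "cdna.fa",    # output: extracted cDNA sequences
        GENOME_FASTA, ANNOTATION_GTF,
    ]


def kb_count_command(technology="SMARTSEQ3", out_dir="kb_out"):
    """Build the `kb count` call that turns FASTQ files into a count matrix."""
    return [
        "kb", "count",
        "-i", "index.idx",
        "-g", "t2g.txt",
        "-x", technology,    # assumption: confirm the name with `kb --list`
        "-o", out_dir,
        *FASTQS,
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed and pasted into a terminal.
    for cmd in (kb_ref_command(), kb_count_command()):
        print(shlex.join(cmd))
```

Once the printed commands look right for your files, you can paste them into the terminal one after the other. "kb ref" only has to be run once per reference.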
Sorry, but bioinformatics analysis is something that you "learn", not just "do". We were all newborns at one point -- but it's a learning process (in fact, I'm still a newborn in many, many areas and am still a learner). Going from FASTQ files to something you can load into Seurat is not easy, and it requires effort and learning. Doing bioinformatics correctly and robustly is a full-time job for many researchers.
What you have been given are FASTQ files -- those are the A's, T's, C's, and G's that a sequencing machine spits out. You're likely to have billions of A's/T's/C's/G's, or even more on a large run.
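If you want to see what is inside such a file, here is a minimal sketch that reads FASTQ records in plain Python. The two reads are made up for illustration, and real files are usually gzip-compressed and far too large to inspect this way, so treat it only as a picture of the format: each read takes four lines (identifier, bases, a "+" separator, and one quality character per base).

```python
import io

# A tiny made-up FASTQ snippet: two reads, four lines each.
EXAMPLE_FASTQ = """@read1
ACGTTGCA
+
IIIIIIII
@read2
TTGACCGT
+
IIIIHHHH
"""


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from an open FASTQ handle."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # an empty line here means we reached the end of the file
        sequence = handle.readline().strip()
        handle.readline()  # skip the "+" separator line
        quality = handle.readline().strip()
        yield header[1:], sequence, quality  # drop the leading "@"


records = list(read_fastq(io.StringIO(EXAMPLE_FASTQ)))
total_bases = sum(len(seq) for _, seq, _ in records)
print(len(records), "reads,", total_bases, "bases")
```

Nothing in these records says which gene a read came from. That information only appears after the reads are mapped to a reference.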
What Seurat takes in is a table of counts like "In this single cell, gene A has 10 counts and gene B has 20 counts; in another cell, gene B has 5 counts and gene C has 30 counts".
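As a small illustration of that table, the sketch below arranges the same numbers as a gene-by-cell matrix, with genes as rows and cells as columns, which is the orientation Seurat works with. The labels cell_1 and cell_2 are made up, and a real matrix has thousands of genes and is stored in a sparse format.

```python
# Counts per cell from the example above ("cell_1"/"cell_2" are made-up labels).
counts = {
    "cell_1": {"geneA": 10, "geneB": 20},
    "cell_2": {"geneB": 5, "geneC": 30},
}

# Collect every gene seen in any cell, and fix a column order for the cells.
genes = sorted({gene for per_cell in counts.values() for gene in per_cell})
cells = sorted(counts)

# Rows are genes, columns are cells; a gene not detected in a cell gets 0.
matrix = [[counts[cell].get(gene, 0) for cell in cells] for gene in genes]

# Print the table as tab-separated text.
print("gene\t" + "\t".join(cells))
for gene, row in zip(genes, matrix):
    print(gene + "\t" + "\t".join(str(value) for value in row))
```

Most entries in a real single-cell matrix are zeros like the ones filled in here, which is why the mapping tools write it out in a sparse format rather than as a plain table.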
Going from the FASTQ files that your sequencing machine spits out to what Seurat takes in requires an additional step in between. That step is what the link I supplied describes.