Proper fastq pre-processing for extraction of count matrix
3.8 years ago

I downloaded some FASTQ files from SRA for analysis. I aligned them to my genome of interest using STAR, but now I'm not sure if my process was correct. My goal is to get a count matrix and analyze the data in Seurat. I used STAR's quantMode to get a count matrix, but the output only has 3 columns, i.e. three cells. This is obviously not the desired outcome. Now I'm wondering if I needed to split the FASTQs into individual files for each cell/sample, or if the cell information is retained in the alignment BAM file and I can use featureCounts, HTSeq, or something else to create a correct count matrix. Edit: This is Fluidigm C1 data.

RNA-Seq sequence genome next-gen • 2.3k views

What single-cell protocol does it use?


They used Fluidigm C1


Hm, I'm unfamiliar with the output of that protocol. You may want to find some papers that utilize it and see how they generate their counts.


You may need to use one of these scripts to do the analysis of Fluidigm data.


I think you're right. There is a "C1 mRNA Seq HT Demultiplex Script" available at https://www.fluidigm.com/software that I believe will do what I need.


david.f.stein: Please edit your original post and add in bold letters that this is Fluidigm data. It will have to be reprocessed properly before you can start doing counts.

3.8 years ago
newbio17 ▴ 360

Assuming the data is scRNA-Seq, you might want to check out one of the previous workshops by the Broad Institute (2019). It covers most of the basics of processing scRNA-Seq data, including some downstream analyses you might be interested in.

3.8 years ago

You can't just throw a single-cell FASTQ into STAR and expect it to figure out what all the cells are. It wasn't designed for that. It thinks you have bulk RNA-seq data. The three columns are not cells: they are the gene counts for three possible library types, namely unstranded, stranded with read 1 on the forward (transcript) strand, and stranded in the reverse orientation. You pick the one column that matches your protocol.
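
To make the column layout concrete, here is a minimal Python sketch that reads STAR's `ReadsPerGene.out.tab` (the file written by `--quantMode GeneCounts`) and merges several of them into a genes x cells table. It assumes you already have one STAR run per cell, which may or may not be true for this dataset; the function names and the strandedness labels are made up for illustration.

```python
import csv
import io

# Column index in ReadsPerGene.out.tab for each library type:
# column 0 is the gene ID, columns 1-3 are the three count variants.
STRAND_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}

# Summary rows that STAR writes before the per-gene rows.
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def parse_reads_per_gene(handle, strandedness="unstranded"):
    """Return {gene_id: count} from one ReadsPerGene.out.tab file handle."""
    column = STRAND_COLUMN[strandedness]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0] in SUMMARY_ROWS:
            continue  # skip blank lines and the summary rows
        counts[row[0]] = int(row[column])
    return counts


def build_count_matrix(per_cell_counts):
    """Combine {cell: {gene: count}} into (genes, cells, rows).

    rows[i][j] is the count of genes[i] in cells[j]; genes missing
    from a cell's file are filled with 0.
    """
    cells = sorted(per_cell_counts)
    genes = sorted({g for c in per_cell_counts.values() for g in c})
    rows = [[per_cell_counts[cell].get(gene, 0) for cell in cells]
            for gene in genes]
    return genes, cells, rows
```

In practice you would open each cell's file with `open(path)` and pass the handle to `parse_reads_per_gene`, choosing the strandedness that matches the library prep. The resulting table can be written out as CSV and read into R for Seurat.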


Thanks for your feedback. As I think I intimated in the question, I suspected I was doing something wrong. Do you have any suggestions on how to do this properly?


If you use STAR, then use its STARsolo module ( https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md ), which is intended for single-cell work, but I cannot help any further since I have not worked with Fluidigm data so far. Don't the authors provide a count matrix at GEO? This is typically the case.
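
As a rough sketch only: the STARsolo documentation describes a plate-based mode (`--soloType SmartSeq`) that takes a manifest via `--readFilesManifest`, with three tab-separated columns per line: read 1 file, read 2 file (or `-` for single-end), and a cell ID. Whether that mode is the right fit for this C1 dataset is an assumption, not something confirmed here. The Python below only builds such a manifest from a list of per-cell FASTQ names; the `_1`/`_2` suffix convention and the function name are hypothetical.

```python
from pathlib import Path


def build_manifest(fastq_names):
    """Return manifest text: read1 <TAB> read2-or-dash <TAB> cell ID.

    Assumes paired files are named <cell>_1.* and <cell>_2.*, and that
    any other file is a single-end run named <cell>.* (hypothetical
    naming; adjust to how your files are actually called).
    """
    by_cell = {}
    for name in sorted(fastq_names):
        stem = Path(name).name.split(".")[0]  # drop .fastq.gz etc.
        cell, sep, mate = stem.rpartition("_")
        if sep and mate in ("1", "2"):
            by_cell.setdefault(cell, {})[mate] = name
        else:
            by_cell.setdefault(stem, {})["1"] = name

    lines = []
    for cell in sorted(by_cell):
        mates = by_cell[cell]
        if "1" not in mates:
            raise ValueError(f"no read 1 file found for cell {cell}")
        lines.append("\t".join([mates["1"], mates.get("2", "-"), cell]))
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file gives you something to pass to `--readFilesManifest`. Check the STARsolo documentation for the remaining options before running it.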


I didn't know count matrices were usually provided on GEO. This is the GEO accession https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE106269. Is "Series Matrix File(s)" the count matrix?


No, it should be one of the files listed when you click Custom next to GSE106269_RAW.tar at the very bottom. This section typically contains uploaded data such as count matrices. Check whether the paper says what exactly these uploaded files are.
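
If you download the whole archive instead of picking files in the browser, a small Python sketch like this lists what is inside so you can spot anything that looks like a count table. The function name is made up, and it assumes the archive is an ordinary tar file already saved locally.

```python
import sys
import tarfile


def list_supplementary_files(tar_path):
    """Return (name, size_in_bytes) for every regular file in the archive."""
    with tarfile.open(tar_path) as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_raw.py GSE106269_RAW.tar
    for name, size in list_supplementary_files(sys.argv[1]):
        print(f"{size:>12}  {name}")
```

The file names alone often tell you whether the entries are per-sample count tables or something else, but the paper's methods section is the place to confirm.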
