Stuck on how to run STARsolo on paired cDNA RNA-Seq FASTQ files
4 weeks ago
Nicholas • 0

Hey everyone,

I've been stuck on this for a few days and don't know how to approach it. To make sure everything is working correctly, I'm trying to run STARsolo/SoloTE on a data set provided with the SoloTE publication. I'm comfortable with other data sets; for example, my current project uses single-cell RNA-Seq data from human placental cells, and it works flawlessly with STARsolo. But I'm having trouble with the data set the publication provides, such as this sample run. I've never worked with paired cDNA RNA-Seq runs before, so I'm not sure which parameters this run needs to work correctly. I'm downloading the run with prefetch and splitting it into two FASTQ files with fasterq-dump. I'm not familiar with reads that look like this:

+SRR9713162.492268 492268 length=150
AAFFFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ----<7AF-<F-7<FFF--<F-----7A-AFFJFF<7-7---)7)7-7--7-<-7AA----AF7-7A7FF<FFFJFFAF<F----)))))7)7--<F--
@SRR9713162.492269 492269 length=150
TCAGCAACAGGACGTATTTCTTAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCATTCTAAGACTTTAAGTTCTCTGGCATGAGTTTATCTGCAATCATAAACTAAAAAATAACCCAAACACACCCCACCAAACCCAACCGTAC
+SRR9713162.492269 492269 length=150
-AFFFJJJJJFJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ--7<7-7------F7---<<-7---)-)-7---7--<-7-7-7--7-<<7----7------7----7----7)--)--)7---<--)7----
@SRR9713162.492270 492270 length=150
ACCTTTAAGGCTCTTAACCATATCCGTTTTTTTTTTCTTTTTTTTTTTTTTTTTTTCCTAGAGGAAAACCCGGTAATGATGTCGGGGTTGAGGGATAGGAGGAGGATGGGGGATAGGTGTATGAACATGAGGGTGTTTTCTCGTGTGAAT
+SRR9713162.492270 492270 length=150
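
For anyone unfamiliar with the format: each FASTQ record spans four lines, namely an `@` header, the base calls, a `+` separator (which may repeat the header), and a quality string with one character per base. The excerpt above begins partway through one record and stops before the last quality line. As a minimal sketch, assuming an uncompressed file with Phred+33 qualities (the encoding used by current Illumina instruments), records can be read and decoded in Python; `read_fastq`, `phred33` and `FastqRecord` are illustrative names, not part of any existing library.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str  # header without the leading '@'
    seq: str   # base calls
    qual: str  # ASCII-encoded qualities, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an uncompressed, 4-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred33(qual: str) -> list[int]:
    """Decode Phred+33 quality characters into integer scores."""
    return [ord(c) - 33 for c in qual]


# Small synthetic record using the same quality characters seen above.
example = io.StringIO("@read1 1 length=8\nACGTTTTT\n+read1 1 length=8\nJJFF<<--\n")
for rec in read_fastq(example):
    print(rec.name, rec.seq, phred33(rec.qual))
# read1 1 length=8 ACGTTTTT [41, 41, 37, 37, 27, 27, 12, 12]
```

So the runs of `-`, `7` and `)` near the ends of the quality lines in the excerpt correspond to low scores (12, 22 and 8), i.e. the later cycles of read 1 are of poor quality.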

How can I run STARsolo on the two lanes of cDNA files described in the paper? Any help would be appreciated. Thank you so much.

Looks like this experiment used the Chromium Single Cell 3' v2 Reagent Kit. As you can see from the sequence you posted, read 1 contains 26 usable bp (16 bp cell barcode + 10 bp UMI) followed by a poly-T stretch. The rest of read 1 is useless (it is not going to be used by cellranger, and perhaps not by STARsolo), even though it is sequenced to 150 cycles. So read 2 is going to be your RNA read.
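
The 26 bp read-1 layout can be illustrated in Python. This is a minimal sketch, assuming the 10x 3' v2 split of a 16 bp cell barcode followed by a 10 bp UMI; `split_read1` and `BarcodeRead` are hypothetical names, not part of cellranger or STARsolo. The example read is the start of SRR9713162.492269 from the question, truncated to 36 bases.

```python
from typing import NamedTuple

# Assumed 10x Chromium 3' v2 read-1 layout: 16 bp cell barcode, then 10 bp UMI.
CB_LEN = 16
UMI_LEN = 10


class BarcodeRead(NamedTuple):
    cell_barcode: str
    umi: str
    remainder: str  # poly-T and anything after it; not used for counting


def split_read1(seq: str, cb_len: int = CB_LEN, umi_len: int = UMI_LEN) -> BarcodeRead:
    """Split a read-1 sequence into cell barcode, UMI and the remaining bases."""
    if len(seq) < cb_len + umi_len:
        raise ValueError("read 1 is shorter than cell barcode + UMI")
    return BarcodeRead(
        cell_barcode=seq[:cb_len],
        umi=seq[cb_len:cb_len + umi_len],
        remainder=seq[cb_len + umi_len:],
    )


# First 36 bases of read SRR9713162.492269 from the question.
r1 = "TCAGCAACAGGACGTATTTCTTAG" + "T" * 12
parts = split_read1(r1)
print(parts.cell_barcode)  # TCAGCAACAGGACGTA
print(parts.umi)           # TTTCTTAGTT
```

Note that the UMI here happens to end in TT, so the boundary between UMI and poly-T is defined purely by position, not by sequence content.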

Hmm, I wouldn't necessarily call it useless. ;) The sequence after the polyT stretch often contains biological sequence that you can map paired-end together with your read 2. I'd just call it 'unnecessary' (for most purposes).
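
To look at what remains of read 1 after the poly-T stretch, one simple approach is to skip the 26 bp barcode/UMI prefix and then skip the run of T's that follows. This is a rough sketch, not how any particular tool does it: the offset of 26 and the minimum run length of 10 are assumptions for illustration, `downstream_of_polyt` is a hypothetical helper, and it uses exact matching, so a single sequencing error inside the run (like the C in the poly-T of read SRR9713162.492270 above) ends the run early.

```python
from typing import Optional


def downstream_of_polyt(seq: str, offset: int = 26, min_run: int = 10) -> Optional[str]:
    """Return the bases that follow a poly-T run starting at `offset`.

    Assumes the 10x 3' v2 read-1 layout (26 bp barcode + UMI, then poly-T).
    Returns None when fewer than `min_run` T's start at `offset`.
    Exact matching only: any non-T base ends the run.
    """
    end = offset
    while end < len(seq) and seq[end] == "T":
        end += 1
    if end - offset < min_run:
        return None
    return seq[end:]


# Synthetic reads: 26 placeholder bases, a T run, then a short insert.
print(downstream_of_polyt("A" * 26 + "T" * 15 + "GATTACA"))  # GATTACA
print(downstream_of_polyt("A" * 26 + "T" * 5 + "GATTACA"))   # None
```

An insert that itself begins with T cannot be distinguished from the end of the run in this sketch; dedicated trimming tools handle errors and quality more carefully.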

Fair enough. I amended my comment.

Would it be more appropriate to use cellranger count instead of STARsolo to quantify gene expression in this paired-end single-cell cDNA RNA-Seq data? I'm just not sure whether STARsolo can be run on two paired cDNA files. I tried cellranger, and it seemed to run perfectly fine without errors.

STARsolo will work just fine on your data. You'll probably need to set --soloBarcodeReadLength 0 so that STARsolo doesn't get confused that your R1 is 150 bp rather than the 26 bp (barcode + UMI) it expects by default.
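
A hedged sketch of what the STARsolo call could look like, built as a Python argument list that can be passed to `subprocess.run`. Assumptions not stated in the thread: fasterq-dump's `--split-files` naming (`SRR9713162_1.fastq` as the barcode read, `SRR9713162_2.fastq` as the cDNA read), placeholder paths for the STAR index and the 10x v2 whitelist (`737K-august-2016.txt`), and the v2 barcode/UMI positions. `starsolo_command` is a hypothetical helper. STAR takes the cDNA read first and the barcode read last in `--readFilesIn`; several lanes go in as comma-separated lists at each position.

```python
import shlex
from typing import Sequence


def starsolo_command(
    cdna_fastqs: Sequence[str],
    barcode_fastqs: Sequence[str],
    genome_dir: str,
    whitelist: str,
    threads: int = 8,
    out_prefix: str = "solo_out/",
) -> list[str]:
    """Build a STARsolo argument list for 10x 3' v2 data (assumed layout)."""
    if len(cdna_fastqs) != len(barcode_fastqs):
        raise ValueError("need one barcode FASTQ per cDNA FASTQ (one pair per lane)")
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # cDNA read(s) first, barcode read(s) second; lanes comma-separated.
        "--readFilesIn", ",".join(cdna_fastqs), ",".join(barcode_fastqs),
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        # Assumed v2 layout: 16 bp barcode at 1-16, 10 bp UMI at 17-26.
        "--soloCBstart", "1", "--soloCBlen", "16",
        "--soloUMIstart", "17", "--soloUMIlen", "10",
        # R1 is 150 bp, longer than CB + UMI, so skip the read-length check.
        "--soloBarcodeReadLength", "0",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


cmd = starsolo_command(
    ["SRR9713162_2.fastq"], ["SRR9713162_1.fastq"],
    genome_dir="star_index/", whitelist="737K-august-2016.txt",
)
print(shlex.join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run STAR
```

For gzipped input, STAR also needs `--readFilesCommand zcat`. Any extra BAM attributes or settings that SoloTE requires should be taken from its own documentation rather than from this sketch.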

I understand. Thank you!
