Cannot distinguish R1 or R2
10 months ago
Long • 0

Hello everyone,

I am downloading FASTQ files from SRA Explorer for the dataset GSE182256. After downloading, I cannot tell which of the FASTQ files are R1, R2, or I1. How should I deal with this situation?

The file names are:

SRR15500524_GSM5525832_Control2_scRNAseq_Mus_musculus_RNA-Seq.fastq.gz
SRR15500525_GSM5525832_Control2_scRNAseq_Mus_musculus_RNA-Seq.fastq.gz
SRR15500526_GSM5525832_Control2_scRNAseq_Mus_musculus_RNA-Seq.fastq.gz
SRR15500527_GSM5525832_Control2_scRNAseq_Mus_musculus_RNA-Seq.fastq.gz
SRR15500528_GSM5525832_Control2_scRNAseq_Mus_musculus_RNA-Seq.fastq.gz
SRR15500529_GSM5525832_Control2_scRNAseq_Mus_musculus_RNA-Seq.fastq.gz
SRR15500530_GSM5525832_Control2_scRNAseq_Mus_musculus_RNA-Seq.fastq.gz
SRR15500531_GSM5525832_Control2_scRNAseq_Mus_musculus_RNA-Seq.fastq.gz

Thanks,


Sometimes the read side is recorded in the read names.

Show us the output of:

gunzip -c SRR15500524_GSM5525832_Control2_scRNAseq_Mus_musculus_RNA-Seq.fastq.gz | head -n 8

gunzip -c SRR15500524_GSM5525832_Control2_scRNAseq_Mus_musculus_RNA-Seq.fastq.gz | head -n 8
@SRR15500524.1 NS500497:57:H27CKBGX2:1:22206:11449:8088/3
GCTGAGTAGTACTCCATTGTGTAGATCTACCACATTTTCTGTATCCATTCCTCTGTTGAGGGGCATCTGGGTTCCTTCCAGCTTCTGGCTATTATAAA
+
AAAAA6EEEEEEEEEEEEEEEE6EAA/EEEE<EA/EEEEEAEEAEAA/AEEEEE//</AE/EEEEAEEAEA/EEE/EEEA<AA/<6//AE/E</E/A<
@SRR15500524.2 NS500497:57:H27CKBGX2:1:21202:17645:20267/3
GCTATGCCTTTCATCTGGGATTAAAGGTGTGGTGGAACACACCTTTAATCTGGGCTACACCTTTTGCTGGAGACAATATAAGAACATTGGAAGAAGGG
+
AAA6/EEEEAEEEEEEEAEAEE/E/AEAEE6EEEEE//<EEE/AA/AEE<E<A/EEAEEE//AEE///A/A//</<E/<EEA/6/<AE/E6A//////
10 months ago
GenoMax 148k

Using fastq-dump --split-files I can see the three file types. How did you download your files? There should be three files per sample.

$ head -4 SRR15500524*.fastq
==> SRR15500524_1.fastq <==
@NS500497:57:H27CKBGX2:1:22206:11449:8088
GAAGGAAC
+NS500497:57:H27CKBGX2:1:22206:11449:8088
AAAAAEEE

==> SRR15500524_2.fastq <==
@NS500497:57:H27CKBGX2:1:22206:11449:8088
TTGACTTGTTAAGGGCTGTACCTTGT
+NS500497:57:H27CKBGX2:1:22206:11449:8088
AAAA//EEEEEAEEEAEEEEE/AEEE

==> SRR15500524_3.fastq <==
@NS500497:57:H27CKBGX2:1:22206:11449:8088
GCTGAGTAGTACTCCATTGTGTAGATCTACCACATTTTCTGTATCCATTCCTCTGTTGAGGGGCATCTGGGTTCCTTCCAGCTTCTGGCTATTATAAA
+NS500497:57:H27CKBGX2:1:22206:11449:8088
AAAAA6EEEEEEEEEEEEEEEE6EAA/EEEE<EA/EEEEEAEEAEAA/AEEEEE//</AE/EEEEAEEAEA/EEE/EEEA<AA/<6//AE/E</E/A<

I used the SRA Explorer website: I entered SRR15500524 into the search bar and then downloaded the FASTQ file with curl.


Unfortunately with single cell data things tend to be all over the place in SRA. You may have to use fastq-dump if sra-explorer is giving you a single link. You basically have just the RNA read (file 3).
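A minimal sketch of that route, assuming sra-tools is installed and that `prefetch` and `fastq-dump` are on your PATH. The wrapper name `fetch_split` and the optional output directory argument are made up for illustration; where the downloaded .sra file ends up can vary with the sra-tools version and configuration.

```shell
# Download one SRA run and split it into per-read FASTQ files.
# fetch_split is a hypothetical wrapper, not part of sra-tools.
# prefetch and fastq-dump are the sra-tools commands named in this thread.
fetch_split() {
    acc=$1              # run accession, e.g. SRR15500524
    outdir=${2:-.}      # output directory, defaults to the current one
    # Only split if the download step succeeded.
    prefetch "$acc" &&
        fastq-dump --split-files --outdir "$outdir" "$acc"
}
```

Calling `fetch_split SRR15500524` should leave files such as SRR15500524_1.fastq, SRR15500524_2.fastq and SRR15500524_3.fastq in the output directory when the run holds three reads per spot.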


Yes, I think so, since the GEO page says there are three files. But based on this, how do I determine which is R1, R2, and I1? Thanks for your help.


Notice the /3 in your example above

@SRR15500524.2 NS500497:57:H27CKBGX2:1:21202:17645:20267/3

that says it is the third piece. Perhaps /1 and /2 are in the one file you have.
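A rough way to run that check on any of the downloaded files: print the trailing number from the first header line. The helper name `read_side` is made up for illustration, and the sketch assumes the headers end in `/1`, `/2` or `/3` as in the example above.

```shell
# Print the trailing "/N" read-side number from the first header line
# of a gzipped FASTQ file, or "none" if the header has no such suffix.
# read_side is a hypothetical helper name, not part of any toolkit.
read_side() {
    gunzip -c "$1" | head -n 1 | awk '{
        if (match($0, /\/[0-9]+$/)) print substr($0, RSTART + 1)
        else print "none"
    }'
}
```

This only inspects the first record, so it will not reveal whether several read sides are mixed further down the same file.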


Thanks for your replies. I used prefetch to download the SRA file, and then used fastq-dump to split the .sra file into three. However, I still have trouble telling which one is R1, R2, or I1. The file names look like:

SRR15500516_1.fastq SRR15500516_2.fastq SRR15500516_3.fastq.


_1 = I1 = Illumina index (not needed for analysis)
_2 = R1 = Cell barcode + UMI
_3 = R2 = RNA read.
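
One way to sanity-check this mapping is to compare read lengths: in the fastq-dump output shown earlier in the thread, the _1 reads are 8 bases, the _2 reads are 26 bases, and the _3 reads are 98 bases. The sketch below reads the first record of an uncompressed FASTQ file and guesses its role. The function names and the length cut-offs (10 and 30) are assumptions chosen to fit this dataset, not a general rule.

```shell
# Print the length of the first read in an uncompressed FASTQ file.
first_read_len() {
    awk 'NR == 2 { print length($0); exit }' "$1"
}

# Guess the role of a file from its first read length.
# The cut-offs are assumptions that fit the lengths seen in this thread:
# short reads = index, medium = barcode + UMI, long = RNA read.
guess_role() {
    len=$(first_read_len "$1")
    if [ "$len" -le 10 ]; then
        echo "I1"
    elif [ "$len" -le 30 ]; then
        echo "R1"
    else
        echo "R2"
    fi
}
```

For example, `guess_role SRR15500516_2.fastq` would print R1 if that file holds 26-base reads.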

If you are planning to use cellranger, you will need to rename the files to match its requirements (see the thread "How to rename fastqs for cell ranger?").
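
As a sketch of that renaming, the function below maps a fastq-dump --split-files name to the `[Sample]_S1_L001_[Read Type]_001` pattern that Cell Ranger looks for. It assumes the _1/_2/_3 mapping given above; the function name `cellranger_name` is made up, and S1 and L001 are placeholder sample-index and lane fields. Check the Cell Ranger documentation for whether your version needs gzip-compressed input.

```shell
# Map a fastq-dump --split-files name to a Cell Ranger style name.
# Assumes _1 = I1, _2 = R1, _3 = R2 as described above, and uses
# S1 / L001 as placeholder sample-index and lane fields.
# cellranger_name is a hypothetical helper; it only prints the new name.
cellranger_name() {
    base=$(basename "$1")
    ext=${base#*.}      # "fastq" or "fastq.gz"
    stem=${base%%.*}    # e.g. SRR15500516_2
    acc=${stem%_*}      # e.g. SRR15500516
    case "${stem##*_}" in
        1) read_type=I1 ;;
        2) read_type=R1 ;;
        3) read_type=R2 ;;
        *) echo "unrecognised suffix in $base" >&2; return 1 ;;
    esac
    echo "${acc}_S1_L001_${read_type}_001.${ext}"
}
```

A rename then looks like `mv SRR15500516_2.fastq "$(cellranger_name SRR15500516_2.fastq)"`. Printing the name instead of moving the file directly makes it easy to preview the result first.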


Thank you so much!
