Mapping FASTQ files of scRNA-seq to reference genome
3 months ago
Researcher ▴ 20

I am trying to analyze a publicly available dataset on SRA. For that I have to map the FASTQ files to the reference genome. When I did this before, I downloaded the FASTQ files from Google Cloud and could map them to the reference genome using CellRanger. However, here my only option is to download them from EBI or with the SRA Toolkit, but then if I try CellRanger I will get a naming convention error. What other tools besides CellRanger can map single-cell RNA-seq reads to the reference genome from FASTQ files downloaded from EBI?

star EBI cellranger scRNA-seq SRA • 1.0k views

alevin-fry and kallisto bustools are popular alternatives.


I second salmon and kallisto. Both work directly with FASTQ files.


"but then if I try CellRanger I will get a naming convention error"

Rename the files. Or better, use softlinks. What's the problem with that?
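
For reference, a minimal sketch of the softlink approach. Cell Ranger expects names of the form `[Sample]_S1_L00[Lane]_[Read Type]_001.fastq.gz`, where the read type is I1, R1 or R2. The helper name `link_10x_fastqs` is hypothetical, and the mapping of `_1`/`_2`/`_3` to I1/R1/R2 is an assumption that you should check against the read lengths in your own files (short index read, barcode+UMI read, longer cDNA read). It also assumes the downloads are gzip-compressed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: symlink SRA/EBI-style FASTQs to Cell Ranger-style names.
# ASSUMPTION: <acc>_1 = index read (I1), <acc>_2 = barcode+UMI (R1),
#             <acc>_3 = cDNA read (R2). Verify with read lengths first.
link_10x_fastqs() {
    local acc="$1" sample="$2" src="$3" dest="$4"
    local pair suffix rtype abs_src
    mkdir -p "$dest"
    abs_src="$(cd "$src" && pwd)"          # absolute path so links never dangle
    for pair in 1:I1 2:R1 3:R2; do
        suffix="${pair%%:*}"               # 1, 2 or 3
        rtype="${pair##*:}"                # I1, R1 or R2
        ln -sf "${abs_src}/${acc}_${suffix}.fastq.gz" \
               "${dest}/${sample}_S1_L001_${rtype}_001.fastq.gz"
    done
}
```

Called as `link_10x_fastqs SRR12654354 mysample raw fastq`, this leaves the downloads in `raw/` untouched and creates the renamed links in `fastq/`, which can then be passed to Cell Ranger together with the sample name `mysample`.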


I have tried doing this before, but it did not work. I still got a header mismatch error.


Can you show me an example entry where you ran into this error?


Yes, here is the error:

Log message:

FASTQ header mismatch detected at line 4 of input files "fastq/sample-Barcode/sample-Barcode_S4_L001_R1_001.fastq.gz" and "fastq/sample-Barcode/sample-Barcode_S4_L001_R2_001.fastq.gz": file: "fastq/sample-Barcode/sample-Barcode_S4_L001_R1_001.fastq.gz", line: 4

Please share the GEO/EBI ID of these FASTQ files.


I am analyzing the scRNA-seq data of PRJNA657088; the SRA accessions are SRR12654354, SRR12654355, SRR12654356, SRR12654367, SRR12654378, and SRR12654379. I tried to get them from AWS, which requires creating a bucket and granting permissions. I did both, but the "create a data delivery order" step on the NCBI website fails with an error saying the bucket has not been given permissions.
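
As an alternative to the cloud delivery route, the same accessions can be dumped locally with the SRA Toolkit. The sketch below only prints the commands so they can be reviewed first. It assumes `fasterq-dump` is installed; the function name `dump_cmds` and the output directory are hypothetical. By default `fasterq-dump` skips technical reads, so `--include-technical` is added here on the assumption that the barcode and index reads are stored as technical reads.

```shell
# Accessions listed in this thread.
accessions="SRR12654354 SRR12654355 SRR12654356 SRR12654367 SRR12654378 SRR12654379"

# Hypothetical helper: print one fasterq-dump command per accession.
# --split-files writes each read of a spot to its own file (_1, _2, ...).
# --include-technical keeps technical reads (ASSUMED to hold barcodes/index).
dump_cmds() {
    outdir="$1"; shift
    for acc in "$@"; do
        echo "fasterq-dump --split-files --include-technical --outdir $outdir $acc"
    done
}

# Review the printed commands, then pipe them to sh to run them:
# dump_cmds raw $accessions | sh
```

`fasterq-dump` writes uncompressed FASTQ, so the files would still need to be gzip-compressed and renamed before Cell Ranger will accept them.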


After dumping a couple of reads for one of these accessions, I don't see any header mismatches:

$ head -4 SRR12654354*
==> SRR12654354_1.fastq <==
@K00162:326:HWJNNBBXX:8:1101:1103:1191
NTTACATG
+K00162:326:HWJNNBBXX:8:1101:1103:1191
#AAFFJJJ

==> SRR12654354_2.fastq <==
@K00162:326:HWJNNBBXX:8:1101:1103:1191
NATGAAAAGAGTTGGCGGTTGCACTT
+K00162:326:HWJNNBBXX:8:1101:1103:1191
#AAFFJJJJJJJJJJJJJJJJJJJJJ

==> SRR12654354_3.fastq <==
@K00162:326:HWJNNBBXX:8:1101:1103:1191
NGTGGGGAGCAGAGAATTCTCTTATCCAACATCAACATCTTGGTCAGATTTGAACTCATCAATCTCTTGCACTCAAAGCTTGTTAAGATAGTTAAGCG
+K00162:326:HWJNNBBXX:8:1101:1103:1191
#<<<F<7FFJFJAJJJFJ7F-AAJF7-AJJJFAJJ7JA-7AJJ77F-J7-A<FJF-7-7FJFFJJJJF-FFFJJJJFFJ7FA-<AFFJJJF<JJJJJF

Please also share the exact CellRanger command you're using.


"I still got a header mismatch error."

This and the naming convention error are two separate errors.

If you are getting a header mismatch, then your reads are likely out of sync between the R1 and R2 files.
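
One way to check for this is to compare the read names of the two files record by record. The sketch below is an assumption-laden illustration, not what Cell Ranger does internally: the function names are hypothetical, only the first whitespace-delimited token of each header is compared, and a trailing `/1` or `/2` is stripped.

```shell
# Print the read name of every FASTQ record (line 1 of each 4-line record).
# gzip -cdf passes uncompressed input through unchanged.
read_names() {
    gzip -cdf "$1" | awk 'NR % 4 == 1 { name = $1; sub(/\/[12]$/, "", name); print name }'
}

# Hypothetical helper: report the first record whose names differ.
first_header_mismatch() {
    n1=$(mktemp); n2=$(mktemp)
    read_names "$1" > "$n1"
    read_names "$2" > "$n2"
    # paste pads the shorter file with empty fields, so a length
    # difference also shows up as a mismatch.
    paste "$n1" "$n2" | awk -F '\t' '
        $1 != $2 { print "record " NR ": " $1 " vs " $2; found = 1; exit }
        END { if (!found) print "headers in sync" }'
    rm -f "$n1" "$n2"
}
```

For example, `first_header_mismatch sample_S4_L001_R1_001.fastq.gz sample_S4_L001_R2_001.fastq.gz` prints either the first offending record or `headers in sync`.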


I am getting this error message:

Log message:

FASTQ header mismatch detected at line 4 of input files "fastq/sample-Barcode/sample-Barcode_S4_L001_R1_001.fastq.gz" and "fastq/sample-Barcode/sample-Barcode_S4_L001_R2_001.fastq.gz": file: "fastq/sample-Barcode/sample-Barcode_S4_L001_R1_001.fastq.gz", line: 4

If it is because the reads are out of sync in R1/R2, how can I fix this?


You can use repair.sh from the BBMap suite to bring the reads back in sync. An example command line is given in the thread "How to resync paired-end data?"
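
A rough sketch of what such a call can look like, assuming repair.sh is on your PATH. The wrapper name `resync_pairs`, the `DRY_RUN` switch and the output file names are hypothetical; `in=`, `in2=`, `out=`, `out2=`, `outs=` and `repair=t` are the repair.sh parameters as I understand them from its help text, so check `repair.sh --help` for your version.

```shell
# Hypothetical wrapper around BBMap's repair.sh.
# Usage: resync_pairs R1.fastq.gz R2.fastq.gz output_prefix
# Set DRY_RUN=1 to print the command instead of running it.
resync_pairs() {
    r1="$1"; r2="$2"; prefix="$3"
    # Reads whose mate is missing go to the singletons file.
    set -- repair.sh in="$r1" in2="$r2" \
        out="${prefix}_R1.fastq.gz" out2="${prefix}_R2.fastq.gz" \
        outs="${prefix}_singletons.fastq.gz" repair=t
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Note that this only pairs two files. If an index (I1) file is also passed to Cell Ranger, it would no longer match the repaired R1/R2 files, so it may be simpler to leave it out.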
