10X scRNA-seq v2 odd fastq format
7 months ago
Dave Carlson ★ 1.7k

Hi Biostars,

I've downloaded some scRNA-Seq data from the GSA, which I am hoping to analyze. This is 10x V2 chemistry sequence data.

However, the format is different from what I am familiar with. First, the reads come in two FASTQ files ("f1" and "r2"):

CRR034505_f1.fastq.gz
CRR034505_r2.fastq.gz

More importantly, both mates have equal read lengths:

zcat CRR034505_f1.fastq.gz | head -n4
@ST-E00126:655:HL5FTCCXY:5:1101:7638:1151 1:N:0:NAAGTGCT
NAGTAACCAAGACACGTATTGCGCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAACTAAAAAGGGGTCCCAGAATTTCAGCAGTTCTCTGATTTTTATATTTTATTCCTCTTCCTATCCAATCCCTGCCTTTTGCTTCAAGGTG
+
#AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFAA--7<<--AFFJ7-)-7)--<AAF7<<--7-----77---AA7AFJF7F<A----7A-<-AA<F<A7-A-77)--<F-<<--7A--<-77
zcat CRR034505_r2.fastq.gz | head -n4
@ST-E00126:655:HL5FTCCXY:5:1101:7638:1151 2:N:0:NAAGTGCT
NCAAAGAAAAAGACACATTTGGGAAGAAAAGCAGGAAAAACGTTAAAGAAAATGTACTTACCACCTGGACTCAAAAGGCAGGGATTGGATAGGAAGAGGAATAAAATATAAAAATCAGAGAACTGCTGAAATTATGTGACCACTTTTTAG
+
#AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJJJJAJJJJFJJFAJFJJJJJJJJJJJJJJJJJJF-A7JAAF---7<<AFJFF<AJ--F-)-7<<FJAFA

I have never seen a scRNA-Seq data set that looks like this, though I have not done much work with 10x V2 chemistry in the past.

Is this normal? Has anybody encountered scRNA-Seq data like this before?

Thanks!

Dave

10x scRNA-Seq • 579 views
7 months ago
GenoMax 141k

Some people sequence 10x libraries with equal-length R1/R2, ignoring the 10x sequencing recommendations. This is probably done for the convenience of the sequencing centers. You can either trim read 1 to its first 26 bp yourself (as you can see in your read, the cell barcode + UMI is followed at that point by the poly-dT stretch), or, if you are planning to use cellranger, pass the files as they are, since it should take the bases it needs from the R1 file.

*NAGTAACCAAGACACGTATTGCGCAT*TTTTTTTTTTTTTTTTTTTTTT
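
If you do want to trim R1 yourself, a minimal Python sketch could look like the following. It assumes plain 4-line FASTQ records and the v2 layout discussed here (16 bp cell barcode + 10 bp UMI); the function names `trim_record` and `trim_fastq` are made up for this example, and the output file name is a placeholder.

```python
import gzip

CB_LEN = 16   # 10x v2 cell barcode length
UMI_LEN = 10  # 10x v2 UMI length


def trim_record(header, seq, plus, qual, keep=CB_LEN + UMI_LEN):
    """Truncate the sequence and quality strings of one FASTQ record to `keep` bases."""
    return header, seq[:keep], plus, qual[:keep]


def trim_fastq(in_path, out_path, keep=CB_LEN + UMI_LEN):
    """Write a copy of a gzipped FASTQ with every read cut to `keep` bases."""
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            # Read one 4-line record; an empty header means end of file.
            record = [fin.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:
                break
            fout.write("\n".join(trim_record(*record, keep=keep)) + "\n")


# Example call (output name is a placeholder):
# trim_fastq("CRR034505_f1.fastq.gz", "CRR034505_f1.trimmed.fastq.gz")
```

Only R1 needs this; R2 carries the cDNA and can stay at full length.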

Fantastic, thank you!


Just make sure you compare the R1 reads, after deleting the other bases, to the 10x whitelist to confirm that they really are cell barcodes. For v2, R1 is the 16 bp cell barcode + 10 bp UMI: https://divingintogeneticsandgenomics.com/post/understand-10x-scrnaseq-and-scatac-fastqs/ v3 has a 12 bp UMI, if I recall.
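
A rough way to run that check in Python is sketched below. It assumes the whitelist is a plain-text file with one barcode per line and counts exact matches only (no error correction, so reads with an N in the barcode, like the one shown in the question, will count as misses). The helper names and the whitelist path are placeholders for this example.

```python
import gzip
from itertools import islice


def read_sequences(fastq_gz, limit=None):
    """Yield the sequence line of each record in a gzipped FASTQ."""
    with gzip.open(fastq_gz, "rt") as fh:
        seqs = islice(fh, 1, None, 4)  # every 4th line, starting at line 2
        yield from islice(seqs, limit)


def whitelist_fraction(seqs, whitelist, cb_len=16):
    """Return the fraction of reads whose first `cb_len` bases are in `whitelist`."""
    total = hits = 0
    for seq in seqs:
        total += 1
        hits += seq[:cb_len] in whitelist
    return hits / total if total else 0.0


# Example call (whitelist path is a placeholder):
# with open("whitelist.txt") as fh:
#     whitelist = set(fh.read().split())
# reads = read_sequences("CRR034505_f1.fastq.gz", limit=100000)
# print(whitelist_fraction(reads, whitelist))
```

A high fraction suggests the first 16 bp of R1 really are cell barcodes; a fraction near zero would suggest the reads are laid out differently or the wrong whitelist is being used.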
