Question

scRNA: subset a small number of fastq reads

3.4 years ago

bsmith030465 ▴ 240

Hi,

I was trying to experiment a bit with the scRNA-seq FASTQ files from 10x (v3x3). My files are of type:

L001_I1_001.fastq.gz
L001_R1_001.fastq.gz
L001_R2_001.fastq.gz

Since the full dataset is large (~20,000 cells), is there a way to extract the FASTQ data for only ~200 cells?

I was thinking of extracting the cell barcodes/whitelist using umi_tools and then just using grep on R1 for 200 of the barcodes. This would give me the results for R1, but how will I get the corresponding R2 reads?

Or is there a better way?

thanks for your help

scrna fastq cellranger 10x • 1.4k views

Why do you want to limit yourself to analyzing only 1% of the data? 20k cells aren't that many and should be pretty easy to handle on most local workstations with any reasonable amount of RAM. How are you trying to analyze them and what issues are you having?

3.4 years ago by jared.andrews07 ★ 16k

Thanks for your reply. I just wanted a bit of data to test my code and figured 200/300 cells would be quicker to work with than the full dataset. Would 32 GB be enough to map the full dataset in a reasonable amount of time?

3.4 years ago by bsmith030465 ▴ 240

Ah, sorry, I misunderstood to a degree. Mapping does take a while. If you have access to a cluster and can use the "cluster mode" for cellranger, it can map in ~4 hours for most samples, but it'll probably take >24 hours if not. I'd follow @genomax's suggestion if you're just looking to test the mapping before going for the full shebang.

3.4 years ago by jared.andrews07 ★ 16k

> This would give me the results for R1, but how will I get the corresponding R2 reads?

Once you have the R1 reads that you are interested in, you can easily fish out the corresponding R2 reads with filterbyname.sh from the BBMap suite, using the R1 read headers as the list of names to keep.
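If you would rather not install anything, the same idea can be sketched in plain Python. This is only an illustration of the name-matching step, not what filterbyname.sh does internally. It assumes the cell barcode is the first 16 bases of R1 (the usual 10x v3 layout), that the read name is the header text up to the first whitespace, and that the set of kept read names fits in memory. The function names and file paths are made up for the example.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (header, seq, plus, qual) tuples from a plain or gzipped FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))
            if len(record) < 4:  # end of file (or a truncated final record)
                break
            yield tuple(line.rstrip("\n") for line in record)


def read_id(header):
    """Read name without the leading '@' and without the trailing comment field."""
    return header[1:].split()[0]


def subset_r1_by_barcode(r1_in, r1_out, barcodes, barcode_len=16):
    """Write R1 records whose leading barcode is in `barcodes`.

    Assumption: the barcode occupies the first `barcode_len` bases of R1.
    Returns the set of read names that were kept.
    """
    kept = set()
    with gzip.open(r1_out, "wt") as out:
        for rec in read_fastq(r1_in):
            if rec[1][:barcode_len] in barcodes:
                out.write("\n".join(rec) + "\n")
                kept.add(read_id(rec[0]))
    return kept


def subset_by_name(fq_in, fq_out, names):
    """Write the records of fq_in whose read name is in `names`; return the count."""
    n_written = 0
    with gzip.open(fq_out, "wt") as out:
        for rec in read_fastq(fq_in):
            if read_id(rec[0]) in names:
                out.write("\n".join(rec) + "\n")
                n_written += 1
    return n_written
```

Usage would look something like this, with `chosen` being a Python set of the ~200 barcodes you picked (output file names are placeholders):

    kept = subset_r1_by_barcode("L001_R1_001.fastq.gz", "sub_R1.fastq.gz", chosen)
    subset_by_name("L001_R2_001.fastq.gz", "sub_R2.fastq.gz", kept)
    subset_by_name("L001_I1_001.fastq.gz", "sub_I1.fastq.gz", kept)

Because each output is written in the order of its input file, the subsets should stay in sync as long as the original files were in the same read order.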