Question: Very large fastq files from RNA-seq
ramyak1912 wrote (8 weeks ago):

Hi all, I am trying to optimize an RNA-seq pipeline and want to be able to estimate its RAM requirements for FASTQ files of different sizes. So far I have tested on files from typical RNA-seq experiments of ~30 to 40 million reads. I now want to test on much larger data, where the file is close to 50 GB in size.

I was wondering where I can obtain such files for testing. Can anyone point me to publicly available datasets with more reads than the ones I have already tested? Anything with >= 150 million reads would also be okay.

Thanks, RK


You can try merging multiple samples into one big FASTQ file. Use the cat command (gzip-compressed files can be concatenated directly) or zcat.
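For anyone who would rather script the merge than run it in a shell, here is a minimal Python sketch of the same idea. The function name `concat_fastq_gz` and the file names in the usage note are hypothetical. It relies on the fact that a gzip file may contain several compressed members back to back, so the inputs can be appended byte-for-byte without decompressing them.

```python
import shutil
from pathlib import Path


def concat_fastq_gz(inputs, output):
    """Concatenate gzip-compressed FASTQ files byte-for-byte into one file.

    A gzip stream may consist of several members back to back, so the
    result is itself a valid .gz file and nothing has to be decompressed.
    Returns the size of the merged file in bytes.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Stream in chunks so large files never sit fully in memory.
                shutil.copyfileobj(src, out)
    return Path(output).stat().st_size
```

Usage would look like `concat_fastq_gz(["s1.fastq.gz", "s2.fastq.gz"], "big.fastq.gz")`. For paired-end data, merge the R1 files and the R2 files separately and in the same sample order, so that mates stay in sync.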

written 8 weeks ago by MatthewP

Hi, thanks for your reply. I used some samples from ENCODE, which were ~30 GB or ~200 million reads, to run some jobs on AWS Batch, and I got OutOfMemory errors. I am aligning to the human genome. I used the same pipeline before to run a basic RNA-seq experiment with 25 to 30 million reads and didn't have any problems then.

Do you have any idea about what could be the problem?

written 8 weeks ago by ramyak1912

Without details on the pipeline this is impossible to answer. Please add comments via ADD COMMENT/REPLY to keep things organized.

written 8 weeks ago by ATpoint

i.sudbery (Sheffield, UK) wrote (8 weeks ago):

ENCODE has some samples with more than 100M reads in them, e.g. ENCSR000COU. When you talk about FASTQ size, are you talking about compressed or uncompressed? Compressed, even this 100M-read sample is only ~8 GB.
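To make the compressed versus uncompressed distinction concrete, a small sketch like the following could report both sizes along with the read count for a gzipped FASTQ file. The function name `fastq_gz_stats` is hypothetical, and the read count assumes the usual four lines per FASTQ record.

```python
import gzip
import os


def fastq_gz_stats(path):
    """Return (reads, compressed_bytes, uncompressed_bytes) for a .fastq.gz file.

    Assumes the standard four-line FASTQ record layout (no wrapped
    sequence or quality lines).
    """
    lines = 0
    uncompressed = 0
    # Iterate line by line so the whole file is never held in memory.
    with gzip.open(path, "rb") as fh:
        for line in fh:
            lines += 1
            uncompressed += len(line)
    return lines // 4, os.path.getsize(path), uncompressed
```

Reading the file this way streams the data, so it works on very large files, although it does have to decompress everything once.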

BTW, with most of the standard pipelines (STAR -> featureCounts -> DESeq2, salmon -> tximport -> DESeq2, or kallisto -> tximport -> DESeq2), memory usage scales with the size of the genome, and the number of reads doesn't make a difference.
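As a rough illustration of memory scaling with genome size, the sketch below applies the rule of thumb given in the STAR manual of about 10 bytes of RAM per genome base, which works out to roughly 30 GB for human. The function name `estimate_star_ram_gb` is hypothetical, the rule is specific to STAR, and this is a back-of-the-envelope estimate rather than a measurement of any particular pipeline.

```python
def estimate_star_ram_gb(genome_bases, bytes_per_base=10):
    """Rough RAM estimate in GB for STAR, from genome size alone.

    bytes_per_base=10 is the rule of thumb from the STAR manual; treat it
    as an assumption, not a guarantee. The number of reads does not appear
    in the estimate at all.
    """
    return genome_bases * bytes_per_base / 1e9
```

For a genome of about 3 billion bases, `estimate_star_ram_gb(3_000_000_000)` gives 30.0, in line with the 32 GB usually recommended for human alignments with STAR.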
