Why does Cutadapt output much larger files than I am inputting?

2.6 years ago

sbm228 • 0

I am using usegalaxy.org to work with paired-end RNA-seq data. I am trimming adapter sequences with Cutadapt, and the Cutadapt output files are larger than the input files. For example, for my first sample, SRR6467550, the forward-read input (fastqsanger.gz) is 2.1 GB. After running Cutadapt, the output (fastqsanger.gz) is 8.1 GB. This fills my disk quota much faster and makes it difficult to work with the amount of data I have (226 samples; I will have to work in batches as it is). Is this avoidable in any way? Is there a way to obtain a smaller output?
My full input for reference:

Paired-end collection: My Data

Read 1 (3'): AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

Read 2 (3'): AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

Minimum length (R1): 20

Quality cutoff: 20

Outputs Selector: Report: Cutadapt's per-adapter statistics. You can use this file with MultiQC.

Cutadapt Galaxy

updated 2.6 years ago by GenoMax 141k • written 2.6 years ago by sbm228 • 0

The only way the output files can be larger than the inputs is if the input files were gzip-compressed while the output files are not.

2.6 years ago by GenoMax 141k
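To illustrate the point about compression, here is a minimal Python sketch using only the standard library. It builds synthetic FASTQ text (made-up reads, not data from SRR6467550), gzip-compresses it, and compares the two sizes. The helper names `make_fastq` and `is_gzip` are hypothetical and exist only for this example. Real FASTQ files will compress by a different factor than random data, so the printed ratio is only indicative. The roughly 3.9x difference in the question (2.1 GB vs 8.1 GB) would be plausible for a gzip-compressed input and an uncompressed output, but that is an assumption about this particular dataset.

```python
import gzip
import random


def make_fastq(n_reads, read_len=100, seed=0):
    """Build synthetic FASTQ text as bytes (hypothetical data for illustration)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        # Skewed quality characters, loosely imitating real quality strings.
        qual = "".join(rng.choice("FFFF:,#") for _ in range(read_len))
        records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(records).encode("ascii")


def is_gzip(data):
    """Return True if the bytes start with the gzip magic number 0x1f 0x8b."""
    return data[:2] == b"\x1f\x8b"


raw = make_fastq(5000)
packed = gzip.compress(raw)

print(f"uncompressed:    {len(raw)} bytes, gzip? {is_gzip(raw)}")
print(f"gzip-compressed: {len(packed)} bytes, gzip? {is_gzip(packed)}")
print(f"size ratio:      {len(raw) / len(packed):.1f}x")
```

The same magic-number check can be applied to the first two bytes of a downloaded dataset to see whether a file labelled as compressed really is compressed.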

The Galaxy community has a dedicated help channel:

https://help.galaxyproject.org/

As GenoMax pointed out, the difference is probably due to compression. Maybe there is a "compressed output" option to select somewhere in the tool form?

2.6 years ago by h.mon 35k
