Why does Cutadapt output much larger files than I am inputting?
sbm228, 11 months ago

I am using usegalaxy.org to work with paired-end RNA-seq data. I am using Cutadapt to trim adapter sequences, and the Cutadapt output files are larger than the input files. For example, for my first sample, SRR6467550, the forward-read input (fastqsanger.gz) is 2.1 GB, but the Cutadapt output (also fastqsanger.gz) is 8.1 GB. This fills my disk quota much faster and makes it difficult to work with the amount of data I have (226 samples; even without this problem I will have to work in batches). Is this avoidable in any way? Is there a way to get smaller output files?
My full input for reference:

Paired-end collection: My Data

Read 1 (3'): AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

Read 2 (3'): AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

Minimum length (R1): 20

Quality cutoff: 20

Outputs Selector: Report: Cutadapt's per-adapter statistics. You can use this file with MultiQC.
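For anyone running Cutadapt outside Galaxy, the settings above correspond roughly to the command line assembled below. This is a minimal Python sketch that only builds the argument list and does not run anything. The input and output file names are hypothetical. It relies on Cutadapt choosing output compression from the output file extension, so it refuses output names that do not end in .gz.

```python
import shlex

def build_cutadapt_command(r1_in, r2_in, r1_out, r2_out,
                           adapter_r1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
                           adapter_r2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
                           min_length=20, quality_cutoff=20):
    """Build a cutadapt argument list mirroring the Galaxy settings above."""
    for out in (r1_out, r2_out):
        # cutadapt picks output compression from the file extension;
        # this sketch insists on .gz so the trimmed reads stay compressed.
        if not out.endswith(".gz"):
            raise ValueError(f"{out!r} does not end in .gz; output would not be gzipped")
    return [
        "cutadapt",
        "-a", adapter_r1,            # 3' adapter for read 1
        "-A", adapter_r2,            # 3' adapter for read 2
        "-q", str(quality_cutoff),   # quality-trimming cutoff
        "-m", str(min_length),       # minimum read length after trimming
        "-o", r1_out,                # trimmed read 1 output
        "-p", r2_out,                # trimmed read 2 output
        r1_in, r2_in,
    ]

# Hypothetical file names for sample SRR6467550.
cmd = build_cutadapt_command(
    "SRR6467550_1.fastq.gz", "SRR6467550_2.fastq.gz",
    "SRR6467550_1.trimmed.fastq.gz", "SRR6467550_2.trimmed.fastq.gz",
)
print(shlex.join(cmd))
```

Raising an error instead of silently renaming the outputs is a deliberate choice: an uncompressed output is exactly the kind of quota surprise described in the question.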

Tags: Cutadapt, Galaxy

The only way the output files can be larger than the inputs is if the input files were gzip-compressed and the output files are not.
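One way to check whether this is what happened is to download one input and one output dataset and look at their first two bytes: a gzip file always starts with 0x1f 0x8b, regardless of what its name or Galaxy datatype says. The minimal Python sketch below tests for that signature and, if needed, compresses a plain FASTQ file with the standard-library gzip module. The function names are illustrative only and are not part of Galaxy or Cutadapt.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes

def is_gzipped(path):
    """Return True if the file at `path` begins with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC

def gzip_file(src, dst, level=6):
    """Compress the plain-text file `src` into the gzip file `dst`."""
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        # Stream in chunks so large FASTQ files are not loaded into memory.
        shutil.copyfileobj(fin, fout)
```

For example, if is_gzipped() returns True for the 2.1 GB input but False for the 8.1 GB output (local file names are up to you), that would confirm the explanation above.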


The Galaxy community has a dedicated help channel:

https://help.galaxyproject.org/

As GenoMax pointed out, maybe there is a "compressed output" option to select somewhere?
