Question: fastx_reverse_complement truncating files?
miyagi wrote:

Dear all,

I am using FASTX-Toolkit 0.0.14-foss-2016a with the following command line:

$ module load FASTX-Toolkit; fastx_reverse_complement -z -i [input.fastq] -o [output.fastq.gz]

When I use:

$ wc -l  input.fastq.gz

I get 6191703, versus 4742402 for output.fastq.gz.

Suggestions please?

modified 8 months ago by ATpoint • written 8 months ago by miyagi

If the output is really a fastq.gz, you have to decompress it to count the lines:

$ zcat input.fastq.gz | wc -l

The same goes for the input file, of course.
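
A minimal sketch of that comparison, assuming standard four-line FASTQ records (no wrapped sequences). The helper name count_reads is made up for illustration and is not part of FASTX-Toolkit:

```shell
# count_reads FILE: print the number of FASTQ records in FILE.
# Files ending in .gz are decompressed on the fly; anything else is read as plain text.
# Assumption: every record is exactly 4 lines.
count_reads() {
    case "$1" in
        *.gz) lines=$(gzip -dc "$1" | wc -l) ;;
        *)    lines=$(wc -l < "$1") ;;
    esac
    echo $(( lines / 4 ))
}
```

Running count_reads on the input and on the output should print the same number if no reads were lost.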

written 8 months ago by finswimmer

Hi, thanks for the reply. The input file is a FASTQ; however, I'm comparing the length to the original fastq.gz file, because I know that the FASTQ file and the compressed file will have different lengths.

written 8 months ago by miyagi

however i'm comparing the length to the original fastq.gz

How do you do that?

Please use ADD REPLY below the post you want to reply to.

Thanks!

written 8 months ago by finswimmer

I just run wc -l input.fastq.gz and compare the result to wc -l output.fastq.gz.

written 8 months ago by miyagi

Regardless, the file size goes from 1.15 GB for the original file to 0.976 GB for the new file, even though both are compressed.

written 8 months ago by miyagi

And I'm getting an error indicating that the file is corrupted when I run downstream analysis software.
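
One way to check for truncation independently of the downstream tool is gzip's built-in integrity test (gzip -t). A small sketch; check_gz is a hypothetical helper name, not something from this thread or from FASTX-Toolkit:

```shell
# check_gz FILE.gz: print OK if the whole gzip stream decompresses cleanly,
# CORRUPT otherwise (for example when the file was cut off mid-write).
check_gz() {
    if gzip -t "$1" 2>/dev/null; then
        echo "OK"
    else
        echo "CORRUPT"
    fi
}
```

If check_gz output.fastq.gz reports CORRUPT, the problem is in the compressed stream itself rather than in the read content.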

written 8 months ago by miyagi

Fastx is very old and gets little (or no) maintenance. Try seqtk and see if it performs better. Also, you cannot use wc -l on gzipped files: you have to decompress first, e.g. zcat file.fq.gz | wc -l. The differences you see are probably due to the improper use of wc on a compressed file. Fastx, even though it is old, should be able to do a simple revcomp properly.
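
For reference, a sketch of the seqtk route, assuming seqtk is installed and on the PATH. The wrapper name revcomp_seqtk is hypothetical; seqtk seq -r is seqtk's reverse-complement option, and seqtk reads gzipped or plain input:

```shell
# revcomp_seqtk IN OUT.gz: reverse-complement every read in IN with seqtk
# and write the result gzip-compressed to OUT.gz.
# Assumption: seqtk is available on PATH.
revcomp_seqtk() {
    seqtk seq -r "$1" | gzip -c > "$2"
}
```

Afterwards, zcat on the output piped to wc -l should give the same line count as the decompressed input.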

modified 8 months ago • written 8 months ago by ATpoint

Hey, thank you! Haha, I know... you'd think this very simple thing would work. The worst part is not even getting a proper error. I understand what you're saying about wc -l, though that would not explain the difference in file sizes (0.976 GB vs. 1.15 GB), right? I'll give seqtk a shot and update if it works.

written 8 months ago by miyagi

You should never use file sizes as a metric for anything. Files compress differently when the individual nucleotides get rearranged, which will certainly happen when you reverse-complement them.
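
A more reliable comparison than file size is the content itself, for instance the number of reads and the total number of bases, which reverse-complementing should leave unchanged. A minimal sketch, assuming four-line FASTQ records; fastq_stats is a made-up helper name:

```shell
# fastq_stats FILE.gz: print "<reads> <bases>" for a gzipped FASTQ file.
# Assumption: each record is exactly 4 lines, with the sequence on line 2.
fastq_stats() {
    gzip -dc "$1" | awk 'NR % 4 == 2 { reads++; bases += length($0) }
                         END { print reads + 0, bases + 0 }'
}
```

If fastq_stats prints the same two numbers for the original and the reverse-complemented file, nothing was dropped, whatever the sizes on disk say.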

Use reformat.sh from BBMap suite to do the reverse complementing. Something like:

reformat.sh in=your_file.fq.gz out=rcomp_fq.gz rcomp=t

rcomp=f                 (rc) Reverse-complement reads.
rcompmate=f             (rcm) Reverse-complement read 2 only.
modified 8 months ago • written 8 months ago by genomax