5.1 years ago
miyagi
Dear all,
I am using FASTX-Toolkit 0.0.14-foss-2016a with the following command line:
$ module load FASTX-Toolkit; fastx_reverse_complement -z -i [input.fastq] -o [output.fastq.gz]
When I use:
$ wc -l input.fastq.gz
I get 6191703 lines for the input vs. 4742402 lines for output.fastq.gz.
Suggestions please?
If the output is really a fastq.gz you have to uncompress it to count the lines:
Same for input file of course.
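A minimal sketch of that check, assuming a small made-up file called demo.fastq (the file name and reads are hypothetical, only there so the example runs on its own). It uses gzip -dc, which does the same job as zcat, and counts the lines of the decompressed stream rather than of the compressed bytes:

```shell
# Hypothetical demo data: two FASTQ records (8 lines) in a temp directory.
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$tmpdir/demo.fastq"
gzip -c "$tmpdir/demo.fastq" > "$tmpdir/demo.fastq.gz"

# Count lines of the plain file directly.
plain_lines=$(wc -l < "$tmpdir/demo.fastq")

# Count lines of the gzipped file by decompressing to stdout first.
gz_lines=$(gzip -dc "$tmpdir/demo.fastq.gz" | wc -l)

# A FASTQ record is four lines, so reads = lines / 4.
echo "plain: $((plain_lines)) lines, gz: $((gz_lines)) lines, reads: $((gz_lines / 4))"
```

If the two counts match for your real input and output, the number of reads is the same, whatever wc -l reports on the .gz files themselves.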
Hi, thanks for the reply. The input file is a fastq; however, I'm comparing the length to the original fastq.gz file because I know that the fastq file and the compressed file will have different lengths.
How do you do that?
Please use the ADD REPLY button below the post you would like to reply to. Thanks!
I just use $ wc -l input.fastq.gz and compare that to $ wc -l output.fastq.gz
Regardless, the file size goes from 1.15 GB for the original file to 0.976 GB for the new file, even though both are compressed,
and I'm getting an error indicating the file is corrupted when running a downstream analysis tool.
Fastx is very old and in low (or no) maintenance. Use seqtk and see if it performs better. Also, you cannot use wc -l on gzipped files. You have to decompress first, like zcat file.fq.gz | wc -l. Probably the differences you see are due to the improper use of wc on a compressed file. Fastx, even though it is old, should be able to do a simple revcomp properly.
Hey, thank you! Haha, I know... you'd think this very simple thing would work. The worst part is not even getting a proper error. I understand what you're saying about wc -l, though that would not explain the difference in file size, right (0.976 GB vs. 1.15 GB)? I'll give seqtk a shot and update if it works.
You should never use file sizes as a metric for anything. Files compress differently when the individual nucleotides get rearranged, which will certainly happen when you RC them.
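To illustrate, here is a rough sketch on made-up data (the read sequence, file names and the small awk reverse-complement are all hypothetical and only for demonstration, not a replacement for a proper tool). It reverse-complements a toy FASTQ, gzips both versions, and prints compressed byte sizes next to decompressed line counts. The byte sizes may differ between the two files, while the line counts have to agree if no reads were lost:

```shell
tmpdir=$(mktemp -d)

# Hypothetical toy input: 1000 reads with the same 20 bp sequence.
for i in $(seq 1 1000); do
  printf '@read%s\nACGTTGCAAGGCTTAACCGG\n+\nIIIIIIIIIIIIIIIIIIII\n' "$i"
done > "$tmpdir/in.fastq"

# Toy reverse complement: reverse + complement line 2 of each record,
# reverse the quality string on line 4, pass the other lines through.
awk '
function rev(s,   i, r) { r = ""; for (i = length(s); i > 0; i--) r = r substr(s, i, 1); return r }
BEGIN { c["A"] = "T"; c["C"] = "G"; c["G"] = "C"; c["T"] = "A"; c["N"] = "N" }
NR % 4 == 2 { s = rev($0); out = ""; for (i = 1; i <= length(s); i++) out = out c[substr(s, i, 1)]; print out; next }
NR % 4 == 0 { print rev($0); next }
{ print }
' "$tmpdir/in.fastq" > "$tmpdir/rc.fastq"

gzip -c "$tmpdir/in.fastq" > "$tmpdir/in.fastq.gz"
gzip -c "$tmpdir/rc.fastq" > "$tmpdir/rc.fastq.gz"

# Compressed size in bytes: not a reliable indicator of content.
in_bytes=$(wc -c < "$tmpdir/in.fastq.gz")
rc_bytes=$(wc -c < "$tmpdir/rc.fastq.gz")

# Decompressed line counts: these must match.
in_lines=$(gzip -dc "$tmpdir/in.fastq.gz" | wc -l)
rc_lines=$(gzip -dc "$tmpdir/rc.fastq.gz" | wc -l)

echo "in:  $((in_bytes)) bytes, $((in_lines)) lines"
echo "rc:  $((rc_bytes)) bytes, $((rc_lines)) lines"
```

On real data the size gap can be much larger than on this toy input, so the line (or read) count is the number to compare.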
Use reformat.sh from the BBMap suite to do the reverse complementing. Something like: