fastx_reverse_complement truncating files?
5.2 years ago
miyagi • 0

Dear all,

I am running FASTX-Toolkit 0.0.14-foss-2016a with the following command line:

$ module load FASTX-Toolkit; fastx_reverse_complement -z -i [input.fastq] -o [output.fastq.gz]

When I count the lines with:

$ wc -l input.fastq.gz

I get 6191703, versus 4742402 for output.fastq.gz.

Suggestions please?

fastx reverse compliment RNA-Seq • 1.7k views

If the output really is a fastq.gz, you have to decompress it before counting the lines:

$ zcat output.fastq.gz | wc -l

The same applies to the input file, of course.
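To make the comparison concrete, here is a minimal sketch that counts FASTQ records rather than raw lines. It assumes standard four-line FASTQ records; the function name count_reads and the two-read demo file are made up for illustration. gzip -cd is used as a portable equivalent of zcat.

```shell
# Count FASTQ records in a gzip-compressed file (assumes 4 lines per record).
# count_reads is a hypothetical helper name, not part of any toolkit.
count_reads() {
    local lines
    # gzip -cd writes the decompressed stream to stdout; wc -l counts its lines
    lines=$(gzip -cd "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Tiny demonstration with a made-up two-read file
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > "$tmpdir/demo.fastq"
gzip -c "$tmpdir/demo.fastq" > "$tmpdir/demo.fastq.gz"
demo_reads=$(count_reads "$tmpdir/demo.fastq.gz")
echo "reads: $demo_reads"
```

Running count_reads on both the input and the output should give the same number if the reverse complement step kept every read.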


Hi, thanks for the reply. The input file is an uncompressed fastq; however, I'm comparing the output's length to the original fastq.gz file, because I know that the fastq file and the compressed file will have different lengths.


however i'm comparing the length to the original fastq.gz

How do you do that?

Please use the ADD REPLY button below the post you would like to reply to.

Thanks!


I just run $ wc -l input.fastq.gz and compare the result to $ wc -l output.fastq.gz


Regardless, the file size drops from 1.15 GB for the original file to 0.976 GB for the new file, even though both are compressed.


I'm also getting an error indicating that the file is corrupted when I use it in downstream analysis software.
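One way to check whether a .gz file is actually damaged is gzip's built-in integrity test, gzip -t. The sketch below is only an illustration: the helper name check_gz and the demo files are invented, and the truncation is simulated by keeping the first 20 bytes of a valid file.

```shell
# Report whether a gzip file passes gzip's integrity test.
# check_gz is a hypothetical helper name used for this example only.
check_gz() {
    if gzip -t "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "corrupt"
    fi
}

tmpdir=$(mktemp -d)
# A small, valid gzip-compressed FASTQ record (made-up data)
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$tmpdir/good.fastq.gz"
# Simulate a truncated file by keeping only its first 20 bytes
head -c 20 "$tmpdir/good.fastq.gz" > "$tmpdir/truncated.fastq.gz"

good_status=$(check_gz "$tmpdir/good.fastq.gz")
bad_status=$(check_gz "$tmpdir/truncated.fastq.gz")
echo "good: $good_status, truncated: $bad_status"
```

If output.fastq.gz fails this test, the file itself is incomplete or damaged, independent of how the lines were counted.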


FASTX is very old and in low (or no) maintenance. Try seqtk and see if it performs better. Also, you cannot use wc -l on gzipped files: it counts newline bytes in the compressed binary data, which says nothing about the number of lines in the FASTQ. You have to decompress first, e.g. zcat file.fq.gz | wc -l. The differences you see are probably due to the improper use of wc on a compressed file. FASTX, even though it is old, should be able to do a simple revcomp properly.
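With seqtk, the reverse complement should be something like seqtk seq -r in.fq.gz | gzip > out.fq.gz. To cross-check any tool on a small sample, a plain awk sketch can serve as a reference. This is not how FASTX or seqtk implement it; the function name revcomp_fastq is made up, and the code assumes four-line FASTQ records on standard input.

```shell
# Reverse-complement a four-line-per-record FASTQ stream read from stdin.
# revcomp_fastq is a hypothetical helper, meant only for cross-checking.
revcomp_fastq() {
    awk '
    BEGIN {
        # Complement table; any character not listed is passed through unchanged
        from = "ACGTNacgtn"; to = "TGCANtgcan"
        for (i = 1; i <= length(from); i++)
            map[substr(from, i, 1)] = substr(to, i, 1)
    }
    function revcomp(s,    i, c, out) {
        out = ""
        for (i = length(s); i >= 1; i--) {
            c = substr(s, i, 1)
            out = out ((c in map) ? map[c] : c)
        }
        return out
    }
    function reverse(s,    i, out) {
        out = ""
        for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
        return out
    }
    NR % 4 == 2 { print revcomp($0); next }   # sequence line
    NR % 4 == 0 { print reverse($0); next }   # quality line, reversed to stay aligned
    { print }                                  # header and "+" lines unchanged
    '
}

# Made-up single read: AACG with qualities ABCD
demo_out=$(printf '@r1\nAACG\n+\nABCD\n' | revcomp_fastq)
echo "$demo_out"
```

For a real file this would be run as gzip -cd in.fq.gz | head -n 4000 | revcomp_fastq and compared with the first 4000 lines of the tool's output.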


Hey, thank you! Haha, I know... you'd think something this simple would work. The worst part is not even getting a proper error. I understand what you're saying about wc -l, though that would not explain the difference in file size (0.976 GB vs 1.15 GB), right? I'll give seqtk a shot and update if it works.


You should never use file sizes as a metric for anything. Files compress differently when the individual nucleotides get rearranged, which will certainly happen when you reverse-complement (RC) them.

Use reformat.sh from the BBMap suite to do the reverse complementing. Something like:

reformat.sh in=your_file.fq.gz out=rcomp_fq.gz rcomp=t

rcomp=f                 (rc) Reverse-complement reads.
rcompmate=f             (rcm) Reverse-complement read 2 only.
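As for file sizes: a more meaningful comparison than the compressed size is the number of uncompressed bytes, which a reverse complement should leave unchanged. A small sketch, with a made-up helper name (uncompressed_bytes) and made-up single-read files:

```shell
# Print the number of bytes in the decompressed content of a gzip file.
# uncompressed_bytes is a hypothetical helper name for this example.
uncompressed_bytes() {
    local n
    n=$(gzip -cd "$1" | wc -c)
    echo $(( n ))   # arithmetic expansion strips any padding from wc
}

tmpdir=$(mktemp -d)
# The same made-up read, forward and reverse-complemented
printf '@r1\nAACGTTGA\n+\nIIIIHHHH\n' | gzip -c > "$tmpdir/fwd.fastq.gz"
printf '@r1\nTCAACGTT\n+\nHHHHIIII\n' | gzip -c > "$tmpdir/rc.fastq.gz"

fwd_bytes=$(uncompressed_bytes "$tmpdir/fwd.fastq.gz")
rc_bytes=$(uncompressed_bytes "$tmpdir/rc.fastq.gz")
echo "forward: $fwd_bytes bytes, reverse complement: $rc_bytes bytes"
```

The compressed sizes of two such files may differ, but the uncompressed byte counts and the read counts should match if nothing was lost.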