Question: splitting fastq files output size issue
3.1 years ago, adrilu.romero wrote:

I have a 22 GB file containing merged paired-end reads; that is, all R1 and R2 reads are in a single file. After splitting it into two files (to use as input to Trinity), one with all R1 reads and another with all R2 reads, why are the split files so much smaller than the original? The R1 file is 1.22 GB and the R2 file is also 1.22 GB. Thanks!

paired end fastq trinity • 1.2k views

How did you split the file?

written 3.1 years ago by Eric Lim

using:

  paste - - - - < test.fq \
    | tee >(awk 'BEGIN{FS="\t"; OFS="\n"} {if (match($1, " 1:N")) print $1,$2,$3,$4}' > test.r1.fq ) \
    | awk 'BEGIN{FS="\t"; OFS="\n"} {if (match($1, " 2:N")) print $1,$2,$3,$4}' > test.r2.fq

written 3.1 years ago by adrilu.romero
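The command above keeps only records whose header contains " 1:N" or " 2:N"; a record matching neither pattern is written to neither output file. One way to see whether that explains the size difference is to compare record counts before and after the split. The following is a minimal sketch, not the poster's method: `count_records` and `filter_by_header` are hypothetical helper names, the sample reads are made up, it assumes exactly four lines per FASTQ record, and it uses two separate passes with a literal `index()` test instead of `tee` and `match()`.

```shell
# Hypothetical helper: number of FASTQ records, assuming exactly 4 lines per record.
count_records() {
  awk 'END { print NR / 4 }' "$1"
}

# Hypothetical helper: keep only records whose header contains the given literal string.
# usage: filter_by_header PATTERN < in.fq > out.fq
filter_by_header() {
  paste - - - - | awk -F'\t' -v OFS='\n' -v pat="$1" \
    'index($1, pat) { print $1, $2, $3, $4 }'
}

# Made-up sample: one pair flagged N and one pair flagged Y.
tmp=$(mktemp -d)
printf '%s\n' \
  '@r1 1:N:0:1' 'ACGT' '+' 'IIII' \
  '@r1 2:N:0:1' 'TGCA' '+' 'IIII' \
  '@r2 1:Y:0:1' 'AAAA' '+' 'IIII' \
  '@r2 2:Y:0:1' 'TTTT' '+' 'IIII' > "$tmp/test.fq"

filter_by_header ' 1:N' < "$tmp/test.fq" > "$tmp/test.r1.fq"
filter_by_header ' 2:N' < "$tmp/test.fq" > "$tmp/test.r2.fq"

total=$(count_records "$tmp/test.fq")
n_r1=$(count_records "$tmp/test.r1.fq")
n_r2=$(count_records "$tmp/test.r2.fq")
dropped=$((total - n_r1 - n_r2))

echo "input=$total R1=$n_r1 R2=$n_r2 dropped=$dropped"
```

If `dropped` is far from zero on the real file, the missing gigabytes are reads whose headers matched neither pattern.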

I think that's unnecessarily complicated. Have you tried some of the methods from "Fastq Splitter For Paired End Reads"?

modified 3.1 years ago • written 3.1 years ago by Eric Lim

"I think that's unnecessarily complicated"

yup

  rm -f R1.fq R2.fq && cat R0.fq  | awk '(NR%4==1) { out = (index($0,"1:N")?"R1.fq":"R2.fq");} { print >> out;}'

Are you sure ALL your reads contain 1:N or 2:N?

modified 3.1 years ago • written 3.1 years ago by Pierre Lindenbaum
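To answer the question of whether every read contains 1:N or 2:N, the header lines can be tallied by the first two colon-separated fields of their comment (the part after the first space). This is a rough sketch under assumptions: `tally_flags` is a hypothetical helper name, the sample headers are made up, and the file is assumed to have four lines per record with the read number and filter flag at the start of the second whitespace-separated token.

```shell
# Hypothetical helper: count header lines by "<read>:<flag>" (e.g. 1:N, 2:Y).
# Headers with no second whitespace-separated token are counted separately.
tally_flags() {
  awk 'NR % 4 == 1 {
         key = "(no comment field)"
         if (NF >= 2) { split($2, f, ":"); key = f[1] ":" f[2] }
         tally[key]++
       }
       END { for (k in tally) print k, tally[k] }' "$1" | LC_ALL=C sort
}

# Made-up sample with three kinds of header.
sample=$(mktemp)
printf '%s\n' \
  '@r1 1:N:0:1' 'ACGT' '+' 'IIII' \
  '@r1 2:N:0:1' 'TGCA' '+' 'IIII' \
  '@r2 1:Y:0:1' 'AAAA' '+' 'IIII' \
  '@r3/1'       'CCCC' '+' 'IIII' > "$sample"

flags=$(tally_flags "$sample")
printf '%s\n' "$flags"
```

Any key other than 1:N or 2:N in the output marks reads that a split on those two strings would not place in R1 or R2 as intended.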