Splitting FASTQ files: output size issue
7.5 years ago

I have a 22 GB file containing merged paired-end reads; in other words, all R1 and R2 reads are in a single file. After splitting it into two files to use as input for Trinity, one with all the R1 reads and one with all the R2 reads, why are the resulting files so much smaller than the original? The R1 file is 1.22 GB and the R2 file is also 1.22 GB. Thanks!

fastq trinity paired end • 2.1k views

How did you split the file?


using:

  paste - - - - < test.fq \
    | tee >(awk 'BEGIN{FS="\t"; OFS="\n"} {if (match($1, " 1:N")) print $1,$2,$3,$4}' > test.r1.fq) \
    | awk 'BEGIN{FS="\t"; OFS="\n"} {if (match($1, " 2:N")) print $1,$2,$3,$4}' > test.r2.fq
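Comparing record counts is a quick way to see whether a split like this dropped reads: if the R1 and R2 counts do not add up to the count for the original file, some headers matched neither pattern. This is only a minimal sketch, assuming exactly four lines per record; `count_records` is a hypothetical helper name, not part of any existing tool.

```shell
# Count FASTQ records per file.
# Assumption: every record is exactly four lines (no wrapped sequence or quality lines).
# count_records is a hypothetical helper name used for illustration only.
count_records() {
  for f in "$@"; do
    # NR in the END block is the total number of lines read from the file.
    awk -v name="$f" 'END { printf "%s\t%d\n", name, NR / 4 }' "$f"
  done
}

# Example call, using the file names from the command above:
# count_records test.fq test.r1.fq test.r2.fq
```

Each output line is a file name and its record count, separated by a tab.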


I think that's unnecessarily complicated. Have you tried some of the methods from "Fastq Splitter For Paired End Reads"?


I think that's unnecessarily complicated

yup

  rm -f R1.fq R2.fq && awk '(NR%4==1) { out = (index($0,"1:N") ? "R1.fq" : "R2.fq") } { print >> out }' R0.fq

Are you sure ALL your reads contain 1:N or 2:N?
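One way to check this is to tally what actually follows the first space in each header line. The sketch below assumes four-line records and Casava 1.8+ style headers (`@id 1:N:0:INDEX`); `tally_read_flags` is a hypothetical helper name. Headers with no space-separated field (for example, old-style `/1` and `/2` suffixes) are reported as `no-flag-field`.

```shell
# Tally FASTQ header lines by their "read number:filter flag" pair.
# Assumptions: four lines per record; Casava 1.8+ headers such as "@id 1:N:0:INDEX".
# tally_read_flags is a hypothetical helper name used for illustration only.
tally_read_flags() {
  awk 'NR % 4 == 1 {
         # Split the header on whitespace; the second field holds read:filter:control:index.
         n = split($0, parts, " ")
         if (n >= 2) {
           split(parts[2], flag, ":")
           key = flag[1] ":" flag[2]
         } else {
           key = "no-flag-field"
         }
         count[key]++
       }
       END { for (k in count) print k "\t" count[k] }' "$1" | LC_ALL=C sort
}

# Example call, using the input file name from the question:
# tally_read_flags test.fq
```

Any counts under keys other than `1:N` and `2:N` (such as `1:Y`, reads flagged as filtered) are records that the `" 1:N"` and `" 2:N"` tests in the original command would skip.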
