Question

Problem with read counts after concatenating FASTQ files

4.5 years ago

Akshaya • 0

Hello! I downloaded multiple run accession files (RNA-seq FASTQ) from ENA and tried to concatenate them using cat.

cat *_1.fastq.gz > merged_R1.fastq.gz

The output file is larger, as expected. However, when I look at the read count, it is the same as that of the first of the concatenated files. I am a beginner in NGS data analysis. Can someone please advise what I am doing wrong or missing?

Thank you

RNA-Seq Fastq • 1.0k views

What is ${i}? Please show the entire code including how you determined the read number.

ADD REPLY • link 4.5 years ago by ATpoint 82k

I used the readlength.sh script from BBMap to count the reads.

ADD REPLY • link 4.5 years ago by Akshaya • 0

What is ${i}? Please show the entire code

ADD REPLY • link 4.5 years ago by ATpoint 82k

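# Loop over the sample directories; the glob avoids parsing ls output
for i in */
do
    cd "$i" || continue
    # ${i%?} strips the trailing slash from the directory name
    cat *_1.fastq.gz > "${i%?}_R1.fastq.gz"
    cat *_2.fastq.gz > "${i%?}_R2.fastq.gz"
    cd ..
done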

I had the files for different samples in individual folders.

ADD REPLY • link 4.5 years ago by Akshaya • 0

Looks fine. Did checking the read counts with RamRS's suggestion help? I guess the merged files are fine and only the counting went wrong.
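One way to confirm this is to compare the merged read count with the sum of the per-run counts. The following is only a sketch under assumptions: the sample directory, run file names, and toy reads are hypothetical, count_reads is a helper defined here, and `gzip -dc` is used as a portable equivalent of `zcat`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count reads in a gzipped FASTQ file (assumes 4 lines per record).
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Toy data in a throwaway directory; sample and run names are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/sampleA"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$tmp/sampleA/run1_1.fastq.gz"
printf '@r3\nGGCA\n+\nIIII\n' | gzip > "$tmp/sampleA/run2_1.fastq.gz"

# Merge as in the loop above; the output name does not match *_1.fastq.gz.
cat "$tmp"/sampleA/*_1.fastq.gz > "$tmp/sampleA/sampleA_R1.fastq.gz"

# Sum the per-run counts and compare with the merged file.
expected=0
for f in "$tmp"/sampleA/*_1.fastq.gz; do
    expected=$((expected + $(count_reads "$f")))
done
merged=$(count_reads "$tmp/sampleA/sampleA_R1.fastq.gz")

if [ "$merged" -eq "$expected" ]; then
    echo "OK: $merged reads"
else
    echo "MISMATCH: merged=$merged expected=$expected" >&2
fi
```

If the two numbers agree, the concatenation itself worked and any discrepancy comes from the tool used for counting.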

ADD REPLY • link 4.5 years ago by ATpoint 82k

Yes, RamRS's suggestion did help. Thank you

ADD REPLY • link 4.5 years ago by Akshaya • 0

score 2 · Accepted Answer · 2019-11-29

4.5 years ago

Ram 43k

It could be a problem with the gzip streaming library used by the counting tool. Can you try a simple zcat merged_R1.fastq.gz | wc -l? That should give you 4 * number_of_reads (the number of lines), since each FASTQ record spans four lines.
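For background, a minimal sketch of why this check works: concatenating gzip files produces a valid multi-member gzip file, and `zcat` (equivalent to `gzip -dc`, used below for portability) decompresses every member, whereas a tool that stops after the first member would report only the first file's reads. The run names and toy reads are made up for illustration, and make_fastq is a helper defined only for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory; all file names here are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Write a toy FASTQ stream with the requested number of 4-line records.
make_fastq() {
    local prefix=$1 n=$2 j
    for ((j = 1; j <= n; j++)); do
        printf '@%s_%d\nACGT\n+\nIIII\n' "$prefix" "$j"
    done
}

make_fastq runA 3 | gzip > "$tmp/runA_1.fastq.gz"
make_fastq runB 2 | gzip > "$tmp/runB_1.fastq.gz"

# Concatenating gzip files yields a multi-member gzip file.
cat "$tmp"/run*_1.fastq.gz > "$tmp/merged_R1.fastq.gz"

# Decompress every member and count lines; each FASTQ record has 4 lines.
lines=$(gzip -dc "$tmp/merged_R1.fastq.gz" | wc -l)
reads=$((lines / 4))
echo "lines=$((lines)) reads=$reads"
```

Here the merged file holds the reads of both inputs, so the line count divided by four gives the total number of reads.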

Yes, this works. Thank you so much!

ADD REPLY • link 4.5 years ago by Akshaya • 0
