Problem with read counts after concatenating FASTQ files
4.5 years ago
Akshaya • 0

Hello! I downloaded FASTQ files for multiple run accessions (RNA-seq) from ENA and tried to concatenate them using cat.

cat *_1.fastq.gz > merged_R1.fastq.gz

The output file is larger, but the read count I get for it is the same as that of the first file that was concatenated. I am a beginner in NGS data analysis. Can someone please advise what I am doing wrong or missing?

Thank you

RNA-Seq Fastq • 1.0k views

What is ${i}? Please show the entire code including how you determined the read number.

I used the readlength.sh script from BBMap.

What is ${i}? Please show the entire code

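# Each sample's FASTQ files are in their own subdirectory.
for i in */            # glob directly instead of parsing ls output
do
    (
        cd "$i" || exit    # subshell, so no "cd .." is needed afterwards
        # ${i%/} strips the trailing slash, leaving the sample name
        cat *_1.fastq.gz > "${i%/}_R1.fastq.gz"
        cat *_2.fastq.gz > "${i%/}_R2.fastq.gz"
    )
done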

I had the files for different samples in individual folders.
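
One way to confirm that a merge like this kept every read is to compare the merged file against the sum of its inputs. The following is only a sketch: `check_merge` is a hypothetical helper name, it assumes gzip is installed, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```shell
# check_merge MERGED INPUT...
# Hypothetical helper: compares the read count of MERGED with the summed
# read counts of the INPUT files. Assumes 4 lines per FASTQ record.
check_merge() {
    local merged=$1
    shift
    local total=0 f n
    for f in "$@"; do
        n=$(gzip -cd -- "$f" | wc -l)
        total=$(( total + n / 4 ))
    done
    n=$(gzip -cd -- "$merged" | wc -l)
    if [ $(( n / 4 )) -eq "$total" ]; then
        echo "OK: $total reads"
    else
        echo "MISMATCH: merged=$(( n / 4 )) inputs=$total" >&2
        return 1
    fi
}
```

Run inside a sample folder, it could be called as `check_merge sample_R1.fastq.gz *_1.fastq.gz`, where `sample` stands in for the folder name; the `*_1.fastq.gz` glob does not match the merged `_R1` file, so the output is not counted as an input.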

Looks fine. Did checking the read counts with RamRS's suggestion help? I guess the concatenation itself is fine and only the counting went wrong.

Yes, RamRS's suggestion did help. Thank you

4.5 years ago
Ram 43k

It could be a problem with the gzip streaming library used by the tool that counted the reads. Can you try a simple zcat merged_R1.fastq.gz | wc -l? That should give you the number of lines, which is 4 * number_of_reads.
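
A minimal sketch of that check wrapped in a function, assuming gzip is installed and the file uses standard four-line FASTQ records; `count_reads` is a hypothetical helper name. It uses `gzip -cd` in place of `zcat`, which decompresses every member of a concatenated gzip file and avoids differences in how `zcat` handles file suffixes on some systems.

```shell
# count_reads FILE
# Hypothetical helper: prints the number of reads in a gzipped FASTQ file.
# Assumes 4 lines per record (header, sequence, '+', qualities).
count_reads() {
    local lines
    lines=$(gzip -cd -- "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Usage, with the file name from the question:
# count_reads merged_R1.fastq.gz
```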

Yes, this works. Thank you so much!

