Fastq reads to CSV - How can I make a CSV with the read count of several fastq files in the same folder?
alx.alo

How can I make a CSV file with the read count of several fastq files in the same folder?

I have received my fastq files from an Illumina sequencing run, and I need to make a CSV file listing the sample ID and read count (it's a test run to balance our sample pool of 500+ samples). I've looked for ways to do this but can't figure out how to automate the fastq read count. I tried counting the lines and dividing by 4, as described in other posts, but I can't make it work for all the fastq files. Also, I have a different fastq for each read (R1, R2), and I suppose I should count both as a single file (is that correct?).

I'd really appreciate any pointers to solving this issue.

illumina csv read fastq count

I need to make a CSV file listing the sample ID and read count (it's a test run to balance our sample pool of 500+ samples)

While you can easily get this information using the answer below (or with seqkit stats: https://bioinf.shenwei.me/seqkit/usage/#stats), also ask your sequencing provider for it; it is available in a CSV file in the run reports.
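
As a rough sketch of the seqkit route (assuming seqkit is installed and the R1 files end in .R1.fastq.gz, as in the answer below):

# seqkit stats -T prints tab-separated stats per file; the num_seqs column is the read count.
# Converting tabs to commas gives a CSV.
seqkit stats -T *.R1.fastq.gz | tr '\t' ',' > read_counts.csv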

I have a different fastq for each read (R1, R2) and I suppose I should count both as a single file

Yes, the two matching paired-end reads come from one unique library fragment, and you should count unique library fragments. Sometimes you will see Illumina stats counting both reads to get a total number (double dipping).
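
As a quick sanity check of that pairing (hypothetical file names; each FASTQ record is 4 lines, so line count divided by 4 gives reads):

r1=$(gunzip -c sample01_R1.fastq.gz | wc -l)
r2=$(gunzip -c sample01_R2.fastq.gz | wc -l)
echo "R1 reads: $((r1 / 4))  R2 reads: $((r2 / 4))"   # the two numbers should match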


assuming the files are gzipped and end with R1.fastq.gz:

find /path/to/dir -type f -name "*.R1.fastq.gz" | while read -r F
do
     # print "filename,read_count": paste joins every 4 FASTQ lines into one, so wc -l counts reads
     echo -n "${F}," && gunzip -c "${F}" | paste - - - - | wc -l
done

or just use FastQC.
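
To turn that output into the CSV asked for in the question (sample ID plus read count, with a header line), here is a variation on the same loop; it assumes, as above, that file names look like SAMPLEID.R1.fastq.gz, so adjust the pattern and suffix to your naming scheme:

echo "sample_id,reads" > read_counts.csv
find /path/to/dir -type f -name "*.R1.fastq.gz" | while read -r F
do
     sample=$(basename "$F" .R1.fastq.gz)              # strip path and suffix to get the sample ID
     reads=$(gunzip -c "$F" | paste - - - - | wc -l)   # paste joins every 4 FASTQ lines, so wc -l counts reads
     echo "${sample},${reads}" >> read_counts.csv
done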
