Question: Mean and SD read length from a range of fastq files
11 months ago, eoin wrote:

Hi all,

I'm trying to write some code to generate mean read length data from a range of fastq files.

awk '{if(NR%4==2) print NR"\t"$0"\t"length($0)}' HG1.fastq > readLength.txt

I've got this far from looking through other posts and trying to improve on them, but I'm stuck on a couple of things. This command only works on a single file and reports the length of each read within that file separately.

I want to run a single command so that the mean and standard deviation (SD) of read lengths from all .fastq files within a folder are reported in a single .txt file, one sample per line. I guess SD might be difficult to calculate in a command, so even just the mean read length would help.

E.g. the first 5 files in my folder are: ru1.fastq ru2.fastq hg3.fastq hg25.fastq ru7.fastq

Obviously I'm a bit of a novice at this, so all help would be appreciated!

Thanks a lot.

11 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

Using awk:

awk 'BEGIN { t=0.0; sq=0.0; n=0 } NR%4==2 { n++; L=length($0); t+=L; sq+=L*L } END { m=t/n; printf("total %d avg=%f stddev=%f\n", n, m, sqrt(sq/n-m*m)) }' *.fastq
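
For readers checking the arithmetic: the one-liner keeps only the read count n, the running sum t, and the running sum of squares sq. The quantity sq/n - m*m built from them is the population variance of the read lengths, and the standard deviation is its square root. A short derivation, where L_i (the length of read i) is notation introduced here for illustration:

```latex
\begin{align*}
t &= \sum_{i=1}^{n} L_i, \qquad sq = \sum_{i=1}^{n} L_i^2, \qquad m = \frac{t}{n} \\
\frac{1}{n}\sum_{i=1}^{n} (L_i - m)^2
  &= \frac{1}{n}\left( sq - 2m\,t + n\,m^2 \right)
   = \frac{sq}{n} - m^2 \\
\mathrm{SD}_{\text{population}} &= \sqrt{\frac{sq}{n} - m^2}, \qquad
\mathrm{SD}_{\text{sample}} = \sqrt{\frac{sq - n\,m^2}{n - 1}}
\end{align*}
```

Which of the two SDs is wanted is a choice the thread does not settle. With thousands of reads per file the difference is negligible.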

Thank you for your answer, Pierre!

This code gives the total number of reads, their mean, and SD across all files combined. But I think I wasn't clear in my post.

What I need is the mean and SD read length for each individual .fastq file within the directory. An example of the output:

File Name     Mean     SD
ru1.fastq     235.4    31.3
hg25.fastq    241.5    35.5

and so on

Sorry if I wasn't clear earlier, and thank you so much for your response!

written 11 months ago by eoin
for F in *.fastq
do
echo  "$F   "
awk '(... same... script)' "$F"
done
written 11 months ago by Pierre Lindenbaum
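
For the "one sample per line" output asked for in the question, the loop can print the file name on the same line as the statistics. The sketch below is one possible way to do that, not the script from the answer. The function name fastq_len_stats and the output file name readLengthStats.txt are made up for illustration. It assumes uncompressed FASTQ with exactly four lines per record, and it reports the population SD.

```shell
# Hypothetical helper (not from the thread): print "file<TAB>mean<TAB>sd"
# for each FASTQ file given as an argument.
# Assumptions: plain 4-line FASTQ records (no wrapped sequences),
# uncompressed input, population SD (divide by n, not n-1).
fastq_len_stats() {
    for f in "$@"; do
        awk -v name="$f" '
            # Line 2 of every 4-line record is the sequence.
            NR % 4 == 2 { n++; len = length($0); sum += len; sumsq += len * len }
            END {
                if (n == 0) { printf "%s\tNA\tNA\n", name; exit }
                mean = sum / n
                var = sumsq / n - mean * mean
                if (var < 0) var = 0      # guard against rounding error
                printf "%s\t%.1f\t%.1f\n", name, mean, sqrt(var)
            }' "$f"
    done
}

# Example: header plus one line per sample, written to a single file.
# { printf 'File\tMean\tSD\n'; fastq_len_stats *.fastq; } > readLengthStats.txt
```

Because awk is started once per file, NR restarts at 1 for each file, so the NR % 4 test stays aligned with record boundaries even if one file is truncated.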

Perfect, thank you so much. I had tried a loop but left out the echo.

Thanks a lot.

written 11 months ago by eoin

Please mark this question as answered (green tick on the left).

written 11 months ago by Pierre Lindenbaum