Quick check of RNA-Seq read length for a batch of files
7.0 years ago
ddzhangzz ▴ 90

I have >500 fastq.gz files from an RNA-Seq project and was told they were sequenced with either 125bp or 50bp reads. Is there a quick way to check which files are 125bp and which are 50bp? Checking a single file by hand is manageable, but for 500 files I would like a better way.

RNA-Seq • 903 views

There may be an existing tool that does what you want, but this shouldn't be too hard to write a custom script for. Do you have any experience with that? I would write it in Python, but that's a personal preference.
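For what it's worth, a custom check along these lines can also be sketched in plain shell. This is only a minimal sketch under some assumptions: the files are standard four-line FASTQ records compressed with gzip, they match `*.fastq.gz`, and looking at the first 1000 reads is enough to tell 125bp from 50bp. The function name `read_length` is made up for illustration, and it reports the longest read seen so that trimmed reads don't hide the original run length.

```shell
# Hypothetical helper: print the longest read length found among the
# first 1000 reads (4000 lines) of a gzipped FASTQ file.
read_length() {
    gzip -dc "$1" 2>/dev/null | head -n 4000 |
        awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
             END { print max + 0 }'
}

# Print "file<TAB>length" for every gzipped FASTQ in the current directory.
for f in *.fastq.gz; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    printf '%s\t%s\n' "$f" "$(read_length "$f")"
done
```

Piping the output through `sort -k2,2n` should group the 50bp and 125bp files together.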

7.0 years ago
GenoMax 141k

Use testformat.sh from the BBMap suite.

$ testformat.sh file.fq.gz 
sanger  fastq   raw single-ended    118bp

Or to test all files with one command:

testformat.sh *.fq

The only caveat is that it does not recognize separate paired-end files as paired, but this has no effect on what you want to do.

Edit: sorry, I just noticed it doesn't output the file name. A workaround would be something like:

for i in *.fq
do
    echo "$i"; testformat.sh "$i"
done

for i in *.fastq.gz; do printf '%s\t' "$i"; testformat.sh "$i"; done
