Reduce/select reads/lines from files using the terminal
7.0 years ago
dimitrischat ▴ 210

Hello all. I have .fastq files with many reads/lines. How can I reduce that number to, e.g., 10000 lines and get a new file? I want to do this using only the terminal (macOS), not a separate program. I would like a method that works on any type of file and can reduce it to whatever number of lines I choose.

RNA-Seq ChIP-Seq sequence • 1.5k views

I answered this question yesterday in a different thread posted by the OP: C: tophat 2 rna seq

7.0 years ago
jonasmst ▴ 410

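You can use sed to remove lines from a file (which is what I think you want?). The following command removes lines 45001-55000 from yourfile.fastq and saves the result in a new file called shorterfile.fastq. A FASTQ record spans four lines, so the range should start on the first line of a record and cover a multiple of four lines (here 10000 lines, i.e. 2500 reads); otherwise the output contains broken records:

sed -e '45001,55000d' < yourfile.fastq > shorterfile.fastq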
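
If it is easier to think in reads than in line numbers, the sed range can be computed from record numbers. This is only a sketch: `keep_reads` and `drop_reads` are hypothetical helper names, and both assume a plain four-line-per-record FASTQ (no wrapped sequence lines).

```shell
# keep_reads FIRST LAST FILE: print records FIRST..LAST (1-based, inclusive).
# Assumes every FASTQ record is exactly 4 lines.
keep_reads() {
  start=$(( ($1 - 1) * 4 + 1 ))   # first line of record FIRST
  end=$(( $2 * 4 ))               # last line of record LAST
  # -n suppresses default output; p prints the range; q stops reading early.
  sed -n -e "${start},${end}p" -e "${end}q" "$3"
}

# drop_reads FIRST LAST FILE: print everything except records FIRST..LAST.
drop_reads() {
  start=$(( ($1 - 1) * 4 + 1 ))
  end=$(( $2 * 4 ))
  sed -e "${start},${end}d" "$3"
}
```

For example, `drop_reads 11251 13750 yourfile.fastq > shorterfile.fastq` removes lines 45001-55000, and `keep_reads 1 2500 yourfile.fastq > newfile.fastq` keeps the first 10000 lines.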

Or you can use head or tail to extract the first (head) or last (tail) N lines of a file. For FASTQ, keep N a multiple of 4 so that no record is cut in half (10000 lines is 2500 reads), e.g.:

head -n 10000 yourfile.fastq > newfile.fastq
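
Since a FASTQ record is four lines, it can help to wrap head and tail so that the argument is a number of reads rather than a number of lines. A minimal sketch, assuming four-line records; `first_reads` and `last_reads` are hypothetical names, not existing tools:

```shell
# first_reads N FILE: print the first N FASTQ records (4 lines each).
first_reads() {
  head -n "$(( $1 * 4 ))" "$2"
}

# last_reads N FILE: print the last N FASTQ records (4 lines each).
last_reads() {
  tail -n "$(( $1 * 4 ))" "$2"
}
```

Usage would look like `first_reads 2500 yourfile.fastq > newfile.fastq`. For non-FASTQ files, plain `head -n` / `tail -n` with any line count works as above.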
7.0 years ago

The BBMap package has a tool called Reformat which can do this:

reformat.sh in=file.fastq out=reduced.fastq samplereadstarget=10000

That will sample 10000 reads randomly from the file. It can also process paired fastq files at once (with the in1 and in2 flags), keeping pairs together.


The first link is about splitting; I want to select the first x lines from the FASTQ file.


If you are going to sample reads then I suggest not selecting first "x" reads from a file. Those reads may represent bad parts of data since they would be starting at the edge of the flowcell. With old(er) Illumina data a lot of the initial reads in a file used to have N's/bad Q scores.
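
If random sampling is wanted with only the standard tools that ship with macOS, one option is reservoir sampling in awk. The following is a sketch under stated assumptions, not a replacement for a dedicated tool: `sample_reads` is a hypothetical helper, records are assumed to be exactly four lines with no tab characters in the header, and the default seed of 42 is an arbitrary choice made here for reproducibility.

```shell
# sample_reads N FILE [SEED]: print N randomly chosen FASTQ records.
# Assumes 4-line records and no tab characters inside any line.
sample_reads() {
  # paste joins each group of 4 lines into one tab-separated line,
  # so awk sees one record per input line.
  paste - - - - < "$2" |
    awk -v k="$1" -v seed="${3:-42}" '
      BEGIN { srand(seed) }
      # Fill the reservoir with the first k records.
      NR <= k { keep[NR] = $0; next }
      # Record number NR replaces a random slot with probability k/NR.
      { j = int(rand() * NR) + 1; if (j <= k) keep[j] = $0 }
      END {
        n = (NR < k) ? NR : k
        for (i = 1; i <= n; i++) print keep[i]
      }
    ' |
    tr '\t' '\n'   # restore the original 4-line layout
}
```

Usage would look like `sample_reads 10000 yourfile.fastq > sampled.fastq`. The whole file is read once and only N records are held in memory. Note that the output follows reservoir order rather than the original file order, and the same seed always yields the same selection for a given file.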


In my experience, they still do... :)
