Filtering fastq file
7.3 years ago

I have a fastq file and I found many reads consisting only of N. I want to filter out those reads which contain N. Any one-liner for that?

RNA-Seq
You can use bbduk.sh or reformat.sh from the BBMap suite with qtrim=rl trimq=1. That will trim trailing and leading bases with a Q-score below 1, which means Q0, which means N (in either fasta or fastq format). If the entire read is N, it will be removed.

It didn't work the way you said. I want all the reads with N in a separate file.

Use outm= to capture the filtered reads in a separate file. If you have a paired-end data set, you can use outm1= and outm2= to capture both reads.

bbduk.sh in=file.fq qtrim=rl trimq=1 out=clean.fq outm=capture.fq minlength=read_length

This will capture any read that has at least one N in the outm file (replace read_length with the number of cycles in your reads).

Note: this will trim the Ns out, though, which is not what you seem to want. So use @Pierre's solution for now until I (or Brian Bushnell) can help figure out a BBMap way.

Closing a post is an action generally used by moderators during moderation.

If your question has been solved then accept one (or more) of the answers below (green check mark) to provide closure to the thread.

To separate reads with Ns using BBMap:

bbduk.sh in=reads.fq out=readsWithoutNs.fq outm=readsWithNs.fq maxns=0

If you have, say, 100bp reads and only want to separate reads containing all 100 Ns, change that to "maxns=99".
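If BBMap isn't installed, the same "reads that are entirely N" filter can be approximated with standard POSIX tools; a minimal sketch (the demo input and output file names are illustrative, not part of any tool's interface):

```shell
# Demo input: r1 is a normal read, r2 is all N -- illustrative data.
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nNNNNNNNN\n+\nIIIIIIII\n' > reads.fq

# Group each fastq record onto one tab-separated line, keep only records
# whose sequence field ($2) consists entirely of N (the equivalent of the
# maxns=read_length-1 trick above), then restore the 4-line layout.
paste - - - - < reads.fq \
| awk -F '\t' '$2 ~ /^N+$/' \
| tr '\t' '\n' > allN.fq
```

Here allN.fq ends up holding only the all-N record (@r2).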

gunzip -c input.fastq.gz | paste - - - - | awk -F '\t' '!($2 ~ /N/)' | tr "\t" "\n" > noN.fq

Remove the '!' to keep only the reads containing 'N'.
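The same paste trick can also write both outputs in one pass instead of running the pipeline twice; a sketch on uncompressed input, with illustrative file names:

```shell
# Demo input: one read with an N and one without -- illustrative data.
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACNTACGT\n+\nIIIIIIII\n' > input.fq

# Route each record by whether its sequence field ($2) contains an N,
# writing N-containing reads and clean reads to separate files.
paste - - - - < input.fq \
| awk -F '\t' '{
    out = ($2 ~ /N/) ? "withN.fq" : "noN.fq"
    print $1 "\n" $2 "\n" $3 "\n" $4 > out
  }'
```

For gzipped input, feed the pipeline from gunzip -c as in the one-liner above.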
