Filter FASTA files by number of N's
2.4 years ago
gubrins ▴ 290

Hi,

Once again I need your programming help. I have a lot of FASTA files made from 1 Mb sliding windows along a reference genome. Because some areas of the genome are not well sequenced, or a sample does not have much data there, I would like to remove the files where at least one sample has half of its bases as N. How could I do that?

Thanks a lot in advance!

bash programming fasta • 650 views

where at least one sample has half of the information as N

I don't understand.


Sorry, Pierre. Each of my FASTA files holds 1 Mb of sequence. I would like to know whether any sample within a FASTA file has 50% or more of its bases as N.


Counting N's Within Fasta

You can use the stats.sh program from the BBMap suite to generate the base distribution (only the relevant part is posted here). The N column makes it easy to spot files whose N content is above 50%.

$ stats.sh in=t2.fa
A   C   G   T   N   IUPAC   Other   GC  GC_stdev
0.2394  0.2810  0.1935  0.2861  0.0096  0.0000  0.0000  0.4745  0.0000
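
The row shown above is a single summary for the file, while the question asks whether any one sample in a file reaches 50% N. Here is a minimal Python sketch of that per-sample check, assuming each FASTA record (each `>` header) is one sample and that both `N` and `n` count as missing. The function names `n_fractions`, `has_n_rich_sample` and `files_to_remove` are made up for illustration and are not part of BBMap.

```python
def n_fractions(fasta_path):
    """Return {record_id: fraction of bases that are N or n} for one FASTA file."""
    counts = {}  # record id -> [number of N bases, total bases]
    current = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The id is the first whitespace-delimited token of the header.
                # Assumption: ids are unique within a file; a repeated id
                # would overwrite the earlier record.
                header = line[1:].split()
                current = header[0] if header else ""
                counts[current] = [0, 0]
            elif current is not None:
                counts[current][0] += line.upper().count("N")
                counts[current][1] += len(line)
    # Empty records get a fraction of 0.0 to avoid dividing by zero.
    return {name: (n / total if total else 0.0)
            for name, (n, total) in counts.items()}


def has_n_rich_sample(fasta_path, threshold=0.5):
    """True if at least one record has an N fraction >= threshold."""
    return any(frac >= threshold for frac in n_fractions(fasta_path).values())


def files_to_remove(paths, threshold=0.5):
    """Return the subset of paths containing at least one N-rich record."""
    return [path for path in paths if has_n_rich_sample(path, threshold)]
```

Calling `files_to_remove` on the list of window files returns the ones to discard. It only reports them, so you can check the list before deleting or moving anything.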
