How to remove sequences that contain N from a multi-FASTA file?
4.0 years ago
k.kathirvel93 ▴ 300

Hi everyone,

I have a multi-FASTA file containing 11,000 genomes (about 30 kb each). I want to remove every sequence (whole genome) that contains at least one N. How can I do this with sed or awk? Thanks in advance.

My input looks like this:

Genome1 ATCGTCGTACAGATACAGATACANNNcGATAGACATAGACA

Genome2 AGTCGATCAGTACAGATACAGATACAGATACAGATAC

I want output like this:

Genome2 AGTCGATCAGTACAGATACAGATACAGATACAGATAC

genome sequencing sequence alignment • 1.5k views

Hello k.kathirvel93!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

Thanks @Pierre Lindenbaum. I went through the thread you mentioned, but the code there did not work on my large dataset: after running it, the genomes still contained Ns. Since that thread is 4 years old, I created my own. Can you help with this? Thanks.

Have you found a solution?
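In case it helps, here is a minimal awk sketch of one way to do this. It assumes a standard FASTA layout (a header line starting with ">" followed by one or more sequence lines), which the simplified example in the question does not show explicitly. The function name filter_no_n is made up for illustration. One possible reason a one-liner leaves Ns behind is that the sequences are wrapped over several lines, so this version buffers each whole record before deciding whether to print it.

```shell
# Hypothetical helper: print only the FASTA records whose sequence
# contains no N (upper or lower case). Assumes headers start with ">".
filter_no_n() {
    awk '
        # Print the buffered record if its sequence had no N.
        function emit() {
            if (header != "" && !has_n) printf "%s\n%s", header, seq
        }
        # New header: decide on the previous record, then reset the buffer.
        /^>/ { emit(); header = $0; seq = ""; has_n = 0; next }
        # Sequence line: check for N and append to the buffer.
        {
            if ($0 ~ /[Nn]/) has_n = 1
            seq = seq $0 "\n"
        }
        # Do not forget the last record in the file.
        END { emit() }
    ' "$1"
}
```

Usage would be something like `filter_no_n genomes.fasta > genomes_noN.fasta` (file names are placeholders). Only sequence lines are tested, not headers, because a header such as ">Genome1" itself contains the letter n. Each record is held in memory until its end, which should be fine for 30 kb genomes.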
