Question: How to remove sequences containing N from a multi-FASTA file?
k.kathirvel93 (India) wrote, 5 months ago:

Hi everyone,

I have a multi-FASTA file containing 11,000 genomes (30 kb each). I want to remove every record (whole genome) that contains at least one N. How can I do this with sed or awk? Thanks in advance.

My input looks like this:

Genome1 ATCGTCGTACAGATACAGATACANNNcGATAGACATAGACA

Genome2 AGTCGATCAGTACAGATACAGATACAGATACAGATAC

I want output like this:

Genome2 AGTCGATCAGTACAGATACAGATACAGATACAGATAC
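The question asks for sed or awk; as a swapped-in alternative, here is a minimal Python sketch. It assumes the one-line layout shown in the example above, where each record is a name followed by its sequence on the same line, and it treats a record as unwanted if its sequence contains `N` or lowercase `n`. The function name `drop_records_with_n` and the file names in the usage comment are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def drop_records_with_n(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the 'name sequence' lines whose sequence has no N (either case).

    Assumes the one-line layout from the question: a name, whitespace, then the
    sequence. Blank lines and lines without a sequence field are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or a line without a sequence field
        sequence = "".join(fields[1:])  # everything after the name
        if "N" not in sequence.upper():  # upper() also catches soft-masked 'n'
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python drop_n.py input.txt output.txt  (hypothetical file names)
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(drop_records_with_n(src))
```

Applied to the two example lines, only the `Genome2` line is written out. Because the file is processed one line at a time, memory use stays small even for a large input.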


Hello k.kathirvel93!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

Reply written 5 months ago by Pierre Lindenbaum

Thanks @Pierre Lindenbaum. I went through the thread you mentioned, but its solution did not work on my large dataset: after running that code, the genomes still contained Ns. Since that thread was four years old, I created a new one. Can you help with this? Thanks.
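One possible reason a line-based command still leaves Ns (an assumption, not confirmed in the thread): in a real multi-FASTA file each sequence is usually wrapped over many lines, so a filter that tests one line at a time drops only the lines containing N and keeps the rest of the genome. Matching only uppercase `N` would also miss soft-masked `n`. The minimal Python sketch below, with a hypothetical function name, buffers one record at a time (a header line starting with `>` plus its sequence lines) and writes the record out unchanged only if none of its sequence lines contains N or n.

```python
import sys
from typing import Iterable, Iterator, List


def drop_fasta_records_with_n(lines: Iterable[str]) -> Iterator[str]:
    """Yield the raw lines of every FASTA record whose sequence has no N/n.

    Only one record (header plus its wrapped sequence lines) is held in memory
    at a time, so large files are streamed. Lines before the first '>' header
    are ignored.
    """
    record: List[str] = []  # raw lines of the current record
    has_n = False           # does the current record's sequence contain N/n?
    for line in lines:
        if line.startswith(">"):
            # A new header closes the previous record: emit it if it was clean.
            if record and not has_n:
                yield from record
            record, has_n = [line], False
        elif record:
            record.append(line)
            if "N" in line.upper():
                has_n = True
    # Emit the last record in the file if it was clean.
    if record and not has_n:
        yield from record


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python drop_n_fasta.py genomes.fa clean.fa  (hypothetical file names)
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(drop_fasta_records_with_n(src))
```

Because kept records are copied line for line, the original wrapping width is preserved. Counting headers before and after filtering (for example with `grep -c '^>'`) is a quick way to check how many genomes were removed.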

Reply written 5 months ago (modified 5 months ago) by k.kathirvel93

Have you found a solution?

Reply written 1 day ago by bioinfo
Powered by Biostar version 2.3.0