Question: (Closed) Extracting fasta file according to Ids
2.1 years ago by KVC_bioinfo (Boston)
KVC_bioinfo wrote:

Hello,

I have a huge FASTA file. I would like to extract all the sequences whose IDs start with NM.

I tried the following two commands, which did not work for me.

awk 'BEGIN{RS=">"}/NM/{print ">"$0}' huge.fasta

grep '^>NM' -B 1 huge.fasta > Nm.fasta

Could someone help me with a better solution? Thank you in advance.

#extractsequence #fasta • 1.3k views
modified 2.1 years ago by James Ashmore • written 2.1 years ago by KVC_bioinfo

Is your huge FASTA file multi-line or single-line?

written 2.1 years ago by st.ph.n

It is a multi-line FASTA file.

written 2.1 years ago by KVC_bioinfo

Hello KVC_bioinfo!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

PS: Duplicate of how to keep fasta based on pattern in header. with 'index($0,"NM");}'
written 2.1 years ago by Pierre Lindenbaum
2.1 years ago by James Ashmore (UK/Edinburgh/MRC Centre for Regenerative Medicine)
James Ashmore wrote:

Index your FASTA file:

samtools faidx input.fasta

Then get ids which start with "NM" and pipe them into samtools faidx to retrieve only those sequences:

cut -f 1 input.fasta.fai | grep "^NM" | xargs samtools faidx input.fasta > result.fasta
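
If samtools is not available, a plain awk filter may also work for a multi-line FASTA. The following is only a minimal sketch, not part of the original answer: it sets a flag on each header line and prints every line while the flag is on, so sequence lines follow the decision made for their header. The file demo.fasta and its records are made up for illustration, and the sketch assumes the ID begins immediately after the ">" character.

```shell
# Build a tiny multi-line FASTA for demonstration (hypothetical records).
printf '>NM_0001 demo\nACGT\nTTGA\n>XR_0002 demo\nGGGG\n>NM_0003 demo\nCCCC\n' > demo.fasta

# On each header line, set keep to 1 if the ID starts with NM, else 0.
# The bare pattern "keep" prints the current line whenever keep is non-zero,
# so sequence lines inherit the state of the most recent header.
awk '/^>/ { keep = ($0 ~ /^>NM/) } keep' demo.fasta > Nm.fasta
```

Unlike grep with a fixed amount of context, this does not depend on how many lines each sequence spans. To run it on real data, replace demo.fasta with the actual input file.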
modified 2.1 years ago • written 2.1 years ago by James Ashmore