Question: (Closed) Extracting fasta file according to Ids
KVC_bioinfo (Boston) wrote, 20 months ago:

Hello,

I have a huge FASTA file. I would like to extract all the sequences whose IDs start with NM.

I tried the following two commands, which did not work for me.

awk 'BEGIN{RS=">"} /NM/ {print ">"$0}' huge.fasta

grep '^>NM' -B 1 huge.fasta > Nm.fasta

Could someone help me with a better solution? Thank you in advance.

#extractsequence #fasta • 1.1k views
modified 20 months ago by James Ashmore • written 20 months ago by KVC_bioinfo

Is your huge FASTA file a multi-line FASTA or a single-line FASTA?

written 20 months ago by st.ph.n

It is a multi-line FASTA file.

written 20 months ago by KVC_bioinfo
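
Because the file is multi-line, `grep -B 1` cannot do the job: it prints the line before each matching header, not the sequence lines that follow it. A minimal awk sketch, assuming the wanted headers begin with `>NM` as described in the question, keeps a flag that is re-evaluated on every header line. The function name `extract_nm` is hypothetical and only used for illustration.

```shell
# Hypothetical helper: print every record whose header starts with ">NM".
# The flag changes only on header lines, so all sequence lines that follow
# a header inherit its decision. This handles multi-line and single-line FASTA.
extract_nm() {
    awk '/^>/ { keep = (index($0, ">NM") == 1) } keep' "$1"
}

# Usage: extract_nm huge.fasta > Nm.fasta
```

Anchoring the match to the start of the header avoids picking up records that merely contain "NM" somewhere in the description or sequence.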

Hello KVC_bioinfo!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

PS: Duplicate of how to keep fasta based on pattern in header. with 'index($0,"NM");}'
written 20 months ago by Pierre Lindenbaum
James Ashmore (UK/Edinburgh/MRC Centre for Regenerative Medicine) wrote, 20 months ago:

Index your FASTA file:

samtools faidx input.fasta

Then get ids which start with "NM" and pipe them into samtools faidx to retrieve only those sequences:

cut -f 1 input.fasta.fai | egrep "^NM" | xargs samtools faidx input.fasta > result.fasta
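
One way to sanity-check the extraction, sketched here assuming the file names used in the commands above, is to compare the number of `>NM` headers in the input with the number of records in the output. The function name `check_nm_count` is hypothetical.

```shell
# Hypothetical check: succeeds only if the output file holds exactly as many
# records as the input file has headers starting with ">NM".
# grep -c exits non-zero when the count is 0, hence the "|| true".
check_nm_count() {
    [ "$(grep -c '^>NM' "$1" || true)" -eq "$(grep -c '^>' "$2" || true)" ]
}

# Usage: check_nm_count input.fasta result.fasta && echo "counts match"
```

A mismatch only signals that something went wrong; it does not say which records are missing or extra.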
modified 20 months ago • written 20 months ago by James Ashmore