Question: How do I remove certain sequences from a FASTA file based on the header?
tianshenbio wrote (14 days ago):

I have a FASTA file like this:

>XM_0000001.1
actact
>XR_0000001.1
atcatc

How do I remove all the sequences whose header starts with XR_?

I only want to keep:

>XM_0000001.1
actact
Tags: rna-seq, sequence, fasta

shiyeyishang wrote (14 days ago):

If you work on Linux, this is easy:

  1. Step 1 (collect all header IDs): grep '^>' file.fa | sed 's/^>//' > file.fa.id
  2. Step 2 (drop the XR_ IDs): grep -v '^XR_' file.fa.id > file.fa.id.final
  3. Step 3 (extract the remaining records): seqtk subseq file.fa file.fa.id.final > final.fa

PS: seqtk is a separate tool that you need to install.
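If you cannot install seqtk, step 3 (pulling out the records whose IDs are listed in a file) can be approximated with awk. This is a minimal sketch, not seqtk itself: subseq_by_ids is a hypothetical helper name, and it assumes a POSIX awk and an ID list with one bare ID (no ">") per line, as produced by steps 1 and 2. It matches on the first word of each header, so it works for both single-line and multi-line FASTA.

```shell
# Hypothetical helper (not part of seqtk): print only the FASTA records
# whose ID (first word of the header, without ">") is listed in ids_file.
# Usage: subseq_by_ids file.fa.id.final file.fa > final.fa
subseq_by_ids() {
    ids_file=$1
    fasta_file=$2
    awk '
        NR == FNR { keep[$1] = 1; next }               # first file: remember every ID
        /^>/ { id = substr($1, 2); p = (id in keep) }  # header: is this record wanted?
        p                                              # print all lines of wanted records
    ' "$ids_file" "$fasta_file"
}
```

Because the decision is made once per header and reused for every following line, wrapped sequences are kept or dropped as a whole.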

cpad0112 (India) wrote (14 days ago):

Try GNU sed (the default sed on Ubuntu/Mint). This works when every sequence is on a single line:

$ sed -e '/^>XR_/,+1d' test.fa

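The "addr,+N" range used above is a GNU sed extension, so it fails with the BSD sed shipped on macOS. A more portable sketch for single-line FASTA, assuming every header is followed by exactly one sequence line (drop_xr_single_line is a hypothetical helper name):

```shell
# Hypothetical helper for single-line FASTA (one sequence line per header).
# On a header starting with ">XR_", N appends the next line (the sequence)
# to the pattern space and d deletes both; every other line is printed.
# Usage: drop_xr_single_line test.fa > filtered.fa
drop_xr_single_line() {
    sed -e '/^>XR_/{N;d;}' "$@"
}
```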
If your FASTA is multi-line (sequences wrapped over several lines), use seqkit:

$ seqkit grep -rvip "^XR" test.fa
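If seqkit is not available, the same multi-line filtering can be sketched in plain awk. This is a hypothetical helper (drop_xr_records), not part of seqkit. It assumes each record starts with a ">" header line and drops a record, together with all of its sequence lines, when the header starts with ">XR_". Unlike the -i flag in the seqkit command, the match is case-sensitive.

```shell
# Hypothetical helper: remove every FASTA record whose header starts with
# ">XR_", including all of its (possibly wrapped) sequence lines.
# Usage: drop_xr_records test.fa > filtered.fa
drop_xr_records() {
    awk '
        /^>/ { keep = ($0 !~ /^>XR_/) }   # each header decides for its whole record
        keep                              # print lines of kept records only
    ' "$@"
}
```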

Hugo (Universidade de Vigo, Ourense, Spain) wrote (13 days ago):

You can try SEDA (https://www.sing-group.org/seda/). Its Pattern filtering operation (https://www.sing-group.org/seda/manual/operations.html#pattern-filtering) can do this if you configure a "Not contains" pattern with the text "^XR_".
