Question: How can I remove specific FASTA sequences by header from a large FASTA file? Any command or script, please?
seta (Sweden) wrote, 4.2 years ago:

Dear all,

I would like to remove some FASTA sequences from a large FASTA file based on their header information (sequence names). Could anybody please help me with this? Thanks so much in advance.

The headers look like this:

>contig1
ATGCGTACGTCATG
>contig2
GCTACGTCCCA

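For a script-based approach, here is a minimal Python sketch. It assumes ordinary FASTA in which every record starts with a '>' line, and it takes a record's ID to be the first word after '>' (so "> contig1" and ">contig1" are treated the same). The function names `record_id` and `drop_records` are hypothetical, chosen only for this illustration. The file is streamed line by line, so its size does not matter.

```python
from typing import Iterable, Iterator, Set


def record_id(header: str) -> str:
    """Return the sequence ID: the first word after '>' (also handles '> contig1')."""
    fields = header[1:].split()
    return fields[0] if fields else ""


def drop_records(lines: Iterable[str], unwanted: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose ID is in `unwanted`.

    The input is processed one line at a time, so it never has to fit in memory.
    """
    keep = True  # any text before the first header is passed through unchanged
    for line in lines:
        if line.startswith(">"):
            # A new record starts: decide once whether to keep all of its lines.
            keep = record_id(line) not in unwanted
        if keep:
            yield line
```

For example, after `import sys`, the call `sys.stdout.writelines(drop_records(open("file.fa"), {"contig1", "contig2"}))` would print every record except contig1 and contig2.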
Brian Bushnell (Walnut Creek, USA) wrote, 4.2 years ago:

The BBMap package contains a tool called "FilterByName" which can do this:

filterbyname.sh in=file.fa out=filtered.fa names=contig1,contig2

It supports prefix matching, case-sensitive or case-insensitive matching, inclusion or exclusion, and substring matching. Instead of a list of names, you can also point "names=" to another FASTA or FASTQ file.

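The matching modes listed above (prefix, substring, case-insensitive, include versus exclude) can be illustrated with a small Python sketch. It does not reproduce filterbyname.sh's exact behaviour or its option names: `matches`, `filter_fasta`, `mode`, `include` and `ignore_case` are hypothetical names chosen here, and matching is done against the whole header text after '>'.

```python
from typing import Iterable, Iterator

MODES = ("exact", "prefix", "substring")


def matches(header: str, names: Iterable[str], mode: str = "exact",
            ignore_case: bool = False) -> bool:
    """True if the header matches any query name under the chosen mode."""
    if ignore_case:
        header = header.lower()
    for name in names:
        query = name.lower() if ignore_case else name
        if mode == "exact" and header == query:
            return True
        if mode == "prefix" and header.startswith(query):
            return True
        if mode == "substring" and query in header:
            return True
    return False


def filter_fasta(lines: Iterable[str], names: Iterable[str], include: bool = False,
                 mode: str = "exact", ignore_case: bool = False) -> Iterator[str]:
    """Drop matching records (default), or keep only them when include=True."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    names = list(names)  # allow any iterable, reused for every header
    keep = False  # text before the first header belongs to no record: drop it
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            # Keep the record when "is a match" agrees with the include flag.
            keep = matches(header, names, mode, ignore_case) == include
        if keep:
            yield line
```

Keeping a single `include` flag instead of two separate functions mirrors the inclusion/exclusion switch described above: the same match test is used either way, only the decision flips.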

Thanks, Brian. I'm trying to use your tool for this, but I get the following error. Could you please tell me what is wrong and how to fix it?

Exception in thread "main" java.lang.AssertionError: Unknown parameter names.txt
        at driver.FilterReadsByName.<init>(FilterReadsByName.java:118)
        at driver.FilterReadsByName.main(FilterReadsByName.java:41)

Many thanks.

Reply written 4.2 years ago by seta.

What is wrong:

Unknown parameter names.txt

How to solve it:

Run the command as Brian suggested, passing the file through the names= parameter (names=names.txt) rather than as a bare argument.

Reply written 4.2 years ago by PoGibas.
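When the names are kept in a text file, as in the names.txt case above, a script can read that file directly. The following is a minimal Python sketch assuming a plain-text file with one sequence ID per line (blank lines are skipped and a leading '>' is tolerated). It compares each ID against the first word of every FASTA header and writes the remaining records to a new file. `load_names` and `remove_by_name_file` are hypothetical names used only for this illustration.

```python
from pathlib import Path
from typing import Set


def load_names(path: str) -> Set[str]:
    """Read one sequence name per line; blank lines are skipped and a leading
    '>' is tolerated, so a list copied from FASTA headers also works."""
    names = set()
    for line in Path(path).read_text().splitlines():
        name = line.strip().lstrip(">").strip()
        if name:
            names.add(name.split()[0])  # keep only the ID, ignore descriptions
    return names


def remove_by_name_file(fasta_in: str, fasta_out: str, names_file: str) -> int:
    """Copy fasta_in to fasta_out without the records listed in names_file.

    Returns the number of records removed.
    """
    unwanted = load_names(names_file)
    removed = 0
    keep = True
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                keep = not (fields and fields[0] in unwanted)
                if not keep:
                    removed += 1
            if keep:
                dst.write(line)
    return removed
```

For example, `remove_by_name_file("file.fa", "filtered.fa", "names.txt")` writes the filtered FASTA and returns how many records were dropped, which is a quick sanity check that the names actually matched the headers.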
h.mon (Brazil) wrote, 4.2 years ago:

There are several threads with similar questions; look through the many answers and choose the one best suited to you. See filter out fasta File by pattern, filter out fasta File by pattern, and Parsing Fasta Based On Header.
