Dear all,
I would like to subset a FASTA file so that I keep only the sequences belonging to a certain phylum (in my case: Nematoda). The headers of the FASTA file start with the phylum name, so I thought this would be straightforward to do in R, but I don't know how.
The FASTA file I am referring to can be downloaded from http://www.reference-midori.info/download.php# and I downloaded the following database: 'download/Databases/GenBank250/DADA2/longest/MIDORI2_LONGEST_NUC_GB250_CO1_DADA2.fasta.gz'.
Any tips?
Thanks! Ellen
Please post examples of the actual FASTA headers, since that is critical information for anyone trying to help.
My apologies! I did not get a notification that a reply had been posted, hence the delay.
Here are some example headers:
[1] "Discosea_555280;Flabellinia_1485085;order_Vannellidae_95227;Vannellidae_95227;Clydonella_218657;Clydonella"
[2] "Discosea_555280;Flabellinia_1485085;order_Vannellidae_95227;Vannellidae_95227;Paravannella_1443143;Paravannella"
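Given headers like these, where the first semicolon-separated field is a taxon name followed by an underscore and a numeric ID, one possible approach is to keep every record whose first field matches the phylum of interest. The sketch below is in Python rather than R and uses only the standard library. It assumes (this is not confirmed in the thread) that Nematoda records start with `Nematoda_` in the same way the examples start with `Discosea_`. The function names and the output file name are hypothetical.

```python
import gzip
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            # Sequence lines may be wrapped; collect them and join later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subset_by_first_taxon(in_handle: TextIO, out_handle: TextIO, taxon: str) -> int:
    """Write records whose first ';'-separated field is `taxon` or `taxon_<id>`.

    Assumption: the taxon of interest sits in the first field of the header.
    Returns the number of records written.
    """
    prefix = taxon + "_"
    kept = 0
    for header, seq in read_fasta(in_handle):
        first_field = header.split(";", 1)[0]
        if first_field == taxon or first_field.startswith(prefix):
            out_handle.write(f">{header}\n{seq}\n")
            kept += 1
    return kept


# Example usage (output file name is hypothetical):
# with gzip.open("MIDORI2_LONGEST_NUC_GB250_CO1_DADA2.fasta.gz", "rt") as fin, \
#         open("nematoda.fasta", "w") as fout:
#     print(subset_by_first_taxon(fin, fout, "Nematoda"), "records kept")
```

Matching on the first field only, rather than searching the whole header, avoids picking up records where the name happens to appear at a lower rank. If the file turns out to carry the phylum in a different position, the `split` index would need adjusting.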