6 weeks ago by
The awk solution is more general, but for simplicity, if I have single-line rather than hard-wrapped FASTAs, I prefer to do this with grep -A and tail:
grep -wA 1 '>NODE_19_length_5758_cluster_19_candidate_1' example.fasta | tail -n 1
-A 1 tells grep to return both the matched line and the line immediately after it, and then tail takes the last (second) line of that result. (The -w is just for full-word matching, in case you have similar sequence names where one is a substring of another.)
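To make the header a parameter, the same pipeline can be wrapped in a small shell function. This is only a sketch: get_seq is a name I made up, and the added -F (treat the header as a literal string rather than a regex) is my own choice, not part of the command above. It still assumes the whole sequence sits on the single line after the header.

```shell
# Hypothetical helper: print the sequence line that follows a given header.
# Usage: get_seq HEADER_WITHOUT_GT FASTA_FILE
# Assumes a single-line (unwrapped) FASTA.
get_seq() {
    # -F: match the header literally, -w: whole-word match,
    # -A 1: also print the line after each match.
    # tail keeps only the last line, so if several headers match
    # you get the sequence of the last one.
    grep -F -w -A 1 -e ">$1" -- "$2" | tail -n 1
}

# Example call (file name taken from the command above):
# get_seq NODE_19_length_5758_cluster_19_candidate_1 example.fasta
```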
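If you're sure that the entire sequence is on the next line, I find grep easier to use in a parameterized loop than the awk version, where I always get tangled up in the quoting. In fact, with the grep approach, you can even use a file to hold your list of desired headers, one per line. (With several matches, GNU grep prints a -- separator line between non-adjacent groups; --no-group-separator suppresses it.) If you want both the headers and their sequences, just drop the second grep (the grep -v '^>' filter) from this command: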
grep -wf list-of-headers.txt -A 1 example.fasta | grep -v '^>'
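For the parameterized-loop case, something like the following might work. It is a rough sketch, not a tested pipeline: headers_to_tsv is a hypothetical name, the tab-separated output format is my own choice, and it assumes the header file lists one header per line with the leading '>' included, plus a single-line FASTA.

```shell
# Hypothetical helper: for each header listed in a file, print
# "name<TAB>sequence" using the grep -A 1 | tail -n 1 approach.
# Usage: headers_to_tsv HEADER_LIST FASTA_FILE
headers_to_tsv() {
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS= read -r header || [ -n "$header" ]; do
        # Skip blank lines; an empty pattern would match every line.
        [ -n "$header" ] || continue
        seq=$(grep -F -w -A 1 -e "$header" -- "$2" | tail -n 1)
        # Strip the leading '>' from the name; seq is empty if nothing matched.
        printf '%s\t%s\n' "${header#>}" "$seq"
    done < "$1"
}

# Example call (file names taken from the command above):
# headers_to_tsv list-of-headers.txt example.fasta
```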
If your sequences are spread across multiple lines, then use one of the awk solutions, or "unwrap" your FASTA first with a tool like fastx_toolkit.
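If you'd rather not install anything, unwrapping can also be sketched in a few lines of awk. This is my own stand-in, not something fastx_toolkit provides; unwrap_fasta is a made-up name, and it assumes a plain-text FASTA where every record starts with a '>' line.

```shell
# Hypothetical helper: join the wrapped sequence lines of each record
# so that every header is followed by exactly one sequence line.
# Usage: unwrap_fasta WRAPPED_FASTA > unwrapped.fasta
unwrap_fasta() {
    awk '
        /^>/ {
            if (seen) print seq   # flush the previous record
            seen = 1
            seq = ""
            print                 # print the header line itself
            next
        }
        { seq = seq $0 }          # append a wrapped sequence line
        END { if (seen) print seq }
    ' "$1"
}
```

After that, the grep -A 1 commands above apply, since each sequence now fits on the line after its header.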