I'm trying to print only the lines in a GTF file whose "tag" attribute is "appris_principal". If no line for a given gene carries that tag, the lines tagged "appris_candidate_longest" should be selected for that gene instead.
I think I can code it up in Python, but there must be a way to do it in awk?
Why not grep? That might be the easiest and the quickest.
Oh yeah, let's not forget grep. But I'm not sure how to express the condition: if appris_principal doesn't exist in this line, check whether appris_candidate_longest exists. If neither, don't print.
Extract matching lines:
Extract non-matching lines:
... ...
Check for lines that have appris_candidate_longest AND NOT appris_principal:
Honestly, I would just do it in Python. Use csv.DictReader. Shouldn't take more than a dozen lines.
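For what it's worth, here is a minimal sketch of that csv.DictReader idea, with the fallback applied per gene rather than per line. It rests on a few assumptions not stated in the thread: the file has the standard nine tab-separated GTF columns, each line's attribute column contains gene_id "..." and zero or more tag "..." entries, and the tags are spelled exactly "appris_principal" and "appris_candidate_longest" (if your annotation uses suffixed variants, the membership tests would need adjusting). The names GTF_COLUMNS and select_appris are made up for illustration.

```python
import csv
import re
import sys

# Standard nine GTF columns; GTF files carry no header row.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute"]

# Assumed attribute syntax: key "value"; pairs inside column 9.
GENE_ID_RE = re.compile(r'\bgene_id "([^"]+)"')
TAG_RE = re.compile(r'\btag "([^"]+)"')


def select_appris(lines):
    """Return lines tagged appris_principal; for genes that have none,
    return the lines tagged appris_candidate_longest instead."""
    reader = csv.DictReader(
        (ln for ln in lines if not ln.startswith("#")),  # skip comment lines
        fieldnames=GTF_COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # keep the double quotes in column 9 literal
    )
    principal = {}  # gene_id -> lines tagged appris_principal
    fallback = {}   # gene_id -> lines tagged appris_candidate_longest
    for row in reader:
        attrs = row["attribute"]
        if attrs is None:  # fewer than nine columns: not a usable GTF line
            continue
        gene = GENE_ID_RE.search(attrs)
        if gene is None:
            continue
        gene_id = gene.group(1)
        principal.setdefault(gene_id, [])
        fallback.setdefault(gene_id, [])
        tags = TAG_RE.findall(attrs)
        line = "\t".join(row[col] for col in GTF_COLUMNS)
        if "appris_principal" in tags:
            principal[gene_id].append(line)
        elif "appris_candidate_longest" in tags:
            fallback[gene_id].append(line)
    selected = []
    for gene_id in principal:  # dicts keep first-seen gene order
        selected.extend(principal[gene_id] or fallback[gene_id])
    return selected


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="") as handle:
        for line in select_appris(handle):
            print(line)
```

Saved under a file name of your choice, it takes the GTF path as its only argument and prints the selected lines to stdout. Output is grouped by gene in order of first appearance, so it matches the input order only when each gene's lines are contiguous in the file. The selected lines are held in memory until the whole file has been read, which is what makes the per-gene fallback possible.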