I have a file named gene.txt containing sequence coordinates such as:
orf1=456-12512
orf2=2869-6898
Another file, project.fasta, contains multiple FASTA sequences.
I want to extract the header and sequence for each one into a separate file. I tried the following seqkit command, but it gives an invalid stdin error:
awk '{print $0}' gene.txt | seqkit grep -r -n -p $orf1 project.fasta
Are you looking to pull out entire sequences or just the coordinate boundaries that are in the examples above?
grep would not be the correct tool for the former. You may want to use samtools faidx instead.
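If it is the coordinate ranges you are after, something along these lines might work. This is only a rough sketch: extract_regions is a made-up helper name, and it assumes that every line of gene.txt has the form name=start-end and that all coordinates refer to a single record in project.fasta whose ID you pass in yourself (contig1 below is a placeholder, since the post does not say which sequence the coordinates belong to).

```shell
# Sketch only: extract_regions is a hypothetical helper, not part of samtools.
# Assumes every line of the coordinates file looks like name=start-end and that
# all coordinates refer to one FASTA record, whose ID is passed as seq_id.
extract_regions() {
    coords=$1
    fasta=$2
    seq_id=$3
    # Split each line at "=": orf1=456-12512 gives name=orf1, range=456-12512.
    while IFS='=' read -r name range || [ -n "$name" ]; do
        # Skip blank lines and lines without a start-end range.
        [ -n "$name" ] && [ -n "$range" ] || continue
        # samtools faidx takes regions as ID:start-end (1-based, inclusive)
        # and writes the header plus the subsequence to stdout.
        samtools faidx "$fasta" "${seq_id}:${range}" > "${name}.fasta"
    done < "$coords"
}
```

Calling extract_regions gene.txt project.fasta contig1 should then leave one file per line of gene.txt (orf1.fasta, orf2.fasta), each holding a header and the requested subsequence. samtools faidx builds the project.fasta.fai index on first use if it is not already there.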