Question: How to delete everything after a certain point in GFF3 file - PYTHON
2.4 years ago by caseyd7 (0)
caseyd7 wrote:

I have a GFF3 file, and at the bottom of the file there is a FASTA report of the genome.

I want to delete everything from the line that says '##FASTA' onward, including that line, so that all I have left is the regular GFF report without the FASTA.

I need to do this for multiple files. Please help.

gff3 • 1.0k views
modified 2.4 years ago by James Ashmore (2.8k) • written 2.4 years ago by caseyd7 (0)
2.4 years ago by James Ashmore (2.8k)
UK/Edinburgh/MRC Centre for Regenerative Medicine
James Ashmore wrote:

Let's try with an example file (using line numbers instead of actual GFF content):

$ cat test.gff
1
2
3
4
5
6
7
8
9
##FASTA
11
12
13
14
15

Find the line number at which the pattern '##FASTA' first appears (line 10 in this example):

egrep -n -m 1 '##FASTA' test.gff

Find the total number of lines in your file (15 in this example):

wc -l test.gff | awk '{print $1}'

Delete the lines starting at the line number where your pattern first appears and ending at the end of the file:

sed '10,15d' test.gff > result.gff
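
Since the question asks for Python, here is a minimal sketch of the same idea without the line-number arithmetic: read lines in order and stop at the first one that starts with '##FASTA'. The function name `drop_fasta_section` is made up for illustration, and matching with `startswith` (rather than anywhere in the line, as egrep does) is an assumption about how the directive appears in your files.

```python
from itertools import takewhile
from typing import Iterable, Iterator

FASTA_MARKER = "##FASTA"  # directive line that introduces the embedded sequences


def drop_fasta_section(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines up to, but not including, the first line starting with '##FASTA'."""
    # Assumption: the directive sits at the start of its own line.
    return takewhile(lambda line: not line.startswith(FASTA_MARKER), lines)


# Same toy file as above: line numbers stand in for real GFF content.
example = (
    [f"{n}\n" for n in range(1, 10)]
    + ["##FASTA\n"]
    + [f"{n}\n" for n in range(11, 16)]
)
kept = list(drop_fasta_section(example))
print("".join(kept), end="")  # prints the lines 1 through 9
```

Because `takewhile` stops consuming at the marker, the FASTA part of a large file is never read into memory.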

Package this up into a small shell script and run on each file.
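
For the "multiple files" part, a Python version of that wrapper might look like the sketch below. It is only one possible approach: the function names, the `*.gff` glob pattern, and the `.nofasta` output naming are all hypothetical choices, and it writes a new file next to each input rather than overwriting the original.

```python
from pathlib import Path
from typing import List

FASTA_MARKER = "##FASTA"  # directive line that introduces the embedded sequences


def strip_fasta_file(src: Path, dst: Path) -> None:
    """Copy src to dst, stopping before the first line that starts with '##FASTA'."""
    with src.open() as infile, dst.open("w") as outfile:
        for line in infile:
            if line.startswith(FASTA_MARKER):
                break  # everything from here on is the FASTA section
            outfile.write(line)


def strip_fasta_in_dir(directory: Path, pattern: str = "*.gff") -> List[Path]:
    """Write a '<name>.nofasta<ext>' copy next to every matching file (hypothetical naming)."""
    written = []
    # sorted() materialises the listing before any output files are created.
    for src in sorted(directory.glob(pattern)):
        if src.stem.endswith(".nofasta"):
            continue  # skip outputs left over from an earlier run
        dst = src.with_name(src.stem + ".nofasta" + src.suffix)
        strip_fasta_file(src, dst)
        written.append(dst)
    return written
```

Calling `strip_fasta_in_dir(Path("."))` would process every `.gff` file in the current directory; pass `pattern="*.gff3"` if your files use that extension.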

modified 2.4 years ago • written 2.4 years ago by James Ashmore (2.8k)

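True, but it would be even easier to just use head instead of wc -l and sed. Find the line number of '##FASTA' (line 10 in the example):

egrep -n -m 1 '##FASTA' test.gff

Then keep only the lines before it, i.e. that line number minus one:

head -n 9 test.gff > result.gff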

written 2.4 years ago by colindaven (1.9k)