Question: How to delete everything after a certain point in GFF3 file - PYTHON
caseyd7 wrote (19 months ago):

I have a GFF3 file and at the bottom of the file there is a FASTA report of the genome.

I want to delete everything from the line that says '##FASTA' onward, including that line, so that all I have left is the regular GFF report without the FASTA.

I need to do this for multiple files. Please help.

James Ashmore (UK/Edinburgh/MRC Centre for Regenerative Medicine) wrote (19 months ago):

Let's try with an example file (using line numbers instead of actual GFF content):

$ cat test.gff
1
2
3
4
5
6
7
8
9
##FASTA
11
12
13
14
15

Find the line number at which the pattern '##FASTA' first appears (in this example, line 10):

egrep -n -m 1 '##FASTA' test.gff

Find the total number of lines in your file (in this example, 15):

wc -l test.gff | awk '{print $1}'

Delete the lines starting at the line number where your pattern first appears and ending at the end of the file:

sed '10,15d' test.gff > result.gff

Package this up into a small shell script and run on each file.
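
Since the question asks for Python, the same steps can also be sketched there. This is a minimal sketch, not a definitive implementation: the function names `strip_fasta` and `strip_all`, the `*.gff` pattern, and the separate output directory are assumptions made for illustration, and it assumes the marker line starts with '##FASTA'.

```python
from pathlib import Path

MARKER = "##FASTA"


def strip_fasta(src, dst):
    """Copy src to dst, dropping the first ##FASTA line and everything after it.

    Returns the number of lines kept.
    """
    lines = Path(src).read_text().splitlines(keepends=True)
    # 0-based index of the first marker line (the counterpart of grep -n -m 1);
    # if the marker is absent, keep the whole file.
    cut = next(
        (i for i, line in enumerate(lines) if line.startswith(MARKER)),
        len(lines),
    )
    Path(dst).write_text("".join(lines[:cut]))
    return cut


def strip_all(in_dir, out_dir):
    """Run strip_fasta on every *.gff file in in_dir (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # sorted() materialises the file list before any output is written.
    for gff in sorted(Path(in_dir).glob("*.gff")):
        strip_fasta(gff, out / gff.name)
```

Calling `strip_all(".", "stripped")` would then write a trimmed copy of each GFF file in the current directory into a `stripped` subdirectory, leaving the originals untouched.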


True, but it would be even easier to just use head instead of wc -l and sed.

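egrep -n -m 1 '##FASTA' test.gff

Then keep one line fewer than the line number reported (here 10 - 1 = 9), so that the '##FASTA' line itself is dropped:

head -n 9 test.gff > result.gff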
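
For large files with an embedded genome, a single-pass Python equivalent of this head-style truncation may be preferable, because it never loads the FASTA part into memory. The sketch below uses `itertools.takewhile` from the standard library; the function name `copy_until_fasta` is hypothetical, and it again assumes the marker line starts with '##FASTA'.

```python
from itertools import takewhile


def copy_until_fasta(src, dst, marker="##FASTA"):
    """Stream lines from src to dst, stopping before the first marker line.

    src and dst must be different paths, since dst is truncated on open.
    """
    with open(src) as fin, open(dst, "w") as fout:
        # takewhile stops reading at the first line that starts with the marker.
        fout.writelines(takewhile(lambda line: not line.startswith(marker), fin))
```

For example, `copy_until_fasta("test.gff", "result.gff")` would write only the lines above '##FASTA' to `result.gff`.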

Comment written 19 months ago by colindaven