exclude gene with features from gff3 file
6.4 years ago
Chris ▴ 30

Hi all, I have a file with gene names like this one (file1):

AT1G01010

AT1G01020

AT1G01030

AT1G01040

AT1G03993

AT1G01050

AT1G03997

AT1G01060

AT1G01070

AT1G01080

and I have a gff3 file for the whole genome in this link: https://drive.google.com/file/d/1q0L1SbKFPulhUGc0mXk4_REuxlu8ZJsY/view?usp=sharing

I need a new gff3 file from which those genes and all of their features (exons, introns, etc.) have been removed.

Any help is highly appreciated.

Thank you in advance for your help.

genome next-gen gff3 • 1.5k views
6.4 years ago
Hussain Ather ▴ 990

Python. Empty lines in file1 are skipped (an empty gene name would otherwise match, and so drop, every line). Gene names are matched as plain substrings of each gff3 line.

# Read the gene names to exclude, ignoring blank lines
with open("file1") as f2:
    genes = [line.strip() for line in f2 if line.strip()]

# Copy every gff3 line that mentions none of the listed genes
with open("Arabidopsis_thaliana.TAIR10.37.gff3") as f1, open("excluded.txt", "w") as o:
    for line in f1:
        if not any(gene in line for gene in genes):
            o.write(line)

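EDIT:

This grep one-liner also works, provided file1 has no empty lines. -v inverts the match, -F treats the patterns as fixed strings, -w matches whole words only, and -f reads the patterns from file1:

grep -vFwf file1 Arabidopsis_thaliana.TAIR10.37.gff3 > excluded.txt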
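
If you want the whole-word behaviour of grep -w in Python rather than plain substring matching, something along these lines might do. It is only a sketch: the function name filter_gff3 is made up for illustration, and it assumes the gene names appear in the gff3 lines delimited by non-word characters (as in ID=gene:AT1G01010 or Parent=transcript:AT1G01010.1).

```python
import re


def filter_gff3(lines, gene_ids):
    """Yield the lines that mention none of gene_ids as a whole word.

    filter_gff3 is a hypothetical helper, not part of any library.
    """
    # Ignore blank entries so an empty name cannot match every line
    genes = {g.strip() for g in gene_ids if g.strip()}
    if not genes:
        yield from lines
        return
    # \b is a word boundary, so AT1G01010 matches "AT1G01010.1"
    # but not a longer identifier such as "AT1G010101"
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(g) for g in sorted(genes)) + r")\b"
    )
    for line in lines:
        if not pattern.search(line):
            yield line


# Possible usage with the file names from this thread:
#
# with open("file1") as ids, \
#         open("Arabidopsis_thaliana.TAIR10.37.gff3") as gff, \
#         open("excluded.txt", "w") as out:
#     out.writelines(filter_gff3(gff, ids.readlines()))
```

Header and comment lines (those starting with #) pass through unchanged, since they do not normally contain gene names.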