Comparisons of genes coordinates
0
0
Entering edit mode
3.8 years ago
nlehmann ▴ 140

Hi,

I would like to compare genes coordinates between 2 GTF/GFF files. I tried using Gffcompare but the classification codes depends on the introns chains. I thought of something simpler, that would compare gene lengths and say if genes are equal, longer in 3' or longer in 5'. Let's take an example with longer in 3':

> cat A.gff3
chr1    ncbiRefSeq  gene    5000    8000    .   +   .   ID=gene1;Name=gene1
chr1    ncbiRefSeq  transcript  5000    8000    .   +   .   ID=XM_gene1;Parent=gene1
chr1    ncbiRefSeq  exon    5000    5524    .   +   .   Parent=XM_gene1
chr1    ncbiRefSeq  exon    6755    6829    .   +   .   Parent=XM_gene1
chr1    ncbiRefSeq  exon    7800    8000    .   +   .   Parent=XM_gene1

> cat B.gff3
chr1    ncbiRefSeq  gene    5000    10000   .   +   .   ID=gene1;Name=gene1
chr1    ncbiRefSeq  transcript  5000    10000   .   +   .   ID=XM_gene1;Parent=gene1
chr1    ncbiRefSeq  exon    5000    5524    .   +   .   Parent=XM_gene1
chr1    ncbiRefSeq  exon    6755    6829    .   +   .   Parent=XM_gene1
chr1    ncbiRefSeq  exon    7800    10000   .   +   .   Parent=XM_gene1

The comparison of these two GFF3 files should show that gene1 in B.gff3 is longer in 3' than A.gff3. I thought also of using bedtools overlap but this wouldn't give enough details (would just return that the genes overlap but we don't know how). If the gene in B started upstream (e.g. 4000), we should get the information that this gene is longer in 5' and in 3' in B compared to A.

Using gffcompare, the 2 genes are considered as equal because they have the same intron chains:

 > gffcompare -r A.gff3 B.gff3
 > cat gffcmp.B.gff3.tmap
ref_gene_id ref_id class_code qry_gene_id qry_id num_exons FPKM  TPM cov len major_iso_id   ref_match_len
gene1   XM_gene1    =   gene1   XM_gene1    3   0.000000    0.000000    0.000000    2801    XM_gene1    801

Thanks for the help.

gene RNA-Seq • 990 views
ADD COMMENT
0
Entering edit mode

Have you tried agat_sp_compare_two_annotations.pl from AGAT?

ADD REPLY
0
Entering edit mode

Yes, but it just gives genes that overlap between the 2 inputs, with no details on the way they overlap.

ADD REPLY

Login before adding your answer.

Traffic: 1488 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6