Question: How can I extract annotation for genes from a GTF file that are more than 200 bp apart from neighboring genes?
biplab (University of California, Davis) wrote, 9 months ago:

I am new to the field of computational biology. This question might be answered somewhere else, but I could not find it by searching. How can I extract annotation for genes from a GTF file that are more than 200 bp apart from neighboring genes? I was looking into bedtools for this, but most bedtools functions compare two files, whereas I would like to compare genes within my own annotation file. It would be very helpful if someone could suggest how to do this. For example:

Input file:

I   ensembl gene    335 649 .   +   .   gene_id "YAL069W"; gene_source "ensembl"; gene_biotype "protein_coding";
I   ensembl gene    538 792 .   +   .   gene_id "YAL068W-A"; gene_source "ensembl"; gene_biotype "protein_coding";
I   ensembl gene    1807    2169    .   -   .   gene_id "YAL068C"; gene_name "PAU8"; gene_source "ensembl"; gene_biotype "protein_coding";
I   ensembl gene    2480    2707    .   +   .   gene_id "YAL067W-A"; gene_source "ensembl"; gene_biotype "protein_coding";

Output:

I   ensembl gene    1807    2169    .   -   .   gene_id "YAL068C"; gene_name "PAU8"; gene_source "ensembl"; gene_biotype "protein_coding";

Thank you so much.

rna-seq next-gen genome • 414 views
Alex Reynolds (Seattle, WA USA) wrote, 9 months ago:

Via BEDOPS gtf2bed and closest-features:

$ gtf2bed < genes.gtf > genes.bed
$ closest-features --no-ref --dist genes.bed genes.bed | awk -v threshold=200 -v FS='|' '{ if (($4>threshold)&&($4!="NA")) { print $3; }}' | uniq > genes_more_than_200_nt_apart.bed
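
For readers without BEDOPS installed, the same kind of filter can be sketched in plain Python. This is only an illustration under stated assumptions, not a replacement for the commands above: the function name `isolated_genes` and its parameters `min_gap` and `keep_ends` are made up for this example, the gap is counted as the number of bases strictly between two genes, strand is ignored, and the first eight GTF columns are assumed to contain no internal whitespace. Genes are compared only within the same chromosome.

```python
from collections import defaultdict


def isolated_genes(gtf_lines, min_gap=200, keep_ends=True):
    """Return 'gene' lines whose gap to each neighbouring gene exceeds min_gap.

    keep_ends (hypothetical option): if True, a gene with no neighbour on
    one side is treated as passing on that side; if False, it is dropped.
    """
    by_chrom = defaultdict(list)
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        # Split into 9 columns; the attribute column keeps its spaces.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "gene":
            continue
        by_chrom[fields[0]].append((int(fields[3]), int(fields[4]), line))

    kept = []
    for genes in by_chrom.values():
        genes.sort(key=lambda g: (g[0], g[1]))
        max_end = None  # furthest end among all genes seen so far
        for i, (start, end, line) in enumerate(genes):
            # Bases strictly between this gene and its neighbours;
            # zero or negative means the genes touch or overlap.
            left_gap = None if max_end is None else start - max_end - 1
            right_gap = genes[i + 1][0] - end - 1 if i + 1 < len(genes) else None
            max_end = end if max_end is None else max(max_end, end)
            gaps = (left_gap, right_gap)
            if not keep_ends and None in gaps:
                continue
            if all(g is None or g > min_gap for g in gaps):
                kept.append(line)
    return kept


# The four genes from the question (attributes shortened for brevity).
example = [
    'I\tensembl\tgene\t335\t649\t.\t+\t.\tgene_id "YAL069W";',
    'I\tensembl\tgene\t538\t792\t.\t+\t.\tgene_id "YAL068W-A";',
    'I\tensembl\tgene\t1807\t2169\t.\t-\t.\tgene_id "YAL068C"; gene_name "PAU8";',
    'I\tensembl\tgene\t2480\t2707\t.\t+\t.\tgene_id "YAL067W-A";',
]

for gene_line in isolated_genes(example, keep_ends=False):
    print(gene_line)
```

With `keep_ends=False` this prints only the YAL068C line, as in the question's expected output. With the default `keep_ends=True`, YAL067W-A is also reported, because it is 310 bp from YAL068C and has no neighbour to its right in this four-line excerpt.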

Thank you so much. Very good solution.

(reply written 9 months ago by biplab)
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 9 months ago:

> I was looking into bedtools for this but most functions in bedtools compare two files but I would like to compare genes within my annotation file.

How about using the same file twice?
