Question: How can I extract annotation for genes from a GTF file that are more than 200 bp apart from neighboring genes?
biplab (University of California, Davis) wrote 9 weeks ago:

I am new to computational biology. This question may have been answered elsewhere, but I could not find it by searching. How can I extract the annotation for genes in a GTF file that are more than 200 bp away from their neighboring genes? I looked into bedtools, but most of its functions compare two files, whereas I want to compare genes within a single annotation file. It would be very helpful if someone could suggest how to do this. For example:

Input file:

I   ensembl gene    335 649 .   +   .   gene_id "YAL069W"; gene_source "ensembl"; gene_biotype "protein_coding";
I   ensembl gene    538 792 .   +   .   gene_id "YAL068W-A"; gene_source "ensembl"; gene_biotype "protein_coding";
I   ensembl gene    1807    2169    .   -   .   gene_id "YAL068C"; gene_name "PAU8"; gene_source "ensembl"; gene_biotype "protein_coding";
I   ensembl gene    2480    2707    .   +   .   gene_id "YAL067W-A"; gene_source "ensembl"; gene_biotype "protein_coding";

Output:

I   ensembl gene    1807    2169    .   -   .   gene_id "YAL068C"; gene_name "PAU8"; gene_source "ensembl"; gene_biotype "protein_coding";

Thank you so much.

Tags: rna-seq next-gen genome

Alex Reynolds (Seattle, WA USA) wrote 9 weeks ago:

Via BEDOPS gtf2bed and closest-features:

$ gtf2bed < genes.gtf > genes.bed
$ closest-features --no-ref --dist genes.bed genes.bed | awk -v threshold=200 -v FS='|' '{ if (($4>threshold)&&($4!="NA")) { print $3; }}' | uniq > genes_more_than_200_nt_apart.bed

Thank you so much. Very good solution.

(reply written 9 weeks ago by biplab)

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 9 weeks ago:

I was looking into bedtools for this but most functions in bedtools compare two files but I would like to compare genes within my annotation file.

How about using the same file twice?
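
For readers without BEDOPS or bedtools at hand, the self-comparison can also be sketched in plain Python. This is only an illustration under stated assumptions, not what either tool does internally: the function name `isolated_genes` and the `min_gap` parameter are made up for this example, the GTF is assumed to be tab-separated, the gap is counted as the number of bases strictly between two genes, and a gene at the end of a chromosome (no neighbor on one side) is assumed to pass on that side.

```python
from collections import defaultdict


def isolated_genes(gtf_lines, min_gap=200):
    """Return GTF 'gene' lines whose gap to every neighboring gene on the
    same chromosome is larger than min_gap bases (hypothetical helper)."""
    by_chrom = defaultdict(list)
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only 'gene' features are compared
        by_chrom[fields[0]].append((int(fields[3]), int(fields[4]), line))

    kept = []
    for genes in by_chrom.values():
        genes.sort(key=lambda g: (g[0], g[1]))  # sort by start, then end
        max_prev_end = None  # furthest end seen so far (handles nested genes)
        for i, (start, end, line) in enumerate(genes):
            # Bases between this gene and the nearest gene on its left.
            left_ok = max_prev_end is None or start - max_prev_end - 1 > min_gap
            # Bases between this gene and the next gene by start coordinate.
            next_start = genes[i + 1][0] if i + 1 < len(genes) else None
            right_ok = next_start is None or next_start - end - 1 > min_gap
            if left_ok and right_ok:
                kept.append(line)
            max_prev_end = end if max_prev_end is None else max(max_prev_end, end)
    return kept


# The four genes from the question (attribute column shortened to gene_id).
example = [
    'I\tensembl\tgene\t335\t649\t.\t+\t.\tgene_id "YAL069W";',
    'I\tensembl\tgene\t538\t792\t.\t+\t.\tgene_id "YAL068W-A";',
    'I\tensembl\tgene\t1807\t2169\t.\t-\t.\tgene_id "YAL068C";',
    'I\tensembl\tgene\t2480\t2707\t.\t+\t.\tgene_id "YAL067W-A";',
]
for line in isolated_genes(example):
    print(line)
```

On these four lines the sketch prints YAL068C and also YAL067W-A, because the latter has no downstream neighbor in the excerpt and its upstream gap is 310 bases. If genes at the edge of a chromosome should be excluded instead, the two `is None` checks would need to return False.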
