Question

Help in using GenomicRanges from R to get neighbouring genes

4.0 years ago

GZM ▴ 20

Hello,

I have two tables that I got from an analysis with hmmsearch. Table 1 lists homologues of protein 1, with a range of information for each hit, including the Start and Stop positions of the encoding gene. Table 2 lists homologues of protein 2, with the same information.

Both tables look something like this:

ID  Source  Nucleotide Accession    Protein Protein Name    Start   Stop    Strand  Organism     Strain Assembly
74488271    RefSeq  NC_009933.1 WP_041661553.1  hypothetical protein    4265907 4267559 +   Acaryochloris marina MBIC11017  MBIC11017   GCF_000018105.1
13866598    RefSeq  NC_009927.1 WP_012167081.1  hypothetical protein    156877  157254  -   Acaryochloris marina MBIC11017  MBIC11017   GCF_000018105.1
13867103    RefSeq  NC_009928.1 WP_012167419.1  hypothetical protein    121712  122089  -   Acaryochloris marina MBIC11017  MBIC11017   GCF_000018105.1
13865815    RefSeq  NC_009925.1 WP_012166309.1  hypothetical protein    6255930 6256316 +   Acaryochloris marina MBIC11017  MBIC11017   GCF_000018105.1
13867540    RefSeq  NC_009930.1 WP_012167945.1  hypothetical protein    106295  106678  -   Acaryochloris marina MBIC11017  MBIC11017   GCF_000018105.1

What I would like to do is compare row 1 of Table 2 with every row of Table 1. If the Nucleotide Accession matches, I then want to compare the Stop position in Table 1 with the Start position in Table 2, and if the difference Start2 - Stop1 is < 50, write the whole Table 2 row to a new table. In other words, I only want proteins in Table 2 that are directly downstream of proteins in Table 1, within the same genome. The same process should then be repeated for each remaining row.

I looked at different ways to do this both in Python (with pandas) and in R (with GenomicRanges and data.table), but could not come up with a solution. Is this feasible at all?

Thanks

R gene cross_checking_Results • 573 views

ADD COMMENT • link updated 4.0 years ago by Kevin Blighe 87k • written 4.0 years ago by GZM ▴ 20

I don't understand where Table 1 finishes and Table 2 starts.

ADD REPLY • link 4.0 years ago by lessismore ★ 1.3k

Please add a clear example input and a representative output.

ADD REPLY • link 4.0 years ago by ATpoint 82k

You might be able to modify this code or modify your input and get the result that you need:

A: Best tool for finding Boundary Pairs

This might not be efficient if your tables are large, but you can store both of your tables in one Table, and use the first column as an indicator of what table you're using.

awk '{print 1"\t" $0}' Table1 > Table
awk '{print 2"\t" $0}' Table2 >> Table
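Once the rows are tagged by table, the filter described in the question could be sketched in Python roughly as follows. This is only a minimal illustration, not a tested pipeline: it assumes tab-separated input with the column names shown in the question (`Nucleotide Accession`, `Start`, `Stop`), the function name `downstream_neighbours` and the toy rows are made up, strand is ignored, and the lower bound `min_gap=0` is an added assumption so that genes lying upstream (a negative Start2 - Stop1) are not reported.

```python
import csv
import io
from collections import defaultdict


def downstream_neighbours(table1, table2, max_gap=50, min_gap=0):
    """Return the rows of table2 whose Start lies min_gap <= d < max_gap bp
    after the Stop of at least one table1 row on the same accession.

    Assumption: strand is ignored, and min_gap=0 excludes overlapping or
    upstream genes; lower it (e.g. to -10) to tolerate small overlaps.
    """
    # Index Table 1 Stop positions by accession so each Table 2 row is only
    # compared against rows from the same replicon.
    stops = defaultdict(list)
    for row in table1:
        stops[row["Nucleotide Accession"]].append(int(row["Stop"]))

    hits = []
    for row in table2:
        start = int(row["Start"])
        candidates = stops.get(row["Nucleotide Accession"], [])
        if any(min_gap <= start - stop < max_gap for stop in candidates):
            hits.append(row)
    return hits


def read_table(text):
    """Parse tab-separated text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Toy data, invented for illustration only (not real hmmsearch output).
HEADER = "Nucleotide Accession\tProtein\tStart\tStop\tStrand\n"
TABLE1 = HEADER + "contigA\tp1a\t100\t400\t+\n" + "contigB\tp1b\t1000\t1300\t+\n"
TABLE2 = (
    HEADER
    + "contigA\tq1\t430\t700\t+\n"    # 30 bp after p1a: kept
    + "contigA\tq2\t900\t1200\t+\n"   # 500 bp after p1a: dropped
    + "contigB\tq3\t1320\t1600\t+\n"  # 20 bp after p1b: kept
    + "contigC\tq4\t410\t600\t+\n"    # no Table 1 hit on contigC: dropped
)

hits = downstream_neighbours(read_table(TABLE1), read_table(TABLE2))
print([row["Protein"] for row in hits])  # ['q1', 'q3']
```

For genes on the minus strand, "downstream" would run the other way (Start1 - Stop2), so a strand-aware version would need a second comparison; that is left out here.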

ADD REPLY • link 4.0 years ago by Fatima ▴ 1000