Organisms That Have One Gene But Not Another
10.3 years ago
bwio ▴ 30

Hi, I am looking for organisms in a clade (here: Firmicutes) that have one gene (say A) but lack another (B). While I can easily find organisms that have gene A via a BLAST search, I am struggling with the second constraint. Manually checking each organism found for A in a subsequent BLAST search with B as the query is obviously not an option. I also tried blasting gene B and comparing the two lists of resulting organisms, but since the lists are incomplete and only show the first 50/100 hits, this approach was unsuccessful.

Any suggestions on this problem? Thanks in advance.

Also note that both genes A and B may have multiple paralogs and are poorly/inconsistently annotated, so the comparison needs to be made at the sequence level.

blast • 1.8k views
10.3 years ago
Pavel Senin ★ 1.9k

Why not BLAST your genomes against a database of the two gene variants and get a table of scores (edit: X and Y need not be single numbers; each could be a vector consisting of score, alignment length, etc.), like

      GeneA  GeneB
org1  X1     Y1
...
orgN  XN     YN

and then filter that table by some threshold on X and Y?
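A minimal sketch of that filtering step, assuming one number per gene per organism (for example the best bit score). The organism names, the scores, the thresholds and the function name `has_a_lacks_b` are all made up for illustration; sensible cutoffs depend on the genes in question.

```python
# Hypothetical table: best bit score per organism for each gene.
# All values are invented for illustration only.
scores = {
    "org1": {"GeneA": 310.0, "GeneB": 0.0},
    "org2": {"GeneA": 295.0, "GeneB": 280.0},
    "org3": {"GeneA": 120.0, "GeneB": 35.0},
    "org4": {"GeneA": 40.0, "GeneB": 0.0},
}


def has_a_lacks_b(table, min_a=100.0, max_b=50.0):
    """Return organisms scoring at least min_a for GeneA and below max_b for GeneB.

    The thresholds are assumptions, not recommended values.
    """
    return sorted(
        org
        for org, row in table.items()
        if row.get("GeneA", 0.0) >= min_a and row.get("GeneB", 0.0) < max_b
    )


candidates = has_a_lacks_b(scores)
print(candidates)
```

If X and Y are vectors rather than single numbers, the same idea applies with a condition per component (score, alignment length, and so on).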


I think BLAST is optimized for small queries and large databases. I am currently blasting my two genes locally against the UniProt/TrEMBL database, using -m 9 to get just a result table and high values for the -v and -b options (both 10000). Hopefully this value is high enough to give me all significant hits. Then it would just be a matter of matching the results to organisms (provided in the FASTA annotation). Fingers crossed...
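The matching step could look roughly like the sketch below. It assumes the tabular output has the usual 12 tab-separated columns (query, subject, % identity, ..., e-value, bit score) with `#` comment lines, that the subject ID equals the first token of the FASTA header, and that the organism is given in an `OS=` field as in UniProt headers. The accessions, the query names `geneA`/`geneB`, the e-value cutoff and both function names are hypothetical.

```python
import re

# Organism name in a UniProt-style header: text after "OS=" up to the next "XX=" field.
OS_PATTERN = re.compile(r"OS=(.+?)(?=\s[A-Z]{2}=|$)")


def organisms_by_id(fasta_lines):
    """Map each sequence ID (first token of the header) to its organism name."""
    mapping = {}
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            match = OS_PATTERN.search(line.strip())
            if match:
                mapping[seq_id] = match.group(1)
    return mapping


def organisms_hit(blast_lines, query_id, id_to_org, max_evalue=1e-5):
    """Collect organisms with at least one hit for query_id at or below max_evalue."""
    found = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the comment lines of the tabular format
        fields = line.rstrip("\n").split("\t")
        if fields[0] != query_id:
            continue
        subject, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue and subject in id_to_org:
            found.add(id_to_org[subject])
    return found


# Invented example data (placeholder accessions, not real entries).
fasta_headers = [
    ">tr|P1|P1_BACSU Protein one OS=Bacillus subtilis GN=a PE=4 SV=1",
    ">tr|P2|P2_BACSU Protein two OS=Bacillus subtilis GN=b PE=4 SV=1",
    ">tr|P3|P3_CLOAB Protein three OS=Clostridium acetobutylicum GN=a PE=4 SV=1",
]
blast_rows = [
    "# Fields: Query id, Subject id, ...",
    "geneA\ttr|P1|P1_BACSU\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-80\t300",
    "geneA\ttr|P3|P3_CLOAB\t88.0\t200\t24\t0\t1\t200\t1\t200\t1e-70\t280",
    "geneB\ttr|P2|P2_BACSU\t85.0\t180\t27\t0\t1\t180\t1\t180\t1e-60\t250",
    "geneB\ttr|P3|P3_CLOAB\t30.0\t40\t28\t0\t5\t44\t9\t48\t2.1\t25",
]

id_to_org = organisms_by_id(fasta_headers)
with_a = organisms_hit(blast_rows, "geneA", id_to_org)
with_b = organisms_hit(blast_rows, "geneB", id_to_org)
a_not_b = with_a - with_b
print(sorted(a_not_b))
```

The set difference only says that no significant B hit was found in the database, so an incomplete proteome can look like a missing gene; that is probably part of the manual checking that remains.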


I think you are right to BLAST the other way around. My bad.


Just for the record: the strategy works fine although it requires a lot of time/disk space and there is still some manual work needed.


Thanks for letting me know! Yeah, that is the curse we all have to bear. Sometimes I have to wait through days of computation, wasting tons of disk space and network traffic, just to re-run the pipeline because some parameter was set wrong.
