Post-processing of reciprocal best hits output
7.8 years ago
mforthman ▴ 50

I've posted this question before without resolving my issue; the one answer I received did not work. Rather than rewriting the old thread entirely, I'm creating a new one. I ran pairwise reciprocal best hit (RBH) searches of each transcriptome against a reference and pooled the individual outputs into a single file:

query_id    query_length    subject_id  subject_length  fwd_qframe  fwd_qstart  fwd_qend    fwd_sframe  fwd_sstart  fwd_send    fwd_evalue  fwd_bitscore    fwd_pident  fwd_nident  fwd_length  rev_qframe  rev_qstart  rev_qend    rev_sframe  rev_sstart  rev_send    rev_evalue  rev_bitscore    rev_pident  rev_nident  rev_length
Apil_comp27594_c0_seq1  274 OFAS000003-RA-EXON01    483 1   140 269 1   3   133 2.00E-37    152 87.79   115 131 1   3   133 1   140 269 1.00E-36    152 87.79   115 131
Btri_comp28710_c0_seq1  495 OFAS000003-RA-EXON01    483 1   43  170 1   130 3   3.00E-39    159 89.06   114 128 1   3   130 1   170 43  9.00E-39    159 89.06   114 128
Apil_comp19172_c0_seq1  2893    OFAS000101-RA-EXON03    229 1   306 532 1   1   227 7.00E-52    204 82.89   189 228 1   1   227 1   306 532 1.00E-52    204 82.89   189 228
Ctom_512426543_01008091.1   743 OFAS000115-RA-EXON01    531 1   14  188 1   357 531 1.00E-33    141 81.14   142 175 1   357 531 1   14  188 8.00E-34    141 81.14   142 175
Apil_comp16418_c0_seq1  2079    OFAS000119-RA-EXON01    1584    1   414 2001    1   1   1584    0   963 77.85   1248    1603    1   1   1584    1   414 2001    0   963 77.85   1248    1603
Atri_comp13712_c0_seq1  1938    OFAS000119-RA-EXON01    1584    1   199 1640    1   1455    14  0   913 78.21   1134    1450    1   14  1455    1   1640    199 0   913 78.21   1134    1450
Ctom_512431611_01003023.1   2162    OFAS000119-RA-EXON01    1584    1   104 1393    1   1304    13  0   861 78.78   1021    1296    1   13  1304    1   1393    104 0   861 78.78   1021    1296
Atri_comp37099_c0_seq1  568 OFAS000126-RA-EXON01    219 1   7   217 1   219 9   1.00E-58    224 85.78   181 211 1   9   219 1   217 7   1.00E-58    224 85.78   181 211
Apil_comp19291_c1_seq1  7078    OFAS000131-RA-EXON13    286 1   6308    6602    1   1   286 4.00E-73    276 84.12   249 296 1   1   286 1   6308    6602    4.00E-74    276 84.12   249 296
Atri_comp13247_c0_seq1  4998    OFAS000131-RA-EXON13    286 1   438 732 1   286 1   1.00E-66    254 82.71   244 295 1   1   286 1   732 438 2.00E-67    254 82.71   244 295
Btri_comp25977_c0_seq1  450 OFAS000146-RA-EXON01    514 1   196 341 1   514 368 2.00E-34    143 84.35   124 147 1   368 514 1   341 196 9.00E-34    143 84.35   124 147
Btri_comp11536_c0_seq1  1754    OFAS000161-RA-EXON02    225 1   348 565 1   8   225 4.00E-52    204 83.49   182 218 1   8   225 1   348 565 2.00E-52    204 83.49   182 218
Atri_comp27210_c0_seq1  613 OFAS000167-RA-EXON01    2433    1   10  603 1   2052    1456    1.00E-103   374 78.2    470 601 1   1456    2052    1   603 10  1.00E-102   374 78.2    470 601
Ctom_512426598_01008036.1   1040    OFAS000206-RA-EXON01    1897    1   4   806 1   1203    376 1.00E-115   414 76.29   634 831 1   376 1203    1   806 4   2.00E-115   414 76.29   634 831
Btri_comp20624_c0_seq1  1763    OFAS000234-RA-EXON01    6339    1   1201    1721    1   534 14  9.00E-44    176 73.31   390 532 1   14  534 1   1721    1201    1.00E-42    176 73.26   389 531
Apil_comp19456_c0_seq1  3549    OFAS000248-RA-EXON07    4189    1   759 3223    1   1   2465    0   1511    77.84   1928    2477    1   1   2465    1   759 3223    0   1511    77.84   1928    2477
Btri_comp15171_c0_seq1  3766    OFAS000248-RA-EXON07    4189    1   591 3048    1   2458    1   0   1736    79.48   1960    2466    1   1   2458    1   3048    591 0   1736    79.48   1960    2466
Atri_comp2678_c0_seq1   578 OFAS000260-RA-EXON01    1044    1   19  548 1   499 1028    3.00E-84    309 77.21   410 531 1   499 1028    1   19  548 2.00E-83    309 77.21   410 531
Btri_comp13833_c0_seq1  1368    OFAS000260-RA-EXON01    1044    1   278 1282    1   1005    1   0   732 79.88   806 1009    1   1   1005    1   1282    278 0   732 79.88   806 1009
Btri_comp12981_c0_seq1  1656    OFAS000280-RA-EXON05    266 1   596 831 1   239 4   2.00E-60    231 84.32   199 236 1   4   239 1   831 596 1.00E-60    231 84.32   199 236
Atri_comp11944_c0_seq1  603 OFAS000313-RA-EXON22    300 1   89  182 1   1   95  2.00E-25    113 88.42   84  95  1   1   95  1   89  182 4.00E-25    113 88.42   84  95
Btri_comp21809_c0_seq1  1434    OFAS000322-RA-EXON01    235 1   890 988 1   227 129 1.00E-17    89.8    82.83   82  99  1   129 227 1   988 890 5.00E-18    89.8    82.83   82  99
Apil_comp22429_c0_seq1  1431    OFAS000357-RA-EXON13    200 1   1070    1269    1   200 1   3.00E-42    171 82.18   166 202 1   1   200 1   1269    1070    1.00E-42    171 82.18   166 202
Apil_comp12506_c0_seq1  1808    OFAS000367-RA-EXON01    1605    1   120 1724    1   1605    1   0   1020    78.17   1257    1608    1   1   1605    1   1724    120 0   1020    78.23   1261    1612
Atri_comp19469_c0_seq1  1743    OFAS000386-RA-EXON06    262 1   961 1222    1   1   262 2.00E-36    152 77.27   204 264 1   1   262 1   961 1222    7.00E-37    152 77.27   204 264
Ctom_512432154_01002480.1   684 OFAS000386-RA-EXON06    262 1   369 624 1   256 1   6.00E-37    152 77.61   201 259 1   1   256 1   624 369 2.00E-37    152 77.52   200 258
Acur_512410417_01005903.1   1021    OFAS000453-RA-EXON04    210 1   923 1021    1   9   107 3.00E-36    150 93.94   93  99  1   9   107 1   923 1021    5.00E-37    150 93.94   93  99
Apil_comp9637_c0_seq1   1465    OFAS000453-RA-EXON04    210 1   1235    1437    1   1   205 4.00E-66    250 88.78   182 205 1   1   205 1   1235    1437    2.00E-66    250 88.78   182 205
Btri_comp5378_c0_seq1   1334    OFAS000453-RA-EXON04    210 1   1160    1332    1   1   173 1.00E-60    231 90.75   157 173 1   1   173 1   1160    1332    7.00E-61    231 90.75   157 173
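A minimal Python sketch of reading the pooled table, under stated assumptions: the file is whitespace-separated with 26 columns per data row (4 ID/length columns, then 11 alignment columns that appear to come from the transcript-vs-reference search, then 11 that appear to come from the reciprocal reference-vs-transcript search), the header line starts with "query", and only the first (forward) block is needed, since its qstart/qend are coordinates on the transcript. `Hit` and `parse_rbh` are hypothetical names, not part of any existing tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """One RBH pair; coordinates come from the forward (transcript vs reference) block."""
    query_id: str
    ref_id: str
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_rbh(lines):
    """Group forward-search hits by reference (subject) ID, keeping file order."""
    by_ref = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header line (assumed to start with "query").
        if not fields or fields[0].startswith("query"):
            continue
        if len(fields) != 26:
            raise ValueError(f"expected 26 columns, got {len(fields)}: {line!r}")
        hit = Hit(
            query_id=fields[0],
            ref_id=fields[2],
            qstart=int(fields[5]),
            qend=int(fields[6]),
            sstart=int(fields[8]),
            send=int(fields[9]),
            evalue=float(fields[10]),
            bitscore=float(fields[11]),
        )
        by_ref[hit.ref_id].append(hit)
    return dict(by_ref)
```

Called on an open file handle, `parse_rbh(fh)` returns a dictionary mapping each reference ID (e.g. `OFAS000003-RA-EXON01`) to the list of transcripts that hit it.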

I need to process this file to create one FASTA file per reference sequence, containing the query sequences that hit that reference. Each query sequence should be trimmed to the region between its qstart and qend coordinates. I would prefer that the FASTA files not include the reference sequence itself, but I would like the reference ID to be used as the file name.
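The extraction step could be sketched as follows, assuming the transcript FASTA headers match the query IDs in the table exactly (first word after `>`) and that all transcriptomes have been loaded into one dictionary. Each row is a tuple `(query_id, ref_id, qstart, qend, sstart, send)`, i.e. 0-based columns 0, 2, 5, 6, 8 and 9 of a data row. BLAST coordinates are 1-based and inclusive, so the slice is `[qstart - 1:qend]`. The optional `orient` flag goes beyond what was asked: it reverse-complements a transcript whose hit runs backwards on the reference (`sstart > send`), so every sequence in a file shares the reference's orientation. `read_fasta`, `revcomp` and `write_gene_fastas` are hypothetical helpers; Biopython's `SeqIO` could replace the reader.

```python
from pathlib import Path

# DNA complement table; other IUPAC ambiguity codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(path):
    """Minimal FASTA reader: {id: sequence}, where id is the first word of the header."""
    seqs, name, chunks = {}, None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def write_gene_fastas(rows, seqs, outdir, orient=False):
    """Write one <ref_id>.fasta per reference containing the trimmed query hits.

    rows: iterable of (query_id, ref_id, qstart, qend, sstart, send).
    The reference sequence itself is not written; its ID is only the file name.
    Returns the sorted list of reference IDs written.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    by_ref = {}
    for qid, ref, qstart, qend, sstart, send in rows:
        lo, hi = sorted((int(qstart), int(qend)))
        sub = seqs[qid][lo - 1:hi]  # 1-based inclusive -> Python slice
        if orient and int(sstart) > int(send):
            sub = revcomp(sub)  # hit lies on the reference's minus strand
        by_ref.setdefault(ref, []).append((qid, sub))
    for ref, records in by_ref.items():
        with open(outdir / f"{ref}.fasta", "w") as out:
            for qid, sub in records:
                out.write(f">{qid}\n{sub}\n")
    return sorted(by_ref)
```

With several transcriptome files, `seqs.update(read_fasta(path))` for each file builds the combined dictionary before calling `write_gene_fastas`.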

I have searched extensively for a script or some insight into how to do this. RBH has been used for multispecies comparisons (beyond two species), with the results used to generate gene datasets. Scripts that perform RBH are easy to find, but I cannot find anything that processes the output into gene datasets. Any advice would be much appreciated.
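Building a multispecies gene dataset usually also means choosing which references were recovered in enough taxa. A minimal sketch, assuming the taxon code is the prefix before the first underscore in each query ID (Apil, Btri, Ctom, ...): it counts distinct taxa per reference ID and flags references where one taxon contributes more than one transcript. Under strict one-to-one RBH that should not happen, so the second list works as a sanity check. `species_of`, `select_genes` and `min_taxa` are hypothetical names.

```python
from collections import defaultdict


def species_of(query_id):
    """Taxon code, assumed to be the text before the first underscore (e.g. 'Apil')."""
    return query_id.split("_", 1)[0]


def select_genes(pairs, min_taxa):
    """pairs: iterable of (query_id, ref_id) RBH pairs.

    Returns (keep, multi):
      keep  - reference IDs hit by at least `min_taxa` distinct taxa;
      multi - reference IDs where one taxon contributes more than one transcript.
    """
    table = defaultdict(lambda: defaultdict(list))  # ref -> taxon -> [query_ids]
    for qid, ref in pairs:
        table[ref][species_of(qid)].append(qid)
    keep = sorted(ref for ref, by_taxon in table.items() if len(by_taxon) >= min_taxa)
    multi = sorted(
        ref for ref, by_taxon in table.items()
        if any(len(qids) > 1 for qids in by_taxon.values())
    )
    return keep, multi
```

The `keep` list can then decide which per-reference FASTA files enter the final dataset, with `min_taxa` set to the smallest number of taxa acceptable for a gene.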

blast reciprocal best hits fasta