Question

Comparing PopoolationTE2 output with reference .bed file in Python

21 months ago

Emilia • 0

I obtained the PopoolationTE2 output file for my sample, which lists TE insertion sites. It looks like this (column 2 is the chromosome number, column 3 the position, column 5 the TE family):

1   1   4254339 .   hAT|9   hAT R   -   0,954
1   1   34804000    .   Stowaway|41 Stowaway    R   -   1,000
1   1   12839440    .   Tourist|15  Tourist F   -   1,000
1   1   11521962    .   Tourist|10  Tourist R   -   1,000
1   1   28197852    .   Tourist|11  Tourist F   -   1,000
1   1   7367886 .   Stowaway|36 Stowaway    R   -   1,000
1   1   13130538    .   Stowaway|36 Stowaway    R   -   1,000
1   1   6177708 .   hAT|4   hAT F   -   1,000
1   1   3783728 .   hAT|20  hAT F   -   1,000
1   1   10332288    .   uc|12   uc  R   -   1,000
1   1   15780052    .   uc|5    uc  R   -   1,000
1   1   28309928    .   uc|5    uc  R   -   1,000
1   1   31010266    .   uc|33   uc  R   -   0,967
1   1   4758653 .   uc|10   uc  F   -   1,000
1   1   3815830 .   uc|31   uc  R   -   0,879
1   1   5037968 .   Mutator|4   Mutator F   -   1,000

I want to compare it with the BED file of TE sites in the reference genome. It looks like this:

1   12005   12348   RefBeet_TSD_Len:3_Tourist|7
1   56229   56700   RefBeet_TSD_Len:8_hAT|9
1   66241   66528   RefBeet_TSD_Len:9_Mutator|21
1   81966   82251   RefBeet_TSD_Len:2_Stowaway|39
1   84155   84402   RefBeet_TSD_Len:2_uc|1
1   84714   84841   RefBeet_Unknow_un_uc|28
1   98136   98349   RefBeet_TSD_Len:2_Stowaway|3
1   102325  102582  RefBeet_TSD_Len:2_Stowaway|12
1   103132  103267  RefBeet_Unknow_un_uc|33
1   108250  108580  RefBeet_TSD_Len:3_Tourist|17
1   115434  115695  RefBeet_Unknow_Len:8_uc|9

I want to check whether the TE insertions found in my sample also occur in the reference. For example, for the first insertion (hAT|9 on chromosome 1 at position 4254339), I want to know whether the BED file contains hAT|9 on the same chromosome in an interval that includes that position, with column 2 as the start and column 3 as the end.

I tried to do this with pandas, but I'm pretty confused.
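
A minimal pandas sketch of the check described above, offered as one possible approach rather than a definitive solution. Several things here are assumptions and not part of the original files: both files are whitespace-separated with no header, the column names (`chrom`, `pos`, `family`, `c1`, `c4`, and so on) are made up for illustration, the TE family in the BED name is the text after the last underscore, and the insertion position is 1-based while the BED interval is 0-based and half-open. The function name `flag_reference_insertions` is hypothetical, and the third insertion row is invented so that the example contains one match.

```python
import io
import pandas as pd

# Two rows from the PopoolationTE2 output above, plus one MADE-UP row
# (position 56500) added only so that this sketch produces a match.
insertions_txt = """\
1 1 4254339 . hAT|9 hAT R - 0,954
1 1 34804000 . Stowaway|41 Stowaway R - 1,000
1 1 56500 . hAT|9 hAT F - 1,000
"""

# Rows copied from the reference BED file above.
bed_txt = """\
1 12005 12348 RefBeet_TSD_Len:3_Tourist|7
1 56229 56700 RefBeet_TSD_Len:8_hAT|9
1 66241 66528 RefBeet_TSD_Len:9_Mutator|21
"""

# Column names are hypothetical; only chrom, pos and family are used below.
ins = pd.read_csv(
    io.StringIO(insertions_txt), sep=r"\s+", header=None,
    names=["c1", "chrom", "pos", "c4", "family", "c6", "c7", "c8", "c9"],
    dtype={"chrom": str},  # read chromosome as text so both files match
)
bed = pd.read_csv(
    io.StringIO(bed_txt), sep=r"\s+", header=None,
    names=["chrom", "start", "end", "name"],
    dtype={"chrom": str},
)

# Assumption: the family is whatever follows the last underscore of the name,
# e.g. "RefBeet_TSD_Len:8_hAT|9" -> "hAT|9".
bed["family"] = bed["name"].str.rsplit("_", n=1).str[-1]


def flag_reference_insertions(ins, bed):
    """Return a copy of `ins` with a boolean `in_reference` column."""
    # Pair each insertion with every reference TE of the same chromosome
    # and family; reset_index() keeps the original row label in "index".
    pairs = ins.reset_index().merge(bed, on=["chrom", "family"], how="inner")
    # Keep pairs whose reference interval contains the insertion position.
    # Assumption: 1-based position against a 0-based, half-open BED interval.
    hit = (pairs["start"] < pairs["pos"]) & (pairs["pos"] <= pairs["end"])
    out = ins.copy()
    out["in_reference"] = out.index.isin(pairs.loc[hit, "index"])
    return out


result = flag_reference_insertions(ins, bed)
print(result[["chrom", "pos", "family", "in_reference"]])
```

With real files, the `io.StringIO(...)` arguments would be replaced by the file paths. To test position only, regardless of family, merge on `chrom` alone; note that the intermediate table then grows with the number of insertions times the number of reference TEs per chromosome.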

Thanks for the suggestions!

bed python pandas PopoolationTE2 • 332 views

Updated 13 months ago by Ram 43k • written 21 months ago by Emilia • 0