I want to merge the genes in a BED file so that only the overlapping parts of the genes become new features. E.g., I have this file:
1    pseudogene      gene    11869   14412   .       +       .       gene_id "ENSG00000223972" 
1    pseudogene      gene    14363   29806   .       -       .       gene_id "ENSG00000227232"; 
1    lincRNA         gene    29554   31109   .       +       .       gene_id "ENSG00000243485";
which I can merge with
bedtools merge -c 1,2,3,4,5,6,7,8,9 -d -1 -o distinct,distinct,distinct,distinct,distinct,distinct,distinct,distinct,distinct
which gives me
1   11868   31109   1   lincRNA,pseudogene  gene    11869,14363,29554   14412,29806,31109   .   +,- .   gene_id "ENSG00000223972";, gene_id "ENSG00000227232",gene_id "ENSG00000243485";
But what I would like to have is something like this
1    pseudogene      gene    11869   14362    .       +       .       gene_id "ENSG00000223972";
1    pseudogene      gene    14363   14412    .       +,-       .       gene_id "ENSG00000223972";,gene_id "ENSG00000227232";
1    pseudogene      gene    14413   29553   .       -       .       gene_id "ENSG00000227232";
1    lincRNA,pseudogene         gene    29554   29806   .       +,-       .        gene_id "ENSG00000227232";,gene_id "ENSG00000243485";
1    lincRNA         gene    29807   31109   .       +       .       gene_id "ENSG00000243485";
So only the actually overlapping parts are made into separate features. Is this possible with bedtools or is there some other tool that does this?
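To make the desired result concrete, here is a minimal Python sketch of the splitting step, not a bedtools feature. It cuts the features at every start/end breakpoint and reports which genes cover each resulting piece. The function name `partition` and the tuple layout are assumptions made for illustration, and the covering check is a simple quadratic scan that is fine for a small example but would need a sweep for a whole genome.

```python
from collections import defaultdict

def partition(features):
    """Split features into disjoint segments at every start/end breakpoint.

    features: iterable of (chrom, start, end, strand, gene_id) with 1-based,
    inclusive coordinates, as in the example lines above.
    Returns a list of (chrom, start, end, strands, gene_ids), also 1-based.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, strand, gene_id in features:
        # Work in half-open [start-1, end) so adjacent pieces share a breakpoint.
        by_chrom[chrom].append((start - 1, end, strand, gene_id))

    segments = []
    for chrom in sorted(by_chrom):
        feats = by_chrom[chrom]
        # Every feature start and end is a potential cut position.
        points = sorted({p for s, e, _, _ in feats for p in (s, e)})
        for left, right in zip(points, points[1:]):
            covering = [f for f in feats if f[0] <= left and f[1] >= right]
            if not covering:
                continue  # gap between features, nothing to report
            strands = ",".join(sorted({f[2] for f in covering}))
            gene_ids = ",".join(sorted({f[3] for f in covering}))
            # Convert back to 1-based inclusive coordinates for output.
            segments.append((chrom, left + 1, right, strands, gene_ids))
    return segments


genes = [
    ("1", 11869, 14412, "+", "ENSG00000223972"),
    ("1", 14363, 29806, "-", "ENSG00000227232"),
    ("1", 29554, 31109, "+", "ENSG00000243485"),
]
for segment in partition(genes):
    print(*segment, sep="\t")
```

On the three genes above this yields five segments with the same coordinates as the desired output, with the strand and gene ID columns collapsed per segment.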
Thanks for the detailed explanation! For others who find this question, this is how I went from Ensembl GTF to BED with CHR, start, stop, Ensembl gene ID:
output:
Or you could just use BEDOPS gtf2bed. The gtf2bed call will create sorted BED.
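For anyone who wants to see what such a conversion involves, here is a rough Python sketch that reduces GTF gene lines to four sorted BED columns (CHR, start, stop, gene ID). It is only an illustration under stated assumptions, not the BEDOPS implementation: the function name `gtf_genes_to_bed` is made up, the input is assumed to be tab-separated with the feature type in column 3 and a `gene_id "..."` attribute in column 9, and the sort is a plain lexicographic sort on the chromosome name. The one detail worth noting is the coordinate shift: GTF starts are 1-based and BED starts are 0-based, which is why the merged output above starts at 11868 rather than 11869.

```python
import re

def gtf_genes_to_bed(lines):
    """Return sorted (chrom, start0, end, gene_id) rows for GTF 'gene' lines."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep gene-level records only
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match is None:
            continue  # no gene_id attribute on this line
        # GTF is 1-based inclusive; BED is 0-based half-open, so shift the start.
        rows.append((fields[0], int(fields[3]) - 1, int(fields[4]), match.group(1)))
    rows.sort()
    return rows


example = [
    '1\tlincRNA\tgene\t29554\t31109\t.\t+\t.\tgene_id "ENSG00000243485";',
    '1\tpseudogene\tgene\t11869\t14412\t.\t+\t.\tgene_id "ENSG00000223972";',
]
for row in gtf_genes_to_bed(example):
    print(*row, sep="\t")
```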