I have not looked through your code - only read the description and your GitHub example. Take that into account when considering my feedback.
Combining multiple classifiers - I will call it ensemble voting here - is an area of research with a long-standing tradition, and your approach appears rather simplistic compared to the state of the art. Literature on ensemble classification is plentiful, so I will not give specific references here; I suggest "blending classification models" as your initial search term.
There are at least two problems with your approach: 1) you don't appear to have a gold standard; 2) you are most likely overfitting, because no out-of-sample data is used for independent verification.
Without a gold standard, you are weighting purely by majority agreement. If your majority vote is wrong 10% of the time, the weights don't account for that, but the result may still be acceptable. If the majority vote is wrong 30% of the time or more, the weights will be badly skewed and you will be promoting the wrong models. Weights must be assigned in the context of how predictions relate to the correct answers: a prediction should get a higher weight because it is correct, not because it agrees with the majority.
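To make that concrete, here is a minimal sketch with made-up predictions and labels (none of this is your code; the numbers are purely illustrative). It contrasts weighting models by agreement with the majority vote against weighting them by agreement with a gold standard:

```python
import numpy as np

# Hypothetical illustration: three classifiers' predictions on six samples,
# plus the true labels (the "gold standard" your approach lacks).
preds = np.array([
    [1, 0, 1, 1, 0, 1],   # model A
    [1, 0, 1, 0, 0, 1],   # model B
    [0, 1, 1, 0, 1, 1],   # model C
])
y_true = np.array([1, 0, 1, 1, 0, 0])

# Majority-based weighting (roughly what you seem to be doing):
# reward each model for agreeing with the majority vote.
majority = (preds.sum(axis=0) >= 2).astype(int)
w_majority = (preds == majority).mean(axis=1)

# Gold-standard weighting: reward each model for being correct.
w_accuracy = (preds == y_true).mean(axis=1)

print("agreement-with-majority weights:", w_majority)
print("accuracy-based weights:         ", w_accuracy)
```

In this toy example the majority-based weights rank model B highest (it agrees with the majority on every sample), while the gold-standard weights rank model A highest - exactly the failure mode described above: when the majority itself is often wrong, agreement with it rewards the wrong models.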
You appear to have a small dataset, and with small datasets there is always a risk of overfitting. It is relatively easy for any modern classifier to "learn" the data so that it appears to do well on that particular subset but fails on newly acquired data. I see people create classifiers with 98-99% accuracy all the time, and many of them completely crumble on new data.

Ensemble voting helps with this problem by providing multiple "experts" that can disagree, but it remains impossible to verify the quality of any classifier by training on all the available data. A subset must be set aside as a validation dataset and used to monitor and adjust the training process; that applies to individual classifiers and to ensembles alike. Your approach does not seem to include any kind of validation. Without getting into the weeds, I suggest you read about N-fold (or K-fold) cross-validation and hold-out validation; a sketch of both follows.
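Here is a minimal sketch of both techniques, assuming a scikit-learn-style workflow (VotingClassifier, train_test_split, and cross_val_score are standard scikit-learn APIs; the synthetic data and the choice of base models are placeholders for your own):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in data; substitute your own features and labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hold-out validation: set aside a test set that neither the base models
# nor the ensemble ever sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(random_state=0)),
    ("nb", GaussianNB()),
], voting="hard")

ensemble.fit(X_train, y_train)
print("train accuracy:   ", ensemble.score(X_train, y_train))
print("held-out accuracy:", ensemble.score(X_test, y_test))

# K-fold cross-validation gives a less noisy estimate on small datasets:
# every sample serves as validation data exactly once across the K folds.
scores = cross_val_score(ensemble, X, y, cv=5)
print("5-fold accuracies:", scores, "mean:", scores.mean())
```

A large gap between the training score and the held-out or cross-validated score is the classic symptom of the overfitting described above, and it is invisible if you train and evaluate on the same data.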
I have saved the part that is least likely to please you for the end - and please forgive my bluntness, because you have doubtless put lots of work into this package. There are literally hundreds of solutions for what you are trying to do, and they are likely much better than yours.