On CBioInfCpp.h as a C++ lib containing some functions for bioinformatics
Though I am not a professional programmer, bioinformatics is a very interesting interdisciplinary field for me.
As I see it, Python is the "standard language" in this field.
But when I solved problems at rosalind.info, I used C++. As a result, a "lib of some functions" was born.
The lib contains 3 groups of functions. The first group is input-output functions (for reading/writing vectors, matrices and graphs from/to a file with a single command, as in Python).
The second group is "Working with strings". It contains functions ranging from computing GC-content, Edit Distance etc. to finding all mutated strings in a given one.
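As a rough illustration of what such string functions compute, here is a minimal standalone sketch of GC-content and of edit distance (assumed here to be Levenshtein distance with unit costs). The names `gcContentSketch` and `editDistanceSketch` are hypothetical; the library's own functions may have different names, signatures and conventions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: fraction of G and C characters in a DNA string,
// in the range [0, 1]. Returns 0 for an empty string.
double gcContentSketch(const std::string& s) {
    if (s.empty()) return 0.0;
    std::size_t gc = 0;
    for (char c : s) {
        if (c == 'G' || c == 'C' || c == 'g' || c == 'c') ++gc;
    }
    return static_cast<double>(gc) / static_cast<double>(s.size());
}

// Hypothetical helper: Levenshtein distance (insertion, deletion and
// substitution each cost 1), computed with two rows of the DP table.
int editDistanceSketch(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}
```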
The third group is "Working with graphs". A data structure called "Adjacency vector" is suggested. By the way, in the general case, vertices may have negative integers assigned, and graphs may have multiple loops and edges.
Some functions, such as Eulerian cycle/path finding, topological sorting etc., are implemented.
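To give an idea of how a graph stored as a flat list of edges can be processed, here is a minimal sketch of topological sorting (Kahn's algorithm). It assumes a layout in which every unweighted edge occupies two consecutive integers (start vertex, end vertex); the real "Adjacency vector" layout in CBioInfCpp.h may differ, and the name `topoSortSketch` is hypothetical. Vertex labels may be any integers, including negative ones, and multiple edges are allowed.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <vector>

// Assumed layout (an assumption, not confirmed by the library):
// edges = {from0, to0, from1, to1, ...}.
// Returns one topological order, or an empty vector if the graph
// contains a cycle (a loop such as {3, 3} counts as a cycle).
std::vector<int> topoSortSketch(const std::vector<int>& edges) {
    std::map<int, std::vector<int>> out;  // outgoing neighbours per vertex
    std::map<int, int> indeg;             // incoming edge count per vertex
    for (std::size_t i = 0; i + 1 < edges.size(); i += 2) {
        int u = edges[i];
        int v = edges[i + 1];
        out[u].push_back(v);
        indeg[u];    // make sure u is registered (default count 0)
        ++indeg[v];  // each parallel edge is counted separately
    }
    std::queue<int> ready;
    for (const auto& kv : indeg) {
        if (kv.second == 0) ready.push(kv.first);
    }
    std::vector<int> order;
    while (!ready.empty()) {
        int u = ready.front();
        ready.pop();
        order.push_back(u);
        for (int v : out[u]) {
            if (--indeg[v] == 0) ready.push(v);
        }
    }
    if (order.size() != indeg.size()) order.clear();  // a cycle was found
    return order;
}
```

Using `std::map` keyed by vertex label avoids any assumption that vertices are numbered from 0, which matters when labels can be negative.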
Might it be useful for some tasks?
I understand that this lib lacks a great many features. For example, it cannot yet work with bioinformatics databases, and I cannot implement that on my own.
Freely distributed source code and info are here:
and here: https://github.com/chernouhov/CBioInfCpp-0-
My profile at Rosalind info
Best regards, Chernouhov Sergey
- Group of functions "FindIn" has been updated.
- Functions PairVectorCout and PairVectorFout have been updated.
- Groups of functions "GraphCout" and "GraphFout" have been added. So now one may "cout/fout" a graph that is set by an Adjacency vector to the screen or to a file line by line: one edge per line.
- Function "StrToCircular" has been added for finding the circular string of minimal length for a given string.
- Group of functions "MaxFlowGraph" has been added to help find the maximal flow, the paths of the maximal flow network and the max-flow min-cut in a graph.
- A data structure "Adjacency map" (a modification of the "Adjacency vector" data structure for storing graphs) has been added. An Adjacency map gives quicker access to an edge’s weight, but it can’t work with multiple edges.
- Functions AdjVectorToAdjMap and AdjMapToAdjVector for converting an Adjacency vector to an Adjacency map and back have been added. Note that multiple edges will be joined together.
- Function TandemRepeatsFinding has been added. It is intended for finding tandem repeats in a given string, which may be useful for solving problems related to microsatellite instability etc.
- Function CIGAR1 has been added.
- Groups of functions "GraphCout" and "GraphFout" have been updated (so now one may "cout/fout" a graph that is set by either an Adjacency vector or an Adjacency map to the screen or to a file line by line: one edge per line).
- Function EditDistA has been added as an extended version of the function EditDist (it returns not only the value of the Edit Distance between 2 strings but also one possible alignment).
- Group of functions "NBPaths" (for finding maximal non-branching paths in a graph, weighted or not, directed or not) has been added.
- Functions ConsStringQ1 and ConsStringQ2 for building a consensus string from a given collection of strings according to their quality have been added. Note that, because little test data was available, errors may be found here (please notify me if you find any).
- Function GenRandomUWGraph that generates a random unweighted graph (as its "Adjacency vector") has been added.
- Group of functions for finding the set of vertices of each strongly connected component of a directed graph and of each connected component of an undirected graph has been added.
- Group of functions for counting edge multiplicity in a graph that is set by an Adjacency vector has been added.
- Group of functions AdjVectorToAdjMegaMap and AdjMegaMapToAdjVector has been added to convert an Adjacency vector to/from an Adjacency mega-map (i.e. an extended version of the Adjacency map that can contain graphs having different multiple edges).
- Groups of functions GraphCout and GraphFout have been updated to deal with mega-maps.
- Group of functions Num updated.
- Function ScoreStringMatrix, which counts the score (i.e. the total number of mismatches) for a vector of strings, has been added.
- Function GPPM, which generates a position probability matrix (PPM), has been added. Note that pseudocounts may be used (the formula (Ns+z)/(N+2*z) is implemented).
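As an illustration of the pseudocount formula quoted in the last item, here is a minimal sketch that builds a PPM column by column. It is not the library's GPPM: the name `ppmSketch`, the signature and the return type are hypothetical. It assumes that Ns is the number of strings with symbol s at a given position, N is the number of strings, z is the pseudocount, and that all strings have the same length and use the alphabet A, C, G, T.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: result[pos][k] is the value for symbol "ACGT"[k]
// at position pos, computed as (Ns + z) / (N + 2*z), the formula quoted above.
// Assumes equally long, upper-case DNA strings; other characters are ignored.
std::vector<std::array<double, 4>> ppmSketch(
        const std::vector<std::string>& motifs, double z) {
    std::vector<std::array<double, 4>> ppm;
    if (motifs.empty()) return ppm;

    const std::string alphabet = "ACGT";
    const double n = static_cast<double>(motifs.size());
    const std::size_t len = motifs.front().size();

    for (std::size_t pos = 0; pos < len; ++pos) {
        std::array<double, 4> counts = {0.0, 0.0, 0.0, 0.0};
        for (const std::string& m : motifs) {
            std::size_t k = alphabet.find(m[pos]);
            if (k != std::string::npos) counts[k] += 1.0;
        }
        std::array<double, 4> column;
        for (std::size_t k = 0; k < 4; ++k) {
            column[k] = (counts[k] + z) / (n + 2.0 * z);
        }
        ppm.push_back(column);
    }
    return ppm;
}
```

With z = 0 the values are plain column frequencies; a positive z keeps symbols that never occur at a position from getting probability zero.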
For further updates please see here: A: CBioInfCpp.h as a C++ lib containing some functions for bioinformatics