Tool: CBioInfCpp.h as a C++ lib containing some functions for bioinformatics
21 months ago by
Russian Federation
chernouhov sergey50 wrote:

Dear Sirs.

Though I am not a professional programmer, bioinformatics is a very interesting interdisciplinary field for me.

As I see it, Python is the "standard language" in this field.

But when I solved problems at rosalind.info, I used C++. As a result, a "lib of some functions" was born.

The lib contains 3 groups of functions. The first group is input-output functions (to read and write vectors, matrices and graphs from or to a file with a single command, as in Python).
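To illustrate the "one command" idea, here is a minimal sketch of reading a whole vector of integers in a single call. The names readIntVector, readIntVectorFromFile and parseIntVector are hypothetical and are not taken from CBioInfCpp.h; the library's own functions may look quite different.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not the library's API): read every
// whitespace-separated integer from a stream into a vector.
std::vector<int> readIntVector(std::istream& in) {
    std::vector<int> values;
    int x;
    while (in >> x) values.push_back(x);
    return values;
}

// Open a file by name and read it in one call.
// Returns an empty vector if the file cannot be opened.
std::vector<int> readIntVectorFromFile(const std::string& path) {
    std::ifstream file(path);
    if (!file) return {};
    return readIntVector(file);
}

// Same parsing, but from text that is already in memory.
std::vector<int> parseIntVector(const std::string& text) {
    std::istringstream in(text);
    return readIntVector(in);
}
```

With helpers like these, loading input becomes a single line such as `auto v = readIntVectorFromFile("input.txt");`, where the file name is only an example.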

The second group is "Working with strings". It contains functions ranging from computing GC-content and Edit Distance to finding all mutated strings in a given one.
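As an illustration of two of the string tasks named above, here is a short sketch of GC-content and of the classic (Levenshtein) edit distance with unit costs. The function names gcContent and editDistance are assumptions made for this example and need not match the names or signatures used in CBioInfCpp.h.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Fraction of G and C characters in s (0.0 for an empty string).
double gcContent(const std::string& s) {
    if (s.empty()) return 0.0;
    std::size_t gc = 0;
    for (char c : s)
        if (c == 'G' || c == 'C' || c == 'g' || c == 'c') ++gc;
    return static_cast<double>(gc) / static_cast<double>(s.size());
}

// Levenshtein edit distance with unit costs, using two rolling rows.
int editDistance(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({substitution, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}
```

The rolling-row version keeps memory linear in the length of the second string, which matters for long sequences.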

The third group is "Working with graphs". A data structure called "Adjacency vector" is suggested. In the general case, vertices may be labelled with negative integers, and graphs may have multiple loops and edges. Functions such as Eulerian Cycle and Path finding, topological sorting etc. are implemented.
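A rough sketch of topological sorting on a flat edge list may help to picture this. The layout used here, {from0, to0, from1, to1, ...}, is an assumption made for the example and may differ from the actual "Adjacency vector" format; the name topologicalOrder is likewise hypothetical. Vertex labels are kept in a std::map so that negative labels work without any renumbering.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <vector>

// Assumed layout (for this sketch only): edges = {from0, to0, from1, to1, ...}.
// Returns a topological order (Kahn's algorithm), or an empty vector
// if the graph contains a cycle.
std::vector<int> topologicalOrder(const std::vector<int>& edges) {
    std::map<int, std::vector<int>> next;  // vertex label -> successors
    std::map<int, int> indegree;           // vertex label -> incoming edge count
    for (std::size_t i = 0; i + 1 < edges.size(); i += 2) {
        next[edges[i]].push_back(edges[i + 1]);
        indegree[edges[i]];                // register the source vertex
        ++indegree[edges[i + 1]];
    }
    std::queue<int> ready;
    for (const auto& entry : indegree)
        if (entry.second == 0) ready.push(entry.first);
    std::vector<int> order;
    while (!ready.empty()) {
        int v = ready.front();
        ready.pop();
        order.push_back(v);
        for (int w : next[v])
            if (--indegree[w] == 0) ready.push(w);
    }
    if (order.size() != indegree.size()) order.clear();  // cycle detected
    return order;
}
```

Multiple edges are handled naturally, because every parallel edge adds to the in-degree and is removed once; a loop makes the graph cyclic, so the function returns an empty vector.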

Might it be useful for some tasks?

By the way, what algorithmic functions and problems should be included, or maybe solved, here?

I understand that this lib lacks the great majority of features. For example, it is not yet able to work with bioinformatics databases, but I cannot implement that by myself alone.

The freely distributed source code and info are here: https://drive.google.com/open?id=1FQwsQm2kG_nTO45ab0yj52xtp6_B4IB2

(This is a link to a directory, not to a file; it contains the source code file and readme files.)

My profile at Rosalind info http://rosalind.info/users/chernouhov/

Best regards, Chernouhov Sergey

Modified on 03 May 2019:

Added to GitHub, as users asked for it: https://github.com/chernouhov/CBioInfCpp-0-

But:

  • GitHub is a new experience for me, so I probably DO make some mistakes there.
  • Why is only GitHub the trusted place? We may be free, but only there?

I DO declare that I DO NOT clearly understand everything about GitHub, so nowadays I use it only for file hosting, as it is such a popular place.

Best regards, Chernouhov Sergey

tool c++ • 1.5k views
ADD COMMENTlink modified 12 days ago • written 21 months ago by chernouhov sergey50

Hi, maybe you're interested in contributing to the C++ SeqAn library?

https://www.seqan.de/

ADD REPLYlink written 21 months ago by colindaven2.6k

Hi.

Thanks. It's a great idea. Why not?

By the way, do you use the SeqAn library, or maybe take part in its development?

ADD REPLYlink modified 21 months ago • written 21 months ago by chernouhov sergey50

Added to GitHub, as users asked for it: https://github.com/chernouhov/CBioInfCpp-0-

But:

  • GitHub is a new experience for me, so I probably DO make some mistakes there.
  • Why is only GitHub the trusted place? We may be free, but only there?

I DO declare that I DO NOT clearly understand everything about GitHub, so nowadays I use it only for file hosting, as it is such a popular place.

ADD REPLYlink modified 21 months ago • written 21 months ago by chernouhov sergey50

It’s not the only trusted place, it’s just become the most common and best known. You can also see the source code directly, versus having to trust a file blindly.

Bitbucket, SourceForge, GitLab etc. are all still used, just to varying and lesser extents.

ADD REPLYlink written 21 months ago by Joe18k

Well, we DID talk about GitHub but we DID NOT talk about the lib itself. Nowadays it is hosted at GitHub. But that is not its key feature, as is true for every item, both good and bad, isn't it?

ADD REPLYlink written 20 months ago by chernouhov sergey50

There is no need to post the same comment multiple times (I know these threads can get a little disorganised over time, but once is enough).

I'm not sure I understand your question? There's nothing more to say regarding the lib or github as far as I can see? You've uploaded it in a nice, visible place. If people want to use it, they'll use it.

ADD REPLYlink written 20 months ago by Joe18k

GitHub is not just a file hosting site, it hosts and helps manage Git projects. By Git projects, I mean Git repositories with issues, Pull Requests, etc. Git repositories are essentially version-controlled code directories allowing for concurrent development and change tracking along with a host of other amazing features. If you're new to Git, you should definitely learn it as it will better your approach to software development.

ADD REPLYlink written 21 months ago by _r_am32k

I would consider putting the code on Github, rather than distributing it as a google link. People are often wary of downloading code from behind random links without first being able to inspect the source.

ADD REPLYlink written 21 months ago by Joe18k

Hi. Thanks. It is a good idea and I plan to do it a little later (as I haven't used GitHub yet).

But I must say: since the lib CBioInfCpp currently consists of only one header file (as free source code), is it so bad to use Google Drive too? There are also 2 files, pdf and rtf, that contain the same description of the functions of CBioInfCpp in different formats. One may use either of them, depending on the preferred format.

ADD REPLYlink written 21 months ago by chernouhov sergey50

Well, look at it this way: I haven't clicked on your link yet, even though I trust you, because I don't know if it will take me to a page, or will start a download immediately. If it starts a download immediately, I don't know if I'm getting a zip file, naked source code, or something masquerading as either.

If you want to contribute to projects, or have people contribute to improving your code, github (and its friends) is absolutely the way to go. To get started with github you need only three commands really: git pull, git commit, git push. Everything else is a bonus ;)

There are plenty of good youtube tutorials etc to get you going.

ADD REPLYlink written 21 months ago by Joe18k

I'll look into it.

But there are no zip files or immediate downloads; it is a link to a directory.

ADD REPLYlink modified 21 months ago • written 21 months ago by chernouhov sergey50

Sure, but it's hard to tell that from the link alone, so people are unlikely to click it.

ADD REPLYlink written 21 months ago by Joe18k

I also don't think Google Drive is appropriate. Software in the days of Google Code, SourceForge etc. was (and in some cases still is) far more poorly documented, opaque, and unversioned. As a developer I think you'll enjoy GitHub very much.

ADD REPLYlink written 21 months ago by colindaven2.6k

What jrj.healey said was the first thought to cross my mind. I'm not clicking on a google drive link. I really want to look at the code, the code structure and a README before I decide if something is worth a download.

ADD REPLYlink written 21 months ago by _r_am32k

As I see it, why not try to implement some tool, at least using CBioInfCpp.h?

Maybe there are some interesting problems for strings, graphs, etc.?

As well, why not use it for input/output when solving other tasks?

ADD REPLYlink written 20 months ago by chernouhov sergey50

Please don't add answers unless you are responding to the opening post. This is just a comment so I have moved it.

That said, I don't really understand your comment - what are you asking?

As I said before, you've already uploaded your code, if people find it, and want to use it, they will - there's nothing more to be done...

ADD REPLYlink written 20 months ago by Joe18k

That is my language trouble, I see.

I mean there may be some problems to solve, and it is interesting for me to solve such problems, whether using this lib or not.

ADD REPLYlink modified 20 months ago • written 20 months ago by chernouhov sergey50
21 months ago by
Russian Federation
chernouhov sergey50 wrote:

Added to GitHub, as users asked for it: https://github.com/chernouhov/CBioInfCpp-0-

But:

  • GitHub is a new experience for me, so I probably DO make some mistakes there.
  • Why is only GitHub the trusted place? We may be free, but only there?

I DO declare that I DO NOT clearly understand everything about GitHub, so nowadays I use it only for file hosting, as it is such a popular place.

ADD COMMENTlink modified 21 months ago • written 21 months ago by chernouhov sergey50

GitHub is now owned by Microsoft. I'd prefer it not to be owned by a big tech company, but that's life. GitLab is an alternative. Nice one for making your (first?) steps into git, I doubt you'll regret it.

ADD REPLYlink written 20 months ago by colindaven2.6k
9 months ago by
Russian Federation
chernouhov sergey50 wrote:

22.04.2020

  • Modified function GenerateAlphabet for a single string.
  • Added the group of functions MakeSubgraphSetOfVertices to generate a subgraph of a given graph (set by an Adjacency vector) from a set/unordered_set of vertices to be chosen.
ADD COMMENTlink written 9 months ago by chernouhov sergey50
19 months ago by
Russian Federation
chernouhov sergey50 wrote:

23/06/2019 update:

  • The group of functions "FindIn" has been updated.
  • Functions PairVectorCout and PairVectorFout have been updated.
  • The groups of functions "GraphCout" and "GraphFout" have been added. So now one may "cout/fout" a graph that is set by an Adjacency vector to the screen or to a file line by line, one edge per line.
  • Function "StrToCircular" has been added for finding the circular string of minimal length for a given string.
  • The group of functions "MaxFlowGraph" has been added to help find the maximal flow, the paths of the maximal flow network and the max-flow min-cut in a graph.
  • A data structure "Adjacency map" (a modification of the "Adjacency vector" data structure for storing graphs) has been added. An Adjacency map allows quicker access to an edge’s weight, but it can’t work with multiple edges.
  • Functions AdjVectorToAdjMap and AdjMapToAdjVector for converting an Adjacency vector to an Adjacency map and back have been added. Note that multiple edges will be joined together.
  • Function TandemRepeatsFinding has been added. It is intended for finding tandem repeats in a given string, which may be useful for solving problems related to Microsatellite Instability etc.
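The trade-off between the two graph containers can be pictured with a small sketch. Everything here is an assumption made for illustration: the weighted edge list is taken to be stored as triples {from, to, weight}, the map is keyed by the (from, to) pair, and parallel edges are merged by adding their weights. The post only says that multiple edges "will be joined together", so the real AdjVectorToAdjMap may join them differently.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using AdjacencyMap = std::map<std::pair<int, int>, int>;  // (from, to) -> weight

// Assumed layout for this sketch: edges = {from0, to0, weight0, from1, ...}.
// Parallel edges are merged by adding their weights (an assumption).
AdjacencyMap toAdjacencyMap(const std::vector<int>& edges) {
    AdjacencyMap result;
    for (std::size_t i = 0; i + 2 < edges.size(); i += 3)
        result[std::make_pair(edges[i], edges[i + 1])] += edges[i + 2];
    return result;
}

// Inverse conversion: one triple per distinct (from, to) pair, in key order.
std::vector<int> toAdjacencyVector(const AdjacencyMap& graph) {
    std::vector<int> edges;
    for (const auto& entry : graph) {
        edges.push_back(entry.first.first);
        edges.push_back(entry.first.second);
        edges.push_back(entry.second);
    }
    return edges;
}
```

Looking up a weight in the map takes logarithmic time instead of a scan over the whole edge list, but the original multiplicity of the edges cannot be recovered after the conversion.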
ADD COMMENTlink written 19 months ago by chernouhov sergey50
18 months ago by
Russian Federation
chernouhov sergey50 wrote:

14.07.2019 update:

  • Function CIGAR1 has been added.
  • The groups of functions "GraphCout" and "GraphFout" have been updated (so now one may "cout/fout" a graph that is set by either an Adjacency vector or an Adjacency map to the screen or to a file line by line, one edge per line).
  • Function EditDistA has been added as an extended version of the function EditDist (it returns not only the value of the Edit Distance between 2 strings but also one possible version of the alignment itself).
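Returning an alignment together with the distance can be done with a traceback over the full dynamic-programming table. The sketch below shows that general technique only; the Alignment struct and the name alignByEditDistance are hypothetical, and EditDistA itself may return its result in another form.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Alignment {
    int distance;        // edit distance between the two inputs
    std::string top;     // first string with '-' for gaps
    std::string bottom;  // second string with '-' for gaps
};

Alignment alignByEditDistance(const std::string& a, const std::string& b) {
    const std::size_t n = a.size(), m = b.size();
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= n; ++i)
        for (std::size_t j = 1; j <= m; ++j)
            d[i][j] = std::min({d[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1),
                                d[i - 1][j] + 1, d[i][j - 1] + 1});
    // Trace back from the bottom-right corner to recover one optimal alignment.
    Alignment result{d[n][m], "", ""};
    std::size_t i = n, j = m;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 &&
            d[i][j] == d[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)) {
            result.top += a[--i];      // match or substitution
            result.bottom += b[--j];
        } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
            result.top += a[--i];      // character of a against a gap
            result.bottom += '-';
        } else {
            result.top += '-';         // gap against a character of b
            result.bottom += b[--j];
        }
    }
    std::reverse(result.top.begin(), result.top.end());
    std::reverse(result.bottom.begin(), result.bottom.end());
    return result;
}
```

Several alignments can share the same minimal distance; this sketch prefers a diagonal step whenever one is optimal and returns only that one alignment.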
ADD COMMENTlink written 18 months ago by chernouhov sergey50
17 months ago by
Russian Federation
chernouhov sergey50 wrote:

09.08.2019 update:

  • The group of functions "NBPaths" (for finding maximal non-branching paths in a graph, weighted or not, directed or not) has been added.
  • Functions ConsStringQ1 and ConsStringQ2 for building a consensus string from a given collection of strings according to their quality have been added. Note that, because there was little data for testing, errors may be found here (please notify me if you find any).
ADD COMMENTlink modified 17 months ago • written 17 months ago by chernouhov sergey50
16 months ago by
Russian Federation
chernouhov sergey50 wrote:

31.08.2019 update:

  • Function GenRandomUWGraph, which generates a random unweighted graph (as its "Adjacency vector"), has been added.
  • A group of functions has been added to find the collection of vertices of each strongly connected component of a directed graph and of each connected component of an undirected graph.
  • A group of functions for counting edge multiplicity in a graph that is set by an Adjacency vector has been added.
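For the undirected case, collecting the vertices of each connected component can be sketched with a plain depth-first search. The flat layout {u0, v0, u1, v1, ...} and the name connectedComponents are assumptions for this example, not the library's interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Assumed layout for this sketch: edges = {u0, v0, u1, v1, ...}, undirected.
// Returns one sorted vertex list per connected component.
std::vector<std::vector<int>> connectedComponents(const std::vector<int>& edges) {
    std::map<int, std::vector<int>> neighbours;
    for (std::size_t i = 0; i + 1 < edges.size(); i += 2) {
        neighbours[edges[i]].push_back(edges[i + 1]);
        neighbours[edges[i + 1]].push_back(edges[i]);
    }
    std::map<int, bool> visited;
    std::vector<std::vector<int>> components;
    for (const auto& entry : neighbours) {
        if (visited[entry.first]) continue;
        std::vector<int> component;
        std::vector<int> stack{entry.first};  // iterative depth-first search
        visited[entry.first] = true;
        while (!stack.empty()) {
            int v = stack.back();
            stack.pop_back();
            component.push_back(v);
            for (int w : neighbours[v])
                if (!visited[w]) {
                    visited[w] = true;
                    stack.push_back(w);
                }
        }
        std::sort(component.begin(), component.end());
        components.push_back(component);
    }
    return components;
}
```

Strongly connected components of a directed graph need a different algorithm (for example Tarjan's or Kosaraju's), which this sketch does not cover.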
ADD COMMENTlink modified 16 months ago • written 16 months ago by chernouhov sergey50
15 months ago by
Russian Federation
chernouhov sergey50 wrote:

19.10.2019:

  • Added the group of functions AdjVectorToAdjMegaMap and AdjMegaMapToAdjVector to convert an Adjacency vector to/from an Adjacency mega-map (i.e. an extended version of the Adjacency map that can contain graphs having different multiple edges).

  • Updated the groups of functions GraphCout and GraphFout to deal with mega-maps.

ADD COMMENTlink written 15 months ago by chernouhov sergey50
14 months ago by
Russian Federation
chernouhov sergey50 wrote:

03.11.2019

  • Group of functions Num updated.
  • Function ScoreStringMatrix, which counts the score (i.e. the total number of mismatches) over a vector of strings, added.
  • Function GPPM, which generates a position probability matrix (PPM), added. Note that pseudocounts may be used (the formula (Ns+z)/(N+2*z) is implemented).
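The pseudocount formula can be made concrete with a small sketch. It reads Ns as the number of strings that have symbol s at a given position, N as the number of strings and z as the pseudocount; this reading, the fixed alphabet ACGT and the name positionProbabilityMatrix are assumptions, not details confirmed by the post. The formula itself is reproduced as quoted, so with four symbols and z > 0 the columns of this sketch do not sum to exactly 1.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Rows are A, C, G, T; columns are positions. Assumes equal-length DNA strings.
using Ppm = std::array<std::vector<double>, 4>;

Ppm positionProbabilityMatrix(const std::vector<std::string>& strings, double z) {
    const std::string alphabet = "ACGT";
    const std::size_t length = strings.empty() ? 0 : strings[0].size();
    const double n = static_cast<double>(strings.size());
    Ppm ppm;
    for (std::size_t row = 0; row < 4; ++row) {
        ppm[row].assign(length, 0.0);
        for (std::size_t pos = 0; pos < length; ++pos) {
            double count = 0.0;  // Ns: occurrences of this symbol at this position
            for (const std::string& s : strings)
                if (s[pos] == alphabet[row]) count += 1.0;
            // Formula as quoted in the post: (Ns + z) / (N + 2*z).
            ppm[row][pos] = (count + z) / (n + 2.0 * z);
        }
    }
    return ppm;
}
```

With z = 0 the matrix holds plain column frequencies; a positive z keeps symbols that were never observed from getting probability zero.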
ADD COMMENTlink written 14 months ago by chernouhov sergey50
14 months ago by
Russian Federation
chernouhov sergey50 wrote:

26.11.2019

  • For the functions ConsStringQ1 and ConsStringQ2 (intended for finding a consensus string, with or without taking quality into consideration) the default method is now set to 1.
  • Function JoinOverlapStrings for joining overlapping strings has been added (quality may or may not be taken into consideration). So if we need to join the collection 0->ACGT, 1->TGTA, 1->TT, 10->TT, 11->TCA in any way without any additional info, we should set NoQuality = true, Aggregate = false, and get the result: 0->ATGTA, 10->TTC.
  • Function ProfileProbableMer has been added to find all most probable j-mers in a given string according to a given position probability matrix (PPM).
  • Function CycleToPath has been added.
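Finding the most probable j-mers under a PPM amounts to sliding a window over the string and multiplying one matrix entry per position. The sketch below shows that idea under stated assumptions: the matrix has rows A, C, G, T, its column count defines j, and the name mostProbableMers is hypothetical rather than the signature of ProfileProbableMer.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

using Ppm = std::array<std::vector<double>, 4>;  // rows A, C, G, T

// Returns every window of text whose probability under ppm is maximal.
std::vector<std::string> mostProbableMers(const std::string& text, const Ppm& ppm) {
    const std::string alphabet = "ACGT";
    const std::size_t j = ppm[0].size();
    std::vector<std::string> best;
    double bestProbability = -1.0;
    for (std::size_t start = 0; j > 0 && start + j <= text.size(); ++start) {
        double probability = 1.0;
        for (std::size_t pos = 0; pos < j; ++pos) {
            std::size_t row = alphabet.find(text[start + pos]);
            // Symbols outside the alphabet make the window impossible.
            probability *= (row == std::string::npos) ? 0.0 : ppm[row][pos];
        }
        if (probability > bestProbability) {
            bestProbability = probability;
            best.clear();
            best.push_back(text.substr(start, j));
        } else if (probability == bestProbability) {
            best.push_back(text.substr(start, j));
        }
    }
    return best;
}
```

Ties are kept in order of appearance, so a j-mer that occurs at several positions with the maximal probability is listed once per occurrence.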
ADD COMMENTlink modified 14 months ago • written 14 months ago by chernouhov sergey50
12 months ago by
Russian Federation
chernouhov sergey50 wrote:

11.01.2020

  • Added the group of functions UPGMA_UndirectedGraph and NeighborJoiningUndirectedGraph for generating a tree (as an undirected graph) from a given distance matrix.
ADD COMMENTlink written 12 months ago by chernouhov sergey50
10 months ago by
Russian Federation
chernouhov sergey50 wrote:

05.03.2020:

  • Added experimental functions for finding all cycles in a graph (Circles_in_Graph) and for finding all paths between any two vertices in a directed graph (AllPathsDGraph).
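Enumerating all paths between two vertices is commonly done with a depth-first search that never revisits a vertex already on the current path. The sketch below shows that standard technique, not the code of AllPathsDGraph; the flat layout {from0, to0, from1, to1, ...} and the names extendPaths and allSimplePaths are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

using Path = std::vector<int>;

// Depth-first extension of the current path; vertices on the path are not revisited.
static void extendPaths(const std::map<int, std::vector<int>>& next, int target,
                        Path& current, std::vector<Path>& found) {
    const int v = current.back();
    if (v == target) {
        found.push_back(current);
        return;
    }
    const auto it = next.find(v);
    if (it == next.end()) return;  // dead end: no outgoing edges
    for (int w : it->second) {
        if (std::find(current.begin(), current.end(), w) != current.end()) continue;
        current.push_back(w);
        extendPaths(next, target, current, found);
        current.pop_back();
    }
}

// Assumed layout for this sketch: edges = {from0, to0, from1, to1, ...}.
std::vector<Path> allSimplePaths(const std::vector<int>& edges, int source, int target) {
    std::map<int, std::vector<int>> next;
    for (std::size_t i = 0; i + 1 < edges.size(); i += 2)
        next[edges[i]].push_back(edges[i + 1]);
    Path current{source};
    std::vector<Path> found;
    extendPaths(next, target, current, found);
    return found;
}
```

The number of simple paths can grow exponentially with the size of the graph, so this approach is practical only for small or sparse inputs.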

06.03.2020:

  • Added function SubGraphsInscribed to solve a particular case of the problem of finding, in some graph A, all subgraphs that are isomorphic to a given graph B (only "inscribed" subgraphs can be found). The function may also be used to check whether 2 graphs are isomorphic. This function can work with: directed or undirected graphs; graphs that have more than one connected component/strongly connected component; graphs that contain multiple edges. "Inscribed" means here that (1) the subgraph is "glued" to other parts of A only by edges connected to those of its vertices that are the begin/end vertices of some maximal-length non-branching path of this subgraph, and/or (2) graph A may have some other connected components. E.g. for graph B = {0->2, 10->2, 2->3, 3->4, 4->5, 4->6} we will find only A1 = {0->2, 1->2, 2->3, 3->4, 4->5, 4->6} as an inscribed isomorphic subgraph of A = {0->2, 7->1, 1->2, 2->3, 3->4, 4->5, 4->6}. But if we add edge 3->8 to A (in this case A = {3->8, 0->2, 7->1, 1->2, 2->3, 3->4, 4->5, 4->6}), we could not find any inscribed subgraph of A isomorphic to B.

A preprint (in Russian) on this approach to solving the (sub)graph isomorphism problem is here: dx.doi.org/10.24108/preprints-3111977

ADD COMMENTlink written 10 months ago by chernouhov sergey50
10 months ago by
Russian Federation
chernouhov sergey50 wrote:

29.03.2020

  • Added functions MedianString and GenerateAlphabet.
ADD COMMENTlink written 10 months ago by chernouhov sergey50
9 months ago by
Russian Federation
chernouhov sergey50 wrote:

Some notes on isomorphic (sub)graph finding (examples and time estimates): On (sub)graph isomorphism

ADD COMMENTlink written 9 months ago by chernouhov sergey50
6 months ago by
Russian Federation
chernouhov sergey50 wrote:

10.07.2020

  • Functions SuffixTreeMake (to build a suffix tree from a string) and CoutSuffixTree & FoutSuffixTree (to output a suffix tree to the screen or to a file) have been added. The suffix tree is stored in the vector of integers Tree, every edge as a quartet of integers: the number of the start vertex of the edge, the number of the end vertex of the edge, the starting position of the substring in the basic string, and the length of this substring.
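The quartet layout can be illustrated by decoding such a vector back into labelled edges. This sketch assumes that starting positions are zero-based and that the quartets are stored back to back; both points, as well as the names SuffixTreeEdge and decodeSuffixTree, are assumptions and are not stated in the post.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct SuffixTreeEdge {
    int from;           // start vertex of the edge
    int to;             // end vertex of the edge
    std::string label;  // substring of the text carried by the edge
};

// Assumes zero-based start positions and quartets
// {from, to, start, length} stored back to back in tree.
std::vector<SuffixTreeEdge> decodeSuffixTree(const std::vector<int>& tree,
                                             const std::string& text) {
    std::vector<SuffixTreeEdge> edges;
    for (std::size_t i = 0; i + 3 < tree.size(); i += 4) {
        const std::size_t start = static_cast<std::size_t>(tree[i + 2]);
        const std::size_t length = static_cast<std::size_t>(tree[i + 3]);
        edges.push_back({tree[i], tree[i + 1], text.substr(start, length)});
    }
    return edges;
}
```

Storing a position and a length instead of the substring itself keeps the tree linear in the length of the string, since edge labels are never copied.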
ADD COMMENTlink written 6 months ago by chernouhov sergey50
7 weeks ago by
Russian Federation
chernouhov sergey50 wrote:

05.12.2020

  • The extended experimental version of the function SubGraphsInscribed has been added. This extension/modification works with all edges of the input graphs instead of working with non-branching paths. If InscribedOnly == false, the function finds all (not only inscribed) subgraphs of an unweighted graph A that are isomorphic to an unweighted graph B. If InscribedOnly == true, the function looks for "inscribed" ones only.

Note 1. Running time depends strongly on the input data. If A and B have many similar segments, it will run for a very, very long time; if not, it runs much faster. For example, if they have a cycle with one edge having multiplicity = 3, etc.

Note 2. For undirected graphs the function will work much slower.

Here are test results for 05/12/2020: https://github.com/chernouhov/CBioInfCpp-0-/tree/master/TestsIsomorphicSubGraphsFinding

In particular, it found

  • for a directed graph B (15 vertices, 20 edges): 4536 isomorphic subgraphs in directed graph A (250-350), ~3 sec,

  • for a directed graph B (25-35): 82546 isomorphic subgraphs in directed graph A (2500-3500), ~1 min 40 sec,

  • for an undirected graph B (15 vertices, 20 edges): 69572 isomorphic subgraphs in undirected graph A (250-350), ~5 min.

ADD COMMENTlink modified 7 weeks ago • written 7 weeks ago by chernouhov sergey50
12 days ago by
Russian Federation
chernouhov sergey50 wrote:

12.01.2021

  • Function SubGraphsInscribedM, an experimental version of the function SubGraphsInscribed, has been added. SubGraphsInscribedM can also find subgraphs in a given graph A that are isomorphic to a given template graph B, but what is new is that the vertices of these graphs may have marks. It may be useful for chemistry, as one may associate an atom with a vertex (in case a molecule is set by a graph).

13.01.2021

ADD COMMENTlink modified 12 days ago • written 12 days ago by chernouhov sergey50