BHOSLIB: Benchmarks with Hidden Optimum Solutions for Graph Problems (Maximum Clique, Maximum Independent Set, Minimum Vertex Cover and Vertex Coloring)

------ Hiding Exact Solutions in Random Graphs


If you have any comments or need more instances for the following benchmarks, please send me an email.


Maximum Independent Set (MIS) and Minimum Vertex Cover (MVC)

Given a graph G, an independent set is a subset S of vertices in G such that no two vertices in S are adjacent (connected by an edge). The maximum independent set problem is to find an independent set with the largest number of vertices in a given graph. This problem is NP-hard, and as such, it is considered unlikely that there exists an efficient algorithm for solving it. A vertex cover (or hitting set) is a subset S of vertices such that each edge of G has at least one of its endpoints in S. It is easy to see that given an independent set of a graph, all vertices not in the set form a vertex cover. The minimum vertex cover problem is to find a vertex cover with the smallest number of vertices. Finding challenging benchmarks for the maximum independent set problem (or equivalently, the minimum vertex cover  problem) is not only of significance for experimentally evaluating the algorithms of solving this problem but also of interest to the theoretical computer science community. The MIS and MVC benchmarks presented here are transformed from forced satisfiable SAT benchmarks of Model RB, with the set of vertices and the set of edges respectively corresponding to the set of variables and the set of binary clauses in SAT instances (Thanks to Klaus D. Witzel for bringing our attention to this transformation). In fact, based on Model RB and this transformation, we can propose a simple random graph model as follows:

  1. Generate n disjoint cliques, each of which has nα vertices (where α>0 is a constant);

  2. Randomly select two different cliques and then generate without repetitions pn2α random edges between these two cliques (where 0<p<1 is a constant);

  3. Run Step 2 (with repetitions) for another rnlnn-1 times (where r>0 is a constant).

It is easy to see that for the graph model above, the size of maximum independent sets is at most n. Interestingly, determining if such a upper bound can be reached is equivalent to determining the satisfiability of the corresponding CSP and SAT instances of Model RB, and there is a one-to-one mapping between the solutions of these two problems. To hide an independent set of size n in the instances of this graph model, we first select a vertex at random from each disjoint clique to form an independent set of size n, and then in the above process of generating random edges (Step 2), no edge is allowed to violate this maximum independent set. From the results of the paper "Many Hard Examples in Exact Phase Transitions with Application to Generating Hard Satisfiable Instances", it is expected that Model RB can also be used to generate hard instances for graph algorithms (by use of the above graph model and the hardness of phase transitions, i.e. choosing appropriate values of α, p and r). It is also hoped that the method of hiding solutions based on Model RB can have potential applications in cryptography. Recently, it has been shown (both experimentally and theoretically) that unlike random 3-SAT, it is quite natural and easy to hide solutions in random CSP and SAT instances of Model RB. For more details, please see an IJCAI-05 paper entitled "A Simple Model to Generate Hard Satisfiable Instances".

Note: Unless specified explicitly, all the benchmarks provided on this page are expressed in the ASCII DIMACS graph format.

A brief description of the ASCII DIMACS graph format:


Maximum Clique and Vertex Coloring

Given a graph G, a clique is a subset S of vertices in G such that each pair of vertices in S is connected by an edge. The maximum clique problem refers to the problem of  finding a clique with the largest number of vertices in a given graph. Since a clique is an independent set in the complementary graph, the maximum independent set problem and the maximum clique problem (which is one of the first shown to be NP-hard and has been extensively studied in graph theory and combinatorial optimization) are essentially equivalent. The following benchmarks are the complements of the graph instances above.

Note: For vertex coloring of the above instances, what is hard to solve is not to find the optimum colorings, but to prove the optimality of such solutions. For example, it should be very hard to prove that 59 is fewest number of colors to color the instance of frb59-26-1.clq.

Below are two complementary graph benchmarks generated using the same method and the same values of parameters (α, p and r) as above. The clique benchmark has a hidden maximum clique of size = 100 and, accordingly, the MIS benchmark has a hidden maximum independent set of size = 100. Based on theoretical analysis and experimental results of smaller instances, I conjecture that in the next 20 years or more (from 2005), these two benchmarks can not be solved on a PC (or alike) in a reasonable time (e.g. 1 day). Even so, it would be very interesting to see how large the gap is between the optimum value and what are obtained by various solvers. It is also of interest to see how this gap is going to be bridged with the development of algorithms and computers. Starting from these ideas, I hope that anyone of interest can have a try on solving the two benchmarks and, if possible, please inform me of the result you obtain (including the size of the largest clique or the largest independent set you find, how much time is used, on what kind of computer and by what solver). If agreed, your result will be posted as a news here and a link could also be added to your website (if there is one).

The DIMACS binary format, an alternative way for storing a graph, is space efficient and is used in many solvers. The above two benchmarks in this format are respectively available as frb100-40.clq.b (979KB) and frb100-40.mis.b (979KB).

News

List of Papers That Use Graph Benchmarks Based on Model RB (Updated: Mar. 4, 2009)


Other Benchmarks Based on Model RB


Forced Satisfiable CSP and SAT Benchmarks of Model RB

Pseudo-Boolean (0-1 Integer Programming) Benchmarks with Hidden Optimum Solutions

Benchmarks with Hidden Optimum Solutions for Set Covering, Set Packing and Winner Determination

Weighted Max-2-SAT Benchmarks with Hidden Optimum Solutions (New!)
 


Back to Ke Xu's homepage.

Date Created: April 20, 2004. Last Updated: Mar. 20, 2010.