If you have any comments or need more instances for the following
benchmarks, please send me an email.
Given a graph G, an independent set is a subset S of vertices in G such that no two vertices in S are adjacent (connected by an edge). The maximum independent set problem is to find an independent set with the largest number of vertices in a given graph. This problem is NP-hard, and as such, it is considered unlikely that there exists an efficient algorithm for solving it. A vertex cover (or hitting set) is a subset S of vertices such that each edge of G has at least one of its endpoints in S. It is easy to see that given an independent set of a graph, all vertices not in the set form a vertex cover. The minimum vertex cover problem is to find a vertex cover with the smallest number of vertices. Finding challenging benchmarks for the maximum independent set problem (or equivalently, the minimum vertex cover problem) is not only of significance for experimentally evaluating the algorithms of solving this problem but also of interest to the theoretical computer science community. The MIS and MVC benchmarks presented here are transformed from forced satisfiable SAT benchmarks of Model RB, with the set of vertices and the set of edges respectively corresponding to the set of variables and the set of binary clauses in SAT instances (Thanks to Klaus D. Witzel for bringing our attention to this transformation). In fact, based on Model RB and this transformation, we can propose a simple random graph model as follows:
Generate n disjoint cliques, each of which has nα vertices (where α>0 is a constant);
Randomly select two different cliques and then generate without repetitions pn2α random edges between these two cliques (where 0<p<1 is a constant);
Run Step 2 (with repetitions) for another rnlnn-1 times (where r>0 is a constant).
It is easy to see that for the graph model above, the size of maximum independent sets is at most n. Interestingly, determining if such a upper bound can be reached is equivalent to determining the satisfiability of the corresponding CSP and SAT instances of Model RB, and there is a one-to-one mapping between the solutions of these two problems. To hide an independent set of size n in the instances of this graph model, we first select a vertex at random from each disjoint clique to form an independent set of size n, and then in the above process of generating random edges (Step 2), no edge is allowed to violate this maximum independent set. From the results of the paper "Many Hard Examples in Exact Phase Transitions with Application to Generating Hard Satisfiable Instances", it is expected that Model RB can also be used to generate hard instances for graph algorithms (by use of the above graph model and the hardness of phase transitions, i.e. choosing appropriate values of α, p and r). It is also hoped that the method of hiding solutions based on Model RB can have potential applications in cryptography. Recently, it has been shown (both experimentally and theoretically) that unlike random 3-SAT, it is quite natural and easy to hide solutions in random CSP and SAT instances of Model RB. For more details, please see an IJCAI-05 paper entitled "A Simple Model to Generate Hard Satisfiable Instances".
Note: Unless specified explicitly, all the benchmarks provided on this page are expressed in the ASCII DIMACS graph format.
A brief description of the ASCII DIMACS graph format:
There are 0 or more lines starting with c at the top of the file which are comment lines and can be ignored.
Following the comment lines, there is a line with the form "p edge V E" which specifies the size of the graph where V and E are the number of vertices and edges respectively.
The remaining of the file is a list of lines starting with e which indicate the edges in the graph (e.g. the line "e 1 3" indicates that there is an edge between vertex 1 and vertex 3).
Given a graph G, a clique is a subset S of vertices in G such that each pair of vertices in S is connected by an edge. The maximum clique problem refers to the problem of finding a clique with the largest number of vertices in a given graph. Since a clique is an independent set in the complementary graph, the maximum independent set problem and the maximum clique problem (which is one of the first shown to be NP-hard and has been extensively studied in graph theory and combinatorial optimization) are essentially equivalent. The following benchmarks are the complements of the graph instances above.
Note: For vertex coloring of the above instances, what is hard to solve is not to find the optimum colorings, but to prove the optimality of such solutions. For example, it should be very hard to prove that 59 is fewest number of colors to color the instance of frb59-26-1.clq.
Below are two complementary graph benchmarks generated using the same method and the same values of parameters (α, p and r) as above. The clique benchmark has a hidden maximum clique of size = 100 and, accordingly, the MIS benchmark has a hidden maximum independent set of size = 100. Based on theoretical analysis and experimental results of smaller instances, I conjecture that in the next 20 years or more (from 2005), these two benchmarks can not be solved on a PC (or alike) in a reasonable time (e.g. 1 day). Even so, it would be very interesting to see how large the gap is between the optimum value and what are obtained by various solvers. It is also of interest to see how this gap is going to be bridged with the development of algorithms and computers. Starting from these ideas, I hope that anyone of interest can have a try on solving the two benchmarks and, if possible, please inform me of the result you obtain (including the size of the largest clique or the largest independent set you find, how much time is used, on what kind of computer and by what solver). If agreed, your result will be posted as a news here and a link could also be added to your website (if there is one).
The DIMACS binary format, an alternative way for storing a graph, is space efficient and is used in many solvers. The above two benchmarks in this format are respectively available as frb100-40.clq.b (979KB) and frb100-40.mis.b (979KB).
News
Jul. 4, 2007: Using a stochastic local search solver (called COVER) on a 2.13GHz/2GB computer, Silvia Richter at Griffth University found a vertex cover of 3903 vertices (or equivalently, an independent set of 97 vertices) for the above MIS benchmark of 4000 vertices in 71.09 seconds. More specifically, the author ran the solver 100 times with different seeds. Of those 100 runs, 25 found a solution of size 3903, the other runs all found a solution of size 3904. Of the 25 successful runs that found the better solution, the quickest one did so in 71.09 seconds while the median run-time was 1193.92 seconds.
Jan.19, 2006: In 2005, according to weblog statistics, this webpage was visited 5147 times (by people from 54 countries) and the benchmarks on this page were downloaded 1472 times (including 169 times for the above two benchmarks of 4000 vertices). Most visitors were from USA, China, Canada, India, Germany, UK, France, Italy, Australia and Israel (about half of them came through Google).
Jul. 28, 2005: Using a local search SAT solver (called wsatcc) on a Pentium IV 3.4GHz/512MB computer, Lengning Liu at the University of Kentucky found a vertex cover of 3904 vertices (or equivalently, an independent set of 96 vertices) for the above MIS benchmark of 4000 vertices in 3743.16 seconds.
List of Papers That Use Graph Benchmarks Based on Model RB (Updated: Mar. 4, 2009)
Other Benchmarks Based on
Model RB
Forced
Satisfiable CSP and SAT Benchmarks of Model RB
Pseudo-Boolean (0-1 Integer Programming) Benchmarks with Hidden Optimum Solutions
Benchmarks with Hidden Optimum Solutions for Set Covering, Set Packing and Winner Determination
Weighted Max-2-SAT Benchmarks with
Hidden Optimum Solutions (New!)
Date Created: April 20, 2004. Last Updated: Jul. 4, 2007.