Thursday, February 24, 2011

Improving Branching. Part II: Polarity

In a previous post I concluded that the polarity searched first affects the solving time of some instances (mostly of satisfiable instances, of course). With this in mind I ran four tests with the branching heuristic modified as follows:

  • initial - branch first on positive polarity
  • plus - branch first on polarity with the highest score
  • minus - branch first on polarity with the lowest score
  • random - branch first on a random polarity
The results are in the graph below. Needless to say, random rocks. plus and minus performed similarly. By choosing a random polarity STRUCTure can now solve three more instances from the AProVE class (I am developing negative feelings towards them).
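The four policies above can be sketched as follows. This is my own illustration, not STRUCTure's code; the `score` map and the policy names are assumptions taken from the bullet list.

```python
import random

def pick_polarity(var, score, policy, rng=random):
    """Return the literal (polarity of `var`) to branch on first.

    `score` maps each literal (+var or -var) to its branching score.
    The policy names mirror the four tests above.
    """
    pos, neg = var, -var
    if policy == "initial":   # always try the positive literal first
        return pos
    if policy == "plus":      # highest-scoring polarity first
        return pos if score[pos] >= score[neg] else neg
    if policy == "minus":     # lowest-scoring polarity first
        return pos if score[pos] < score[neg] else neg
    if policy == "random":    # pick a polarity uniformly at random
        return rng.choice([pos, neg])
    raise ValueError(policy)
```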

Improving Branching. Part I: Coefficients

The current algorithm for choosing the branching variable is as follows: compute scores for all literals, compute a score for each variable from the scores of its two polarities (i.e. the positive and the negative literal), and then choose a variable at random from those with the highest scores.

The score of a literal is computed by propagating the literal and examining the modified clauses (not including satisfied clauses). For every modified clause of length L and for each literal in that clause, the score of that literal is increased by α^L if the clause is a disjunction of literals, or by β^L if the clause is a XOR gate.

The score of a variable is computed using a formula from March_eq: S(b) = 1024 · S'(b) · S'(-b) + S'(b) + S'(-b), where S(b) is the score of variable b, and S'(b) and S'(-b) are the scores of the literals b and -b.
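The two formulas above can be written down directly. This is a sketch of my own, assuming the α^L/β^L reading of the per-clause weights; the function names are mine.

```python
def clause_weight(length, is_xor, alpha, beta):
    """Per-clause contribution to a literal's score: alpha**L for a
    plain disjunction, beta**L for a XOR gate (L = clause length)."""
    return (beta if is_xor else alpha) ** length

def variable_score(s_pos, s_neg):
    """March_eq-style combination of the two literal scores:
    S(b) = 1024 * S'(b) * S'(-b) + S'(b) + S'(-b).
    The product term favors variables that score well in *both*
    polarities; the sum acts as a tie-breaker."""
    return 1024 * s_pos * s_neg + s_pos + s_neg
```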

The question is: what values should α and β take? To find out, I chose a training set made of a few instances and ran the solver for a few hundred different values of α and β. I got quite a few timeouts, but eventually I had enough data points to compute an interesting heat map (see below: yellow is bad, blue is good). From this I learned a few things: values of α under 0.25 are awful, and β has little effect on the results (notice the vertical bars).

With the best values found for α and β I ran the solver on a more extensive set of instances to see which instances were negatively affected, and extended my initial set accordingly. This time I restricted the possible values of α to the interval [0.5, 1), though in retrospect I think I shouldn't have.

Notice the new red vertical band around 0.8, which was dark blue in the previous heat map. Again, β had little effect. So far I have eliminated two ranges for α: [0.00, 0.25) and [0.75, 1.00).

With the new best values I ran the solver on my usual set of instances (a subset of the easy and medium instances from the SAT09 Competition). The good news is that STRUCTure was able to solve a few more instances, with some speedup on others. Many instances were not affected. The bad news is that one instance that was included in my training set timed out. The difference between training and testing was the number of CPUs used (16 versus 8), which suggests that the solution was on a branch searched by one of the extra CPUs. Two things to remember: 1) run the training set on fewer CPUs; 2) the first polarity searched is relevant.

In conclusion, I think heat maps are a good way to rule out bad ranges for parameters, but it is hard to pinpoint exceptional values.

Update: Here is another heat map, which I used for SAT Competition 2011. This time I included the timeouts, and the heat map seems to provide a clearer picture of the interaction of the two coefficients tested. The x-axis is α, described above. On the y-axis are values of γ, which defines, for an implication u → v, how much v's score influences u's score. There are four distinguishable regions on the heat map, from which good values for α and γ can be read off: 0 ≤ α < 0.4 and 0 ≤ γ < 0.3.

Tuesday, February 22, 2011

Blocked Clause Elimination

I just finished implementing a technique called Blocked Clause Elimination (BCE). The idea is that, for a literal l and a clause C containing l, if for every clause C' containing -l the resolvent R = C ⊗l C' is a tautology (where ⊗l is the resolution operator on literal l), then the clause C can be eliminated. The removed clause C is said to be blocked on the literal l. You can check Mate Soos' post on the same topic.
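The blockedness check can be sketched as below. This is an illustration of the definition, not STRUCTure's implementation; clauses are lists of nonzero DIMACS-style integer literals.

```python
def is_tautology(clause):
    """A clause is a tautology if it contains a literal and its negation."""
    s = set(clause)
    return any(-lit in s for lit in s)

def is_blocked(clause, lit, formula):
    """True if `clause` is blocked on `lit`: every resolvent of
    `clause` with a clause of `formula` containing -lit is a tautology."""
    assert lit in clause
    for other in formula:
        if -lit in other:
            resolvent = [l for l in clause if l != lit] + \
                        [l for l in other if l != -lit]
            if not is_tautology(resolvent):
                return False
    return True
```

For example, [1, 2] is blocked on 1 in {[1, 2], [-1, -2]}: the only resolvent is [2, -2], a tautology.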

My implementation is pretty fast since it doesn't run BCE to completion; instead it iterates over the literals and tries to find clauses blocked on each literal. It also doesn't handle dependent variables (i.e. variables that appear in exactly one XOR clause) or pure literals (literals that appear in only one phase). The graph below compares the performance of BCE run at different points:

  • begin - before the first reasoning pass. Should speed up the first node evaluation.
  • end - after the first reasoning pass (just before branching). Should find more blocked clauses.
  • begin_and_end - the previous two combined
  • always - at every node of the search tree. Should discover new blocked clauses created by propagating the branch.
  • never - disabled

I expected begin_and_end to be at least as good as end, but that was not the case. It seems that doing BCE before the first node evaluation slows down the search on instances from the AProVE class.

BCE is stronger than pure-literal and dependent-variable elimination, but the latter two eliminate most of the benefits of BCE. BCE after simplification is still able to find many clauses not already eliminated by the other techniques, which is why end shows the best performance in the graph above.

Wednesday, February 16, 2011

XOR gates

I finally got around to extracting XOR gates from CNF formulas. The first thing I wanted to check was how many of the literals come from XOR clauses on a few instances from the SAT09 competition. The table below suggests that handling XOR gates is promising.

(Table: per-instance counts of literals in XOR gates versus literals in the whole formula.)

Update: I finished implementing XOR gates and fixed most of the bugs. The change required changing the way I store clauses. Until now clauses were stored as: a b c ... 0. Now they are stored as (length, type) a b c ..., where length is the length of the clause, type is OR/XOR/NXOR, and (length, type) means that length and type are encoded together in a single integer. The most basic solver is working now, but except for equivalent-literal renaming the other simplification algorithms are not functional yet. The graph below shows the impact of handling XOR gates.
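Packing length and type into one integer can be done with a couple of bit operations. The layout below (type in the two low bits) is a hypothetical choice of mine; STRUCTure's actual encoding may differ.

```python
# Clause types; two bits are enough for OR/XOR/NXOR.
OR, XOR, NXOR = 0, 1, 2
TYPE_BITS = 2

def encode_header(length, ctype):
    """Pack (length, type) into a single integer header."""
    return (length << TYPE_BITS) | ctype

def decode_header(header):
    """Recover (length, type) from a packed header."""
    return header >> TYPE_BITS, header & ((1 << TYPE_BITS) - 1)
```

One nice side effect of storing the length up front is that iteration over the clause database no longer needs to scan for the 0 terminator.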

The benefits of handling XOR gates are multiple. First, formulas are much shorter, because a XOR gate requires an exponential number of CNF clauses. This means it is less costly to move instances around. Second, the branching heuristic is less fooled by almost identical clauses. Third, most other algorithms can safely ignore XOR clauses, for a bonus speedup. In conclusion, I'm glad I spent the last few days refactoring STRUCTure.

Update: When I enabled all reasoning algorithms (e.g. hyper binary resolution, subsumption, etc.) the performance difference changed dramatically, which leads me to believe that the initial evaluations contained a bug. The graph below shows how STRUCTure performs with and without XOR gates. It seems that handling XOR gates helps on a few instances, but the instances from the AProVE class are better off without them.

Friday, February 11, 2011

Transitive closure

Given two implications a → b and b → c, by transitivity we can deduce another implication, a → c. It is generally good to know many implications because they are used in binary and hyper binary resolution. If the implications are stored as a directed graph, we can apply transitive closure to the graph to deduce many more of them. However, transitive closure is very expensive (O(|V|·(|V|+|E|)) time), memory hungry, and does not always give the expected benefits, because the implication graph might already be very dense.

My idea is to do the transitive closure when the graph is sparse and many new binaries can be found. There are two ways to compute the density of a directed graph:

  • log(|E|)/log(|V|) which returns a value in interval (-∞, 2)
  • 2*|E|/(|V|⋅(|V|-1)) which returns a value in interval [0, 2]

I chose the former, though I plan to evaluate the latter in the near future. I tested a few values and found that 1.10 is the optimal threshold: if the graph is denser than 1.10 the transitive closure is too expensive, so it is omitted. The next figure shows the results for three thresholds: 1.05 (blue), 1.10 (green) and 1.20 (red). Transitive closure has some benefits, but too much of it can slow down the search.
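The density-gated closure can be sketched as below, using the first density measure and a BFS from every vertex (which gives the O(|V|·(|V|+|E|)) bound mentioned above). This is my own illustration; function names and the graph representation are assumptions.

```python
import math
from collections import deque

def density(num_vertices, num_edges):
    """log(|E|) / log(|V|), the first density measure above."""
    return math.log(num_edges) / math.log(num_vertices)

def transitive_closure(graph):
    """graph: dict vertex -> set of direct successors. Returns a dict
    vertex -> set of all vertices reachable from it (BFS per vertex)."""
    closure = {}
    for start in graph:
        seen, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            for v in graph.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        closure[start] = seen
    return closure

def maybe_close(graph, threshold=1.10):
    """Apply transitive closure only while the graph is sparse."""
    n = len(graph)
    m = sum(len(s) for s in graph.values())
    if n > 1 and m > 0 and density(n, m) > threshold:
        return graph  # too dense: closure is too expensive, skip it
    return transitive_closure(graph)
```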

Wednesday, February 9, 2011


For my master's project I'm writing STRUCTure, a distributed parallel SAT solver. The project can be downloaded from github.

STRUCTure is built on top of Constellation, a framework for distributed applications. In Constellation you have activities that perform your tasks. Each activity can spawn and communicate with other activities, provided their identifiers are known. There is no distributed shared memory, and an activity can run on any participating machine. With this in mind, let us look at the design of STRUCTure.

STRUCTure is a DPLL-type algorithm. It is not conflict-driven and doesn't learn new clauses. Let's say we have a SAT formula F (usually in CNF). On F we can apply different reasonings, such as:

  • equivalent literals.
  • contradictory literals.
  • transitivity.
  • pure literals.
  • units propagation.
  • hyper binary resolution.
  • subsumption.
  • self-subsumption (currently one of the two clauses must be binary).
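As a tiny illustration of one of these rules, a pure-literal pass can be written in a few lines. This is my sketch, not STRUCTure's code; clauses are lists of nonzero integer literals.

```python
def pure_literals(formula):
    """Return the literals that occur in only one phase. Setting a pure
    literal to true can never falsify a clause, so all clauses
    containing it can be removed from the formula."""
    lits = {l for clause in formula for l in clause}
    return {l for l in lits if -l not in lits}
```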

After the formula has been simplified there are three possible outcomes.
  • All literals were assigned and thus F is satisfiable.
  • A contradiction (e.g. F contains an empty clause) was found and thus F is unsatisfiable.
  • No solution was found and thus F is unknown.

If the formula F is unknown we can pick a literal a and branch on it, i.e. we try to solve the child formulas F- = {-a} ∧ F and F+ = {a} ∧ F.
  • If F- and F+ are both unsatisfiable, then F is unsatisfiable.
  • Else, if F- or F+ is satisfiable, then F is satisfiable, with the solution given by the satisfied child.
  • Else F is unknown (sometimes, e.g. when the maximum depth is reached, we can just return unknown without branching).
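The branching scheme can be sketched as below. This is a minimal illustration of the three-valued recursion, with a naive branching choice instead of STRUCTure's scored heuristic and with only unit-style clause reduction.

```python
SAT, UNSAT, UNKNOWN = "sat", "unsat", "unknown"

def assign(formula, lit):
    """Set `lit` true: drop satisfied clauses, strip -lit elsewhere.
    Returns None if an empty clause (contradiction) appears."""
    out = []
    for clause in formula:
        if lit in clause:
            continue
        reduced = [l for l in clause if l != -lit]
        if not reduced:
            return None
        out.append(reduced)
    return out

def solve(formula, depth=0, max_depth=50):
    if formula == []:
        return SAT            # every clause satisfied
    if depth >= max_depth:
        return UNKNOWN        # give up without branching
    a = formula[0][0]         # naive choice of branching literal
    results = set()
    for branch in (a, -a):    # the children F+ and F-
        child = assign(formula, branch)
        if child is None:
            results.add(UNSAT)
        else:
            results.add(solve(child, depth + 1, max_depth))
    if SAT in results:
        return SAT            # a satisfied child satisfies F
    if results == {UNSAT}:
        return UNSAT          # both children unsatisfiable
    return UNKNOWN
```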
The performance of STRUCTure can be seen in the next image. The program was run on DAS-4 against SAT09 competition medium-level instances with a time limit of 600s on 8 cores.