tag:blogger.com,1999:blog-82881949868202492162014-09-18T12:06:52.417-07:00Abstract HeresiesUnorthodox opinions on computer science and programming.Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.comBlogger399125tag:blogger.com,1999:blog-8288194986820249216.post-64223281510068971952014-09-18T12:06:00.000-07:002014-09-18T12:06:52.431-07:00A useful, if somewhat pointless, trick with homographic functionsIn my previous posts I showed that if you are applying a homographic function to a continued fraction, you can partly evaluate the answer before you completely apply the function. Instead of representing homographic functions as lambda expressions, I'll represent them as a list of the coefficients <code>a</code>, <code>b</code>, <code>c</code>, and <code>d</code> in <code>(lambda (t) (/ (+ (* a t) b) (+ (* c t) d)))</code>. I'll represent a simple continued fraction as a stream of the integer terms in the denominators.<br />
Here is how you partly apply a homographic function to a continued fraction:<br />
<pre>(define (partly-apply hf cf)
(let ((a (first hf))
(b (second hf))
(c (third hf))
(d (fourth hf)))
(if (empty-stream? cf)
(values (list a a
c c)
cf)
(let ((term (head cf)))
(values (list (+ (* a term) b) a
(+ (* c term) d) c)
(tail cf))))))</pre>Partly evaluating a homographic function involves looking at the limits of the function as <code>t</code> starts at 1 and goes to infinity:<br />
<pre>(define (partly-evaluate hf)
(let ((a (first hf))
(b (second hf))
(c (third hf))
(d (fourth hf)))
(if (and (same-sign? c (+ c d))
(let ((limit1 (quotient a c))
(limit2 (quotient (+ a b) (+ c d))))
(= limit1 limit2)))
(let ((term (quotient a c)))
(let ((new-c (- a (* c term)))
(new-d (- b (* d term))))
(values term (list c d new-c new-d))))
(values #f #f))))
</pre>We can combine these two steps and make something useful. For example, we can print the value of applying a homographic function to a continued fraction incrementally, printing the most significant digits before computing further digits.<br />
<pre>(define (print-hf-cf hf cf)
(call-with-values (lambda () (partly-evaluate hf))
(lambda (term hf*)
(if (not term)
(call-with-values (lambda () (partly-apply hf cf))
print-hf-cf)
(begin
(display term)
;; take reciprocal and multiply by 10
(let ((a (first hf*))
(b (second hf*))
(c (third hf*))
(d (fourth hf*)))
(print-hf-cf (list (* c 10) (* d 10)
a b)
cf)))))))</pre>But how often are you going to apply a homographic function to a continued fraction? Fortunately, the identity function is homographic (coefficients are 1 0 0 1), so applying it to a continued fraction doesn't change the value. The square root of 2 is a simple continued fraction with coefficients [1 2 2 2 ...] where the 2s repeat forever. We apply the identity homographic function to it and print the results:<br />
<pre>(printcf (list 1 0 0 1) sqrt-two)
14142135623730950488016887242096980785696718^G
; Quit!</pre>As you can see, we start printing the square root of two and we don't stop printing digits until the user interrupts.<br />
<br />
A fancier version could truncate the output and print a decimal point after the first iteration.Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-86027107704602338222014-09-05T07:53:00.000-07:002014-09-05T07:53:49.776-07:00Another stupid homographic function trickIn my last post I showed that if you take a homographic function and apply it to a fraction, you can <em>partly</em> apply the function to the integer part of the fraction and get a new homographic function. The new function can be applied to the non-integer part of the fraction to generate an answer equivalent to the original function applied to the original fraction.<br />
<br />
It turns out that you can go in the other direction as well. You can <em>partly</em> evaluate a homographic function. For example, consider this homographic function:<br />
<pre>((lambda (t)
(/ (+ (* 70 t) 29)
(+ (* 12 t) 5))) n)</pre>Which we intend to apply to some positive number <code>n</code>. Even if all we know is that <code>n</code> is positive, we can deduce that the value of the homographic function is between 29/5 (when <code>t</code> is 0) and 70/12 (as <code>t</code> goes to infinity). The integer part of those values are the same, so we can factor that out:<br />
<pre>(+ 5 (/ 1 ((lambda (t)
(/ (+ (* 12 t) 5)
(+ (* 10 t) 4))) n)))</pre>The partial answer has an integer value of 5 and a fractional part that contains a new homographic function applied to our original <code>n</code>. We can do it again:<br />
<pre>(+ 5 (/ 1
(+ 1 (/ 1
((lambda (t)
(/ (+ (* 10 t) 4)
(+ (* 2 t) 1))) n)))))</pre>The fractional part of the answer can itself be factored into another integer and a new homographic function applied to our original <code>n</code>.<br />
<br />
<a name='more'></a>A <em>generalized continued fraction</em> is a number of the form:<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-u2oG7oTJhts/VAYioSZ8hII/AAAAAAAAIWM/EiZwhAetX90/s1600/continuedf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-u2oG7oTJhts/VAYioSZ8hII/AAAAAAAAIWM/EiZwhAetX90/s1600/continuedf.png" /></a></div>If all the <em>b<sub>i</sub></em> are 1, then it is a <em>simple</em> continued fraction. You can turn generalized continued fractions into a simple continued fraction by doing the algebra.<br />
<br />
What happens if you partly apply a homographic function to a continued fraction? The algebra is tedious, but here's what happens:<br />
<pre>((lambda (t)
(/ (+ (* 2 t) 1)
(+ (* 1 t) 3))) (+ 3 (/ 1 (+ 7 (/ 1 16)))))
;; after one step
((lambda (t)
(/ (+ (* 7 t) 2)
(+ (* 6 t) 1))) (+ 7 (/ 1 16)))
;; after two steps
((lambda (t)
(/ (+ (* 51 t) 7)
(+ (* 43 t) 6))) 16)
</pre>By partly apply a homographic function to a continued fraction, we can process the integer part separately and before the fractional part. By partly evaluating the application of a homographic function, we can often determine the integer part without fully evaluating the argument to the function. For example, after step one above, we could instead partially evaluate the application:<br />
<pre>;; after one step
((lambda (t)
(/ (+ (* 7 t) 2)
(+ (* 6 t) 1))) (+ 7 (/ 1 16)))
;; Alternatively, partially evaluate first term
(+ 1 (/ 1
((lambda (t)
(/ (+ (* 6 t) 1)
(+ (* 1 t) 1))) (+ 7 (/ 1 16)))))
</pre>Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-56868209233937256862014-09-03T10:32:00.000-07:002014-09-03T10:32:03.820-07:00Stupid homographic function trickA function of the form<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-Rcot2E2rBz4/VAYCfooXuNI/AAAAAAAAIV8/yHOz_V6D6mU/s1600/homographic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-Rcot2E2rBz4/VAYCfooXuNI/AAAAAAAAIV8/yHOz_V6D6mU/s1600/homographic.png" /></a></div><div>is called a <i>homographic function</i>. Here is one in Scheme:<br />
<pre>(lambda (t)
(/ (+ (* 3 t) 4)
(+ (* 5 t) 2)))</pre>And here is what it's graph looks like:<div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-T310zUsYfFc/VAcudW5XENI/AAAAAAAAIYY/Mg7033nexVE/s1600/homograph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-T310zUsYfFc/VAcudW5XENI/AAAAAAAAIYY/Mg7033nexVE/s1600/homograph.png" /></a></div>If you multiply all the coefficients (<code>a</code>, <code>b</code>, <code>c</code>, and <code>d</code>) by the same number, it doesn't change the function. For instance, this homographic function:<br />
<pre>(lambda (t)
(/ (+ (* 21 t) 28)
(+ (* 35 t) 14)))</pre>is the same as the one above. If one of your coefficients isn't an integer, don't worry, you can multiply everything by the denominator and get an equivalent homographic function. On the other hand, you can divide all your coefficients by their greatest common divisor and get an equivalent homographic function with smaller coefficients. We'll keep our homographic functions in smallest integer form.<br />
<br />
A rational number can be written as the sum of an integer and a fraction less than one. For example, 23/5 = 4 + 3/5.<br />
<br />
Let's apply a homographic function to a rational number:<br />
<pre>((lambda (t)
(/ (+ (* a t) b)
(+ (* c t) d))) (+ x y/z))
;; substitute
(/ (+ (* a (+ x y/z)) b)
(+ (* c (+ x y/z)) d))
;; distribute the multiplication
(/ (+ (* a x) (* a y/z) b)
(+ (* c x) (* c y/z) d))
;; multiply top and bottom by z/y
(/ (* z/y (+ (* a x) (* a y/z) b))
(* z/y (+ (* c x) (* c y/z) d)))
;; distribute the multiplication
(/ (+ (* a x z/y) (* a y/z z/y) (* b z/y))
(+ (* c x z/y) (* c y/z z/y) (* d z/y)))
;; simplify
(/ (+ (* a x z/y) a (* b z/y))
(+ (* c x z/y) c (* d z/y)))
;; rearrange terms
(/ (+ (* a x z/y) (* b z/y) a)
(+ (* c x z/y) (* d z/y) c))
;; factor out z/y
(/ (+ (* (+ (* a x) b) z/y) a)
(+ (* (+ (* c x) d) z/y) c))
</pre>Now we do something tricky. We abstract out the <code>z/y</code> term:<br />
<pre>((lambda (t)
(/ (+ (* (+ (* a x) b) t) a)
(+ (* (+ (* c x) d) t) c))) (/ z y))
</pre>If introduce a <code>let</code> form, we can see something interesting:<br />
<pre>((lambda (t)
(let ((a1 (+ (* a x) b))
(b1 a)
(c1 (+ (* c x) d))
(d1 c))
(/ (+ (* a1 t) b1)
(+ (* c1 t) d1)))) (/ z y))
</pre>We find a new homographic function being applied to a new rational number. The new homographic function has coefficients related to the original one, and the new rational number is the reciprocal of the fractional part of the original rational number. So if we have a homographic function <code>hf</code> applied to a fraction of the form <code>x + y/z</code>, we can easily find a new homographic function <code>hf'</code> that when applied to <code>z/y</code> will produce the same answer as the original. Applying a homographic function to a fraction has the effect of "eating" the integer part of the fraction, which generates a new homographic function that is applied to the reciprocal of the fractional part.</div><br />
Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-39786235858640144032014-08-27T10:30:00.000-07:002014-08-28T17:06:37.232-07:00A use of Newton's methodI've seen more than one book claim that computing with real numbers inevitably involves round-off errors because real numbers can have an infinite number of digits after the decimal point and no finite representation can hold them. This is false. Instead of representing a real number as a nearby rational number with an error introduced by rounding, we'll represent a real number as computer program that generates the digits. The number of digits generated is potentially infinite, but the program that generates them is definitely finite.<br />
<br />
Here is Gosper's algorithm for computing the square root of a rational number.<br />
<pre>(define (gosper-sqrt a b c d)
;; Solve for
;; ax + b
;; ------ = x
;; cx + d
(define (newtons-method f f-prime guess)
(let ((dy (f guess)))
(if (< (abs dy) 1)
guess
(let ((dy/dx (f-prime guess)))
(newtons-method f f-prime (- guess (/ dy dy/dx)))))))
(define (f x)
(+ (* c x x)
(* (- d a) x)
(- b)))
(define (f-prime x)
(+ (* 2 c x)
(- d a)))
(let ((value (floor (newtons-method f f-prime b))))
(cons-stream value
(gosper-sqrt (+ (* c value) d)
c
(+ (* (- a (* value c)) value)
(- b (* value d)))
(- a (* value c))))))
1 ]=> (cf:render (gosper-sqrt 0 17 10 0))
1.303840481040529742916594311485836883305618755782013091790079369...
;; base 10, 100 digits
1 ]=> (cf:render (gosper-sqrt 0 17 10 0) 10 100)
1.303840481040529742916594311485836883305618755782013091790079369896765385576397896545183528886788497...
</pre>
Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-16895801407690061192014-08-26T13:33:00.000-07:002014-08-26T13:33:16.234-07:00Solutions in search of problemsSuppose you have a function like <code>(define foo (lambda (x) (- (* x x x) 30)))</code> and you want to find <code>x</code> such that <code>(foo x)</code> = <code>0</code>. There are a few ways to go about this. If you can find two different <code>x</code> such that <code>(foo x)</code> is positive for one and negative for the other, then <code>(foo x)</code> must be zero somewhere in between. A simple binary search will find it.<br />
<pre>(define (bisection-method f left right)
(let* ((midpoint (average left right))
(fmid (f midpoint)))
(if (< (abs fmid) 1e-8)
midpoint
(let ((fl (f left))
(fr (f right)))
(cond ((same-sign? fl fr) (error "Left and right not on opposite sides."))
((same-sign? fmid fr) (bisection-method f left midpoint))
((same-sign? fl fmid) (bisection-method f midpoint right))
(else (error "shouldn't happen")))))))
(define (average l r) (/ (+ l r) 2))
(define (same-sign? l r)
(or (and (positive? l)
(positive? r))
(and (negative? l)
(negative? r))))
1 ]=> (cos 2)
;Value: -.4161468365471424
1 ]=> (cos 1)
;Value: .5403023058681398
1 ]=> (bisection-method cos 1.0 2.0)
1. 2.
1.5 2.
1.5 1.75
1.5 1.625
1.5625 1.625
1.5625 1.59375
1.5625 1.578125
1.5703125 1.578125
1.5703125 1.57421875
1.5703125 1.572265625
1.5703125 1.5712890625
1.5703125 1.57080078125
1.570556640625 1.57080078125
1.5706787109375 1.57080078125
1.57073974609375 1.57080078125
1.570770263671875 1.57080078125
1.5707855224609375 1.57080078125
1.5707931518554687 1.57080078125
1.5707931518554687 1.5707969665527344
1.5707950592041016 1.5707969665527344
1.570796012878418 1.5707969665527344
1.570796012878418 1.5707964897155762
1.570796251296997 1.5707964897155762
1.570796251296997 1.5707963705062866
1.5707963109016418 1.5707963705062866
1.5707963109016418 1.5707963407039642
;Value: 1.570796325802803</pre>Rather than selecting the midpoint between the two prior guesses, you can pretend that your function is linear between the guesses and interpolate where the zero should be. This can converge quicker.<br />
<pre>(define (secant-method f x1 x2)
(display x1) (display " ") (display x2) (newline)
(let ((f1 (f x1))
(f2 (f x2)))
(if (< (abs f1) 1e-8)
x1
(let ((x0 (/ (- (* x2 f1) (* x1 f2))
(- f1 f2))))
(secant-method f x0 x1)))))
1 ]=> (secant-method cos 0.0 4.0)
0. 4.
2.418900874126076 0.
1.38220688493168 2.418900874126076
1.5895160570280047 1.38220688493168
1.5706960159120333 1.5895160570280047
1.5707963326223677 1.5706960159120333
;Value: 1.5707963326223677
</pre>If you know the derivative of <code>f</code>, then you can use Newton's method to find the solution.<br />
<pre>(define (newtons-method f f-prime guess)
(display guess) (display " ") (newline)
(let ((dy (f guess)))
(if (< (abs dy) 1e-8)
guess
(let ((dy/dx (f-prime guess)))
(newtons-method f f-prime (- guess (/ dy dy/dx)))))))
1 ]=> (newtons-method cos (lambda (x) (- (sin x))) 2.0)
2.
1.5423424456397141
1.5708040082580965
1.5707963267948966
;Value: 1.5707963267948966</pre>Here's another example. We'll find the cube root of 30 by solving <code>(lambda (x) (- (* x x x) 30))</code>.<br />
<pre>(define (cube-minus-thirty x) (- (* x x x) 30))
1 ]=> (bisection-method cube-minus-thirty 0.0 4.0)
0. 4.
2. 4.
3. 4.
3. 3.5
3. 3.25
3. 3.125
3.0625 3.125
3.09375 3.125
3.09375 3.109375
3.1015625 3.109375
3.10546875 3.109375
3.10546875 3.107421875
3.1064453125 3.107421875
3.10693359375 3.107421875
3.107177734375 3.107421875
3.107177734375 3.1072998046875
3.107177734375 3.10723876953125
3.107208251953125 3.10723876953125
3.1072235107421875 3.10723876953125
3.1072311401367187 3.10723876953125
3.1072311401367187 3.1072349548339844
3.1072311401367187 3.1072330474853516
3.107232093811035 3.1072330474853516
3.107232093811035 3.1072325706481934
3.1072323322296143 3.1072325706481934
3.107232451438904 3.1072325706481934
3.107232451438904 3.1072325110435486
3.107232481241226 3.1072325110435486
3.1072324961423874 3.1072325110435486
3.107232503592968 3.1072325110435486
3.107232503592968 3.1072325073182583
3.107232505455613 3.1072325073182583
3.107232505455613 3.1072325063869357
;Value: 3.1072325059212744
1 ]=> (secant-method cube-minus-thirty 0.0 4.0)
0. 4.
1.875 0.
8.533333333333333 1.875
2.1285182547245376 8.533333333333333
2.341649751209406 2.1285182547245376
3.4857887202177547 2.341649751209406
3.0068542655016235 3.4857887202177547
3.0957153766467633 3.0068542655016235
3.1076136741672546 3.0957153766467633
3.1072310897513415 3.1076136741672546
3.1072325057801455 3.1072310897513415
;Value: 3.1072325057801455
1 ]=> (define (cube-minus-thirty-prime x) (* 3 x x))
1 ]=> (newtons-method cube-minus-thirty cube-minus-thirty-prime 4.0)
4.
3.2916666666666665
3.1173734622300557
3.10726545916981
3.1072325063033337
3.107232505953859
;Value: 3.107232505953859</pre><br />
<br />
<br />
<br />
<br />
<br />
Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-90508265448404191292014-08-22T15:57:00.000-07:002014-08-22T15:57:14.779-07:00Small puzzle solutionBefore I give my solution, I'd like to describe the leftmost digit algorithm in a bit more detail.<br />
<pre>(define (leftmost-digit base n)
(if (< n base)
n
(let ((leftmost-pair (leftmost-digit (* base base) n)))
(if (< leftmost-pair base)
leftmost-pair
(quotient leftmost-pair base)))))</pre>The idea is this: if we have a one digit number, we just return it, otherwise we recursively call <code>leftmost-digit</code> with the <em>square</em> of the base. Squaring the base will mean gobbling up pairs of digits in the recursive call, so we'll get back either a one or two digit answer from the recursion. If it is one digit, we return it, otherwise it's two digits and we divide by the base to get the left one.<br />
<br />
For example, if our number is 12345678 and the base is 10, we'll make a recursive call with base 100. The recursive call will deal with number as if it were written <code>12 34 56 78</code> in base 100 and return the answer 12. Then we'll divide that by 10 to get the 1.<br />
<br />
Since we're squaring the base, we're doubling the number of digits we're dealing with on each recursive call. This leads to the solution in O(log log n) time. If we instrument <code>quotient</code>, you can see:<br />
<pre>(leftmost-digit 10 816305093398751331727331379663195459013258742431006753294691576)
816305093398751331727331379663195459013258742431006753294691576 / 100000000000000000000000000000000
8163050933987513317273313796631 / 10000000000000000
816305093398751 / 100000000
8163050 / 10000
816 / 100</pre>A sixty-three digit number trimmed down to one digit with only five divisions.<br />
<br />
So a simple solution to the puzzle is:<br />
<pre>(define (leftmost-digit+ base n)
(if (< n base)
(values n 0)
(call-with-values (lambda () (leftmost-digit+ (* base base) n))
(lambda (leftmost-pair count)
(if (< leftmost-pair base)
(values leftmost-pair (* count 2))
(values (quotient leftmost-pair base) (+ (* count 2) 1)))))))
</pre>The second value is the count of how many digits we discard. If the number is less than the base, we return it and we discarded nothing. Otherwise, we make the recursive call with the base squared and get back two values, the leftmost pair and the number of pairs it discarded. If the leftmost pair is a single digit, we return it, otherwise we divide by the base. The number of digits discarded is simply twice the number discarded by the recursive call, plus one more if we had to divide.<br />
<br />
But I don't see an easy way to separate finding the digit from finding the position. At first it seemed straightforward to just count the digits being discarded, but you can't decide whether to increment the count at each stage without determining if the leftmost part of the recursive call contains one or two digits.Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-82430748540620772202014-08-21T16:43:00.000-07:002014-08-21T16:43:14.348-07:00Just a small puzzleYou can get the most significant digit (the leftmost) of a number pretty quickly this way<br />
<pre>(define (leftmost-digit base n)
(if (< n base)
n
(let ((leftmost-pair (leftmost-digit (* base base) n)))
(if (< leftmost-pair base)
leftmost-pair
(quotient leftmost-pair base)))))</pre>The puzzle is to adapt this code to return the position of the leftmost digit.<br />
<br />
<pre>(leftmost-digit+ 10 46729885) would return two values, 4 and 7</pre>Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-82910414973289460562014-08-08T14:56:00.001-07:002014-08-08T14:56:44.932-07:00Mini regex golf 3: set coverI'm computing the set cover by incrementally adding items to be covered. Naturally, the order in which you add items changes the way the program progresses. I added code that picks an item to be added each iteration rather than just pulling the <code>car</code> off the front of a list.<br />
<pre>(define (cover8 value->keys-table better-solution)
(define (add-v-k-entry solution-set v-k-entry)
(let ((value (car v-k-entry))
(keys (cdr v-k-entry)))
(write-string "Adding value ") (write value) (newline)
(write-string " with keys ") (write keys) (newline)
(write-string " to ") (write (length solution-set))
(write-string " partial solutions.") (newline)
(let ((new-solutions
(map make-new-solution (cartesian-product solution-set keys))))
(let ((trimmed-solutions
(trim-partial-solutions value->keys-table new-solutions)))
(write-string "Returning ") (write (length trimmed-solutions))
(write-string " of ") (write (length new-solutions))
(write-string " new partial solutions.") (newline)
trimmed-solutions))))
(define (cover v-k-entries)
(cond ((pair? v-k-entries)
(pick-v-k-entry value->keys-table v-k-entries
(lambda (selected remaining)
(add-v-k-entry (cover remaining) selected))))
((null? v-k-entries)
(list '()))
(else (improper-list-error 'cover v-k-entries))))
(let ((minimized (minimize-vktable value->keys-table better-solution)))
(least-elements (cover minimized) better-solution)))
(define (pick-v-k-entry value->keys-table v-k-entries receiver)
(define (score v-k-entry)
(let* ((matched-all
(count-matching-items value->keys-table
(lambda (other)
(there-exists? (cdr v-k-entry)
(lambda (key) (member key (cdr other)))))))
(matched-remaining
(count-matching-items v-k-entries
(lambda (other)
(there-exists? (cdr v-k-entry)
(lambda (key) (member key (cdr other)))))))
(matched-forward (- matched-all matched-remaining)))
(cons matched-remaining matched-forward)))
(let ((scored (map (lambda (v-k-entry) (cons (score v-k-entry) v-k-entry))
v-k-entries)))
(let ((picked
(cdar
(least-elements scored
(lambda (left right)
(let* ((len-l (length (cdr left)))
(len-r (length (cdr right)))
(lmr (caar left))
(lmf (cdar left))
(rmr (caar right))
(rmf (cdar right)))
(or (> len-l len-r)
(and (= len-l len-r)
(or (> lmf rmf)
(and (= lmf rmf)
(< lmr rmr)))))
))))))
(display "Picking ") (write picked) (newline)
(receiver picked (delete picked v-k-entries)))))
(define (trim-partial-solutions value->keys-table partial-solutions)
(let ((equivalent-solutions
(map (lambda (entry) (cons (cdr entry) (car entry)))
(collect-equivalent-partial-solutions value->keys-table
partial-solutions))))
(write-string " Deleting ")
(write (- (length partial-solutions) (length equivalent-solutions)))
(write-string " equivalent partial solutions.")
(newline)
(remove-dominated-solutions value->keys-table
(map lowest-scoring-equivalent-partial-solution
equivalent-solutions))))
</pre>Finally, it turns out that computing dominating partial solutions is expensive, so I changed the set operations to use a bitmap representation:<pre>(define (remove-dominated-solutions value->keys-table partial-solutions)
(let ((before-length (length partial-solutions))
(all-values (get-values value->keys-table)))
(let ((table
;; put the long ones in first
(sort
(map (lambda (partial-solution)
(cons partial-solution
(lset->bset all-values
(map car (partial-solution-matches value->keys-table
partial-solution)))))
partial-solutions)
(lambda (left right)
(> (length (bset->lset all-values (cdr left)))
(length (bset->lset all-values (cdr right))))))))
(let ((answer (map car
(fold-left (lambda (answer solution)
(if (there-exists? answer
(dominates-solution? solution))
answer
(cons solution answer)))
'()
table))))
(let ((after-length (length answer)))
(write-string " Removing ") (write (- before-length after-length))
(write-string " dominated solutions.")
(newline)
answer)))))
(define (dominates-solution? solution)
(let* ((partial-solution (car solution))
(partial-solution-score (score partial-solution))
(solution-matches-raw (cdr solution)))
(lambda (other-solution)
(let* ((other-partial-solution (car other-solution))
(other-matches-raw (cdr other-solution)))
(and
(bset-superset? other-matches-raw solution-matches-raw)
(<= (score other-partial-solution) partial-solution-score))))))
(define (get-values v-k-table)
(fold-left (lambda (answer entry) (lset-adjoin equal? answer (car entry)))
'()
v-k-table))
(define (bset-element->bit universe element)
(cond ((null? element) 0)
(else (expt 2 (list-index (lambda (item) (eq? item element)) universe)))))
(define (bset-adjoin universe bset element)
(bset-union bset (bset-element->bit universe element)))
(define (lset->bset universe lset)
(fold-left (lambda (answer element)
(bset-adjoin universe answer element))
0
lset))
(define (bset->lset universe bset)
(cond ((zero? bset) '())
((even? bset) (bset->lset (cdr universe) (/ bset 2)))
(else (cons (car universe) (bset->lset (cdr universe) (/ (- bset 1) 2))))))
(define (bset-union left right) (bitwise-ior left right))
(define (bset-superset? bigger smaller)
;; Is every element of smaller in bigger?
(zero? (bitwise-andc2 smaller bigger)))
</pre>This code can now find the shortest regular expression consisting of letters and dots (and ^$) that matches one set of strings but not another.<br />
<br />
Depending on the strings, this can take quite a bit of time to run. Dotted expressions cause a combinatorical explosion in matching regexps (or substrings), but what makes it worse is that the dotted expressions tend to span different sets of strings. If two different dotted expressions, each with different matching sets of strings, appear in a single string, then the number of partial solutions will be multiplied by two as we try each different dotted expression.<br />
<br />
It is characteristic of NP problems that it is easy to determine if you have a good solution, but quite hard to find it among a huge number of other, poor solutions. This problem exhibits this characteristic, but there is a bit more structure in the problem that we are exploiting. The word lists are drawn from the English language. This makes some bigrams, trigrams, etc. far, far, more likely to appear than others.<br />
<br />
Short words are much easier to process than longer ones because they simply contain fewer things to match. On the other hand, longer words tend to be dominated by shorter ones anyway.<br />
<br />
To be continued...Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-5373700746004630732014-08-07T09:43:00.001-07:002014-08-07T09:43:43.533-07:00Mini regex golf 2: adding regular expressionsIt wasn't too hard to add regular expressions to the substring version. What took a while was just tinkering with the code, breaking it, fixing it again, noticing an optimization, tinkering, etc. etc. In any case it works and here is some of it.<br />
<pre>(define (make-extended-ngram-table winners losers)
(let* ((initial-ngrams (generate-ngrams winners losers)))
(write-string "Initial ngrams: ") (write (length initial-ngrams))
(newline)
(map (lambda (winner)
(cons winner
(keep-matching-items initial-ngrams
(lambda (ngram) (re-string-search-forward ngram winner)))))
winners)))
(define (generate-ngrams winners losers)
(write-string "Generating ngrams...")(newline)
(let ((losing-ngram? (string-list-matcher losers)))
(fold-left (lambda (answer winner)
(lset-union equal? answer (extended-ngrams losing-ngram? winner)))
'()
winners)))
(define (string-list-matcher string-list)
(lambda (test-ngram)
(there-exists? string-list
(lambda (string)
(re-string-search-forward test-ngram string)))))
(define *dotification-limit* 4)
(define *generate-ends-of-words* #t)
(define *generate-dotted* #t)
(define (ngrams-of-length n string)
(do ((start 0 (1+ start))
(end n (1+ end))
(answer '() (lset-adjoin string=? answer (substring string start end))))
((> end (string-length string)) answer)))
(define (generate-dotted answer losing-ngram?)
(do ((tail answer (cdr tail))
(answer '() (let ((item (car tail)))
(fold-left (lambda (answer dotted)
(if (losing-ngram? dotted)
answer
(lset-adjoin string=? answer dotted)))
answer
(dotify item)))))
((not (pair? tail))
(if (null? tail)
answer
(improper-list-error 'generate-dotted tail)))))
(define (dotify word)
(cond ((string=? word "") (list ""))
((> (string-length word) *dotification-limit*) (list word))
(else
(fold-left (lambda (answer dotified)
(fold-left (lambda (answer replacement)
(lset-adjoin equal? answer
(string-append replacement dotified)))
answer
(replacements (substring word 0 1))))
'()
(dotify (substring word 1 (string-length word)))))))
(define (replacements string)
(if (or (string=? string "^")
(string=? string "$"))
(list string)
(list string ".")))
(define (extended-ngrams losing-ngram? string)
(let ((string (if *generate-ends-of-words*
(string-append "^" string "$")
string)))
(do ((n 1 (+ n 1))
(answer '() (lset-union
string=? answer
(delete-matching-items (ngrams-of-length n string)
losing-ngram?))))
((> n (string-length string))
(if *generate-dotted*
(generate-dotted answer losing-ngram?)
answer)))))</pre>Adding the dotification greatly increases the number of ways to match words:<br />
<pre>1 ]=> (extended-ngrams (string-list-matcher losers) "lincoln")
;Value 15: ("li" "ln" "ln$" "oln" ".ln" "col" "lin" "li." "^li" "o.n$" "oln$" ".ln$" "col." "c.ln" "..ln" "coln" ".oln" "co.n" "n.ol" "..ol" "ncol" ".col" "nc.l" "i.co" "inco" "i..o" "in.o" "lin." "li.." "l.nc" "linc" "l..c" "li.c" "^li." "^lin" "coln$" "ncoln" "incol" "linco" "^linc" "ncoln$" "incoln" "lincol" "^linco" "incoln$" "lincoln" "^lincol" "lincoln$" "^lincoln" "^lincoln$")</pre>The table that maps words to their extended ngrams is quite large, but it can be reduced in size without affecting the solution to the set cover problem. If two regexps match exactly the same set of winning strings, then one can be substituted for the other in any solution, so we can discard all but the shortest of these. If a regexp matches a proper superset of another regexp, and the other regexp is at least the same length or longer, then the first regexp dominates the second one, so we can discard the second one.<br />
<pre>(define (minimize-keys value->keys-table better-solution)
(let* ((all-keys (get-keys value->keys-table))
(equivalents (collect-equivalent-partial-solutions value->keys-table
(map list all-keys)))
(reduced (map (lambda (equivalent)
(cons (car equivalent)
(car (least-elements (cdr equivalent)
better-solution))))
equivalents))
(dominants (collect-dominant-partial-solutions reduced better-solution))
(good-keys (fold-left (lambda (answer candidate)
(lset-adjoin equal? answer (cadr candidate)))
'()
dominants)))
(define (rebuild-entry entry)
(cons (car entry) (keep-matching-items (cdr entry)
(lambda (item) (member item good-keys)))))
(write-string "Deleting ") (write (- (length all-keys) (length good-keys)))
(write-string " of ") (write (length all-keys)) (write-string " keys. ")
(write (length good-keys)) (write-string " keys remain.")(newline)
(map rebuild-entry value->keys-table)))
(define (partial-solution-matches value->keys-table partial-solution)
(keep-matching-items
value->keys-table
(lambda (entry)
(there-exists? partial-solution (lambda (key) (member key (cdr entry)))))))
(define (collect-equivalent-partial-solutions value->keys-table partial-solutions)
(let ((answer-table (make-equal-hash-table)))
(for-each (lambda (partial-solution)
(hash-table/modify! answer-table
(map car (partial-solution-matches
value->keys-table
partial-solution))
(list)
(lambda (other)
(lset-adjoin equal? other partial-solution))))
partial-solutions)
(hash-table->alist answer-table)))
(define (collect-dominant-partial-solutions equivalents better-solution)
(define (dominates? left right)
(and (superset? (car left) (car right))
(not (better-solution (cdr right) (cdr left)))))
(let ((sorted (sort equivalents
(lambda (l r) (> (length (car l)) (length (car r)))))))
(fold-left (lambda (answer candidate)
(if (there-exists? answer (lambda (a) (dominates? a candidate)))
answer
(lset-adjoin equal? answer candidate)))
'()
sorted)))
</pre>We can minimize the value->key-table in another way. If two values in the table are matched by the exact same set of keys, then we can delete one without changing the solution. If a value is matched by a small set of keys, and if another values is matched by a superset of these keys, then we can delete the larger one because if the smaller one matches, the larger one must match as well.<pre>(define (minimize-values v-k-table)
(let ((size-before (length v-k-table)))
(define (dominated-value? entry)
(let ((entry-value (car entry))
(entry-keylist (cdr entry)))
(there-exists? v-k-table
(lambda (other-entry)
(and (not (eq? entry other-entry))
(let ((other-value (car other-entry))
(other-keylist (cdr other-entry)))
(let ((result (and (superset? entry-keylist other-keylist)
(not (superset? other-keylist entry-keylist)))))
(if result
(begin (display "Removing ")
(write entry-value)
(display " dominated by ")
(write other-value)
(display ".")
(newline)
))
result)))))))
(define (equivalent-value-in-answer? answer entry)
(let ((entry-value (car entry))
(entry-keylist (cdr entry)))
(there-exists? answer
(lambda (other-entry)
(let ((other-value (car other-entry))
(other-keylist (cdr other-entry)))
(let ((result (equal? entry-keylist other-keylist)))
(if result
(begin (display "Removing ")
(write entry-value)
(display " equivalent to ")
(write other-value)
(display ".")
(newline)
))
result))))))
(define (add-entry answer entry)
(if (or (equivalent-value-in-answer? answer entry)
(dominated-value? entry))
answer
(cons entry answer)))
(let ((answer (fold-left add-entry '() v-k-table)))
(write-string "Removed ") (write (- size-before (length answer)))
(write-string " dominated and equivalent values.")
(newline)
answer)))</pre>Each time we remove values or keys, we might make more keys and values equivalent or dominated, so we iterate until we can no longer remove anything.<pre>(define (minimize-vktable value->keys-table better-solution)
(let* ((before-size (fold-left + 0 (map length value->keys-table)))
(new-table
(minimize-values
(minimize-keys value->keys-table better-solution)))
(after-size (fold-left + 0 (map length new-table))))
(if (= before-size after-size)
value->keys-table
(minimize-vktable new-table better-solution))))</pre>The minimized table for the presidents looks like this:<br />
<pre>(("washington" "sh" "g..n" "n..o" ".h.n" "a..i")
("adams" "a.a" "am" "ad")
("madison" "m..i" "i..n" "is." "i.o" "di" "ma" "ad")
("monroe" "r.e$" "oe")
("van-buren" "u..n" "r.n" ".b" "bu" "-")
("harrison" "r..s" "r.i" "i..n" "is." "i.o" "a..i")
("polk" "po")
("taylor" "ay." "ta")
("pierce" "ie." "rc" "r.e$")
("buchanan" "bu" "a.a" ".h.n")
("lincoln" "i..o" "li")
("grant" "an.$" "a.t" "ra" "r.n" "g..n")
("hayes" "h..e" "ye" "ay.")
("garfield" "el.$" "i.l" "ga" "ie." "r.i" ".f" "a..i")
("cleveland" "v.l" "an.$")
("mckinley" "n.e" "nl" "i.l" "m..i")
("roosevelt" ".se" "oo" "v.l" "el.$" "r..s")
("taft" "a.t" "ta" ".f")
("wilson" "ls" "i..o")
("harding" "r.i" "di" "a..i")
("coolidge" "oo" "li")
("hoover" "ho" "oo")
("truman" "u..n" "ma")
("eisenhower" "ho" ".se" "h..e" "i..n" "is.")
("kennedy" "nn" "n.e")
("johnson" "j")
("nixon" "^n" "i..n" "i.o" "n..o")
("carter" "rt" "a.t")
("reagan" "ga" "a.a")
("bush" "bu" "sh")
("obama" ".b" "ma" "a.a" "am"))</pre>As you can see, we have reduced the original 2091 matching regexps to fifty.<br />
<br />
Changes to the set-cover code coming soon....Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-48821469937434849332014-08-01T09:49:00.000-07:002014-08-01T09:49:23.115-07:00Mini regex golfI was intrigued by Peter Norvig's articles about <a href="http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb">regex golf</a>.<br />
<br />
To make things easier to think about, I decided to start with the simpler problem of looking for substrings. Here's code to extract the ngrams of a string:<br />
<pre>(define (ngrams-of-length n string)
(do ((start 0 (1+ start))
(end n (1+ end))
(answer '() (lset-adjoin string=? answer (substring string start end))))
((> end (string-length string)) answer)))
(define (ngrams string)
(do ((n 1 (+ n 1))
(answer '() (append (ngrams-of-length n string) answer)))
((> n (string-length string)) answer)))</pre>A solution is simply a list of ngrams. (Although not every list of ngrams is a solution!) <br />
<pre>(define (solution? solution winners losers)
(let ((matches-solution? (ngram-list-matcher solution)))
(and (for-all? winners matches-solution?)
(not (there-exists? losers matches-solution?)))))
(define (ngram-list-matcher ngram-list)
(lambda (test-string)
(there-exists? ngram-list
(lambda (ngram)
(string-search-forward ngram test-string)))))
</pre>We also want to know if an ngram appears in a given list of strings.<br />
<pre>(define (string-list-matcher string-list)
(lambda (test-ngram)
(there-exists? string-list
(lambda (string)
(string-search-forward test-ngram string)))))
(fluid-let ((*unparser-list-breadth-limit* 10))
(let ((matches-loser? (string-list-matcher losers)))
(for-each
(lambda (winner) (write-string winner) (write-string ": ")
(write (reverse (delete-matching-items (ngrams winner) matches-loser?)))
(newline))
winners)))
washington: ("sh" "hi" "gt" "to" "was" "ash" "shi" "hin" "ngt" "gto" ...)
adams: ("ad" "am" "ms" "ada" "dam" "ams" "adam" "dams" "adams")
jefferson: ("j" "je" "ef" "ff" "fe" "rs" "jef" "eff" "ffe" "fer" ...)
madison: ("ma" "ad" "di" "mad" "adi" "dis" "iso" "madi" "adis" "diso" ...)
monroe: ("oe" "onr" "nro" "roe" "monr" "onro" "nroe" "monro" "onroe" "monroe")
jackson: ("j" "ja" "ac" "ks" "jac" "ack" "cks" "kso" "jack" "acks" ...)
van-buren: ("-" "va" "n-" "-b" "bu" "van" "an-" "n-b" "-bu" "bur" ...)
harrison: ("har" "arr" "rri" "ris" "iso" "harr" "arri" "rris" "riso" "ison" ...)
polk: ("po" "pol" "olk" "polk")
taylor: ("ta" "yl" "lo" "tay" "ayl" "ylo" "lor" "tayl" "aylo" "ylor" ...)
pierce: ("rc" "ce" "pie" "ier" "erc" "rce" "pier" "ierc" "erce" "pierc" ...)
buchanan: ("bu" "uc" "ch" "na" "buc" "uch" "cha" "ana" "nan" "buch" ...)
lincoln: ("li" "ln" "lin" "col" "oln" "linc" "inco" "ncol" "coln" "linco" ...)
grant: ("ra" "gra" "ran" "ant" "gran" "rant" "grant")
hayes: ("ye" "hay" "aye" "yes" "haye" "ayes" "hayes")
garfield: ("ga" "rf" "fi" "gar" "arf" "rfi" "fie" "iel" "eld" "garf" ...)
cleveland: ("lev" "vel" "ela" "clev" "leve" "evel" "vela" "elan" "cleve" "level" ...)
mckinley: ("nl" "mck" "inl" "nle" "mcki" "kinl" "inle" "nley" "mckin" "ckinl" ...)
roosevelt: ("oo" "os" "lt" "roo" "oos" "ose" "sev" "vel" "elt" "roos" ...)
taft: ("ta" "af" "ft" "taf" "aft" "taft")
wilson: ("ls" "ils" "lso" "wils" "ilso" "lson" "wilso" "ilson" "wilson")
harding: ("di" "har" "ard" "rdi" "din" "hard" "ardi" "rdin" "ding" "hardi" ...)
coolidge: ("oo" "li" "coo" "ool" "oli" "lid" "cool" "ooli" "olid" "lidg" ...)
hoover: ("ho" "oo" "hoo" "oov" "hoov" "oove" "hoove" "oover" "hoover")
truman: ("tr" "ru" "ma" "tru" "rum" "uma" "man" "trum" "ruma" "uman" ...)
eisenhower: ("ei" "nh" "ho" "ow" "eis" "ise" "sen" "enh" "nho" "how" ...)
kennedy: ("nn" "ed" "dy" "ken" "enn" "nne" "ned" "edy" "kenn" "enne" ...)
johnson: ("j" "jo" "oh" "hn" "joh" "ohn" "hns" "john" "ohns" "hnso" ...)
nixon: ("ni" "ix" "xo" "nix" "ixo" "xon" "nixo" "ixon" "nixon")
carter: ("rt" "car" "art" "rte" "cart" "arte" "rter" "carte" "arter" "carter")
reagan: ("ea" "ag" "ga" "rea" "eag" "aga" "gan" "reag" "eaga" "agan" ...)
bush: ("bu" "us" "sh" "bus" "ush" "bush")
clinton: ("li" "to" "cli" "lin" "int" "nto" "ton" "clin" "lint" "into" ...)
obama: ("ob" "ba" "am" "ma" "oba" "bam" "ama" "obam" "bama" "obama")
</pre>We can discard ngrams like "shi" because the shorter ngram "sh" will also match.<br />
<pre>(define (dominant-ngrams string losing-ngram?)
(do ((n 1 (+ n 1))
(answer '() (append
(delete-matching-items
(ngrams-of-length n string)
(lambda (item)
(or (there-exists? answer
(lambda (ngram)
(string-search-forward ngram item)))
(losing-ngram? item))))
answer)))
((> n (string-length string)) answer)))
(fluid-let ((*unparser-list-breadth-limit* 10))
(let ((matches-loser? (string-list-matcher losers)))
(for-each
(lambda (winner) (write-string winner) (write-string ": ")
(write (dominant-ngrams winner matches-loser?))
(newline))
winners)))
washington: ("was" "to" "gt" "hi" "sh")
adams: ("ms" "am" "ad")
jefferson: ("rs" "fe" "ff" "ef" "j")
madison: ("iso" "di" "ad" "ma")
monroe: ("nro" "onr" "oe")
jackson: ("ks" "ac" "j")
van-buren: ("ren" "ure" "bu" "va" "-")
harrison: ("iso" "ris" "rri" "arr" "har")
polk: ("olk" "po")
taylor: ("lo" "yl" "ta")
pierce: ("ier" "pie" "ce" "rc")
buchanan: ("na" "ch" "uc" "bu")
lincoln: ("inco" "col" "ln" "li")
grant: ("ant" "ra")
hayes: ("hay" "ye")
garfield: ("eld" "iel" "fi" "rf" "ga")
cleveland: ("ela" "vel" "lev")
mckinley: ("mck" "nl")
roosevelt: ("vel" "sev" "lt" "os" "oo")
taft: ("ft" "af" "ta")
wilson: ("ls")
harding: ("ard" "har" "di")
coolidge: ("li" "oo")
hoover: ("oo" "ho")
truman: ("ma" "ru" "tr")
eisenhower: ("wer" "sen" "ise" "ow" "ho" "nh" "ei")
kennedy: ("ken" "dy" "ed" "nn")
johnson: ("hn" "oh" "j")
nixon: ("xo" "ix" "ni")
carter: ("car" "rt")
reagan: ("ga" "ag" "ea")
bush: ("sh" "us" "bu")
clinton: ("int" "to" "li")
obama: ("ma" "am" "ba" "ob")
</pre>It's time to tackle the set cover problem. We want a set of ngrams that match all the strings. Obviously, if we pick an ngram from each of the strings we want to cover, we'll have a solution. For instance,<br />
<pre>(let ((matches-loser? (string-list-matcher losers)))
(solution? (delete-duplicates
(map
(lambda (winner) (car (dominant-ngrams winner matches-loser?)))
winners))
winners losers))
;Value: #t
</pre>We can cycle through all the possible solutions and then select the best one.<br />
<pre>(define (mini-golf0 winners losers)
(lowest-scoring
(cover0 (make-dominant-ngram-table
winners
(delete-losing-superstrings winners losers)))))
(define (delete-losing-superstrings winners losers)
(delete-matching-items
losers
(lambda (loser)
(there-exists? winners
(lambda (winner)
(string-search-forward winner loser))))))
(define (make-dominant-ngram-table winners losers)
(let ((losing-ngram? (string-list-matcher losers)))
(map (lambda (winner)
(cons winner (dominant-ngrams winner losing-ngram?)))
winners)))
(define (cover0 v-k-table)
(let ((empty-solution-set (list '())))
(fold-left add-v-k-entry0 empty-solution-set v-k-table)))
(define (add-v-k-entry0 solution-set v-k-entry)
(let ((value (car v-k-entry))
(keys (cdr v-k-entry)))
(write-string "Adding value ") (write value) (newline)
(write-string " with keys ") (write keys) (newline)
(write-string " to ") (write (length solution-set))
(write-string " partial solutions.") (newline)
(let ((new-solutions
(map make-new-solution (cartesian-product solution-set keys))))
(write-string "Returning ") (write (length new-solutions))
(write-string " new partial solutions.") (newline)
new-solutions)))
(define (lowest-scoring list)
(least-elements list (lambda (l r) (< (score l) (score r)))))
(define (cartesian-product left-list right-list)
(fold-left (lambda (answer left)
(fold-left (lambda (answer right)
(cons (cons left right) answer))
answer
right-list))
'()
left-list))
(define (make-new-solution cp-term)
(let ((solution (car cp-term))
(key (cdr cp-term)))
(lset-adjoin equal? solution key)))
(define (improper-list-error procedure thing)
(error (string-append "Improper list found by " procedure ": ") thing))
(define (least-elements list <)
(define (accumulate-least answer item)
(cond ((< (car answer) item) answer)
((< item (car answer)) (cons item '()))
(else (cons item answer))))
(cond ((pair? list) (fold-left accumulate-least
(cons (car list) '())
(cdr list)))
((null? list) (error "List must have at least one element." list))
(else (improper-list-error 'LEAST-ELEMENTS list))))
(define (score solution)
(do ((tail solution (cdr tail))
(score -1 (+ score (string-length (car tail)) 1)))
((not (pair? tail))
(if (null? tail)
score
(improper-list-error 'score solution)))))
</pre>
This works for small sets:
<pre>1 ]=> (mini-golf0 boys girls)
Adding value "jacob"
with keys ("ob" "c" "j")
to 1 partial solutions.
Returning 3 new partial solutions.
Adding value "mason"
with keys ("as")
to 3 partial solutions.
Returning 3 new partial solutions.
Adding value "ethan"
with keys ("an" "ha")
to 3 partial solutions.
Returning 6 new partial solutions.
Adding value "noah"
with keys ("ah" "oa" "no")
to 6 partial solutions.
Returning 18 new partial solutions.
Adding value "william"
with keys ("lia" "lli" "ill" "am" "w")
to 18 partial solutions.
Returning 90 new partial solutions.
Adding value "liam"
with keys ("lia" "am")
to 90 partial solutions.
Returning 180 new partial solutions.
Adding value "jayden"
with keys ("en" "de" "yd" "ay" "j")
to 180 partial solutions.
Returning 900 new partial solutions.
Adding value "michael"
with keys ("ae" "ha" "c")
to 900 partial solutions.
Returning 2700 new partial solutions.
Adding value "alexander"
with keys ("de" "nd" "an" "le" "al" "r" "x")
to 2700 partial solutions.
Returning 18900 new partial solutions.
Adding value "aiden"
with keys ("en" "de" "id")
to 18900 partial solutions.
Returning 56700 new partial solutions.
;Value 41: (("de" "am" "ah" "ha" "as" "j")
("de" "am" "ah" "ha" "as" "j")
("de" "am" "oa" "ha" "as" "j")
("de" "am" "oa" "ha" "as" "j")
("de" "am" "no" "ha" "as" "j")
("de" "am" "no" "ha" "as" "j")
("de" "am" "ah" "ha" "as" "c")
("de" "am" "ah" "ha" "as" "c")
("de" "am" "oa" "ha" "as" "c")
("de" "am" "oa" "ha" "as" "c")
("de" "am" "no" "ha" "as" "c")
("de" "am" "no" "ha" "as" "c")
("de" "am" "ah" "an" "as" "c")
("de" "am" "ah" "an" "as" "c")
("en" "am" "ah" "an" "as" "c")
("de" "am" "oa" "an" "as" "c")
("de" "am" "oa" "an" "as" "c")
("en" "am" "oa" "an" "as" "c")
("de" "am" "no" "an" "as" "c")
("de" "am" "no" "an" "as" "c")
("en" "am" "no" "an" "as" "c"))
</pre>But you can see that we won't be able to go much beyond this because there are just too many combinations.
We can cut down on the intermediate partial solutions by noticing that many of them are redundant. We don't need to keep partial solutions that cannot possibly lead to a shortest final solution. The various partial solutions each (potentially) match different sets of words. We only need keep the shortest solution for each different set of matched words. Furthermore, if a solution's matches are a superset of another's matches, and the other is the same length or longer, then the solution is dominated by the other and will always be at least the length of the longer.
<pre>(define (mini-golf1 winners losers)
(cover1
(make-dominant-ngram-table winners (delete-losing-superstrings winners losers))
lowest-scoring))
(define (cover1 v-k-table lowest-scoring)
(let ((empty-solution-set (list '())))
(define (add-v-k-entry solution-set v-k-entry)
(let ((value (car v-k-entry))
(keys (cdr v-k-entry)))
(write-string "Adding value ") (write value) (newline)
(write-string " with keys ") (write keys) (newline)
(write-string " to ") (write (length solution-set))
(write-string " partial solutions.") (newline)
(let ((new-solutions
(map make-new-solution (cartesian-product solution-set keys))))
(let ((trimmed-solutions (trim-partial-solutions new-solutions)))
(write-string "Returning ") (write (length trimmed-solutions))
(write-string " of ") (write (length new-solutions))
(write-string " new partial solutions.") (newline)
trimmed-solutions))))
(define (trim-partial-solutions partial-solutions)
(let ((equivalent-solutions (collect-equivalent-partial-solutions partial-solutions)))
(write-string " Deleting ")
(write (- (length partial-solutions) (length equivalent-solutions)))
(write-string " equivalent partial solutions.")
(newline)
(remove-dominated-solutions
(map lowest-scoring-equivalent-partial-solution equivalent-solutions))))
(define (lowest-scoring-equivalent-partial-solution entry)
(first (lowest-scoring (car entry))))
(define (collect-equivalent-partial-solutions alist)
;; Add each entry in turn.
(fold-left (lambda (equivalents partial-solution)
(add-equivalent-partial-solution
partial-solution
(partial-solution-matches partial-solution)
equivalents))
'() alist))
(define (partial-solution-matches partial-solution)
(keep-matching-items v-k-table
(lambda (entry)
(there-exists? partial-solution
(lambda (key) (member key (cdr entry)))))))
(define (remove-dominated-solutions partial-solutions)
(let ((before-length (length partial-solutions)))
(let ((answer (map car (fold-left (lambda (answer solution)
(if (there-exists? answer (dominates-solution? solution))
answer
(cons solution answer)))
'()
(map (lambda (partial-solution)
(cons partial-solution (partial-solution-matches partial-solution)))
partial-solutions)))))
(let ((after-length (length answer)))
(write-string " Deleting ") (write (- before-length after-length))
(write-string " dominated solutions.")
(newline)
answer))))
(lowest-scoring
(fold-left add-v-k-entry empty-solution-set v-k-table))))
(define (dominates-solution? solution)
(let ((partial-solution (car solution))
(solution-matches (cdr solution)))
(lambda (other-solution)
(let ((other-partial-solution (car other-solution))
(other-matches (cdr other-solution)))
(and (not (equal? solution-matches other-matches))
(superset? other-matches solution-matches)
(<= (score other-partial-solution) (score partial-solution)))))))
(define (add-equivalent-partial-solution solution value alist)
(cond ((pair? alist)
(let ((entry (car alist))
(tail (cdr alist)))
(let ((entry-solutions (car entry))
(entry-value (cdr entry)))
(if (equal? value entry-value)
(if (member solution entry-solutions)
alist
(cons (cons (cons solution entry-solutions) value)
tail))
(cons entry (add-equivalent-partial-solution solution value tail))))))
((null? alist) (list (cons (list solution) value)))
(else (improper-list-error 'collect-equivalents alist))))
</pre>
<pre>1 ]=> (mini-golf1 winners losers)
Adding value "washington"
with keys ("was" "to" "gt" "hi" "sh")
to 1 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 1 dominated solutions.
Returning 2 of 5 new partial solutions.
Adding value "adams"
with keys ("ms" "am" "ad")
to 2 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 2 dominated solutions.
Returning 4 of 6 new partial solutions.
Adding value "jefferson"
with keys ("rs" "fe" "ff" "ef" "j")
to 4 partial solutions.
Deleting 12 equivalent partial solutions.
Removing 4 dominated solutions.
Returning 4 of 20 new partial solutions.
Adding value "madison"
with keys ("iso" "di" "ad" "ma")
to 4 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 2 dominated solutions.
Returning 12 of 16 new partial solutions.
Adding value "monroe"
with keys ("nro" "onr" "oe")
to 12 partial solutions.
Deleting 24 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 12 of 36 new partial solutions.
Adding value "jackson"
with keys ("ks" "ac" "j")
to 12 partial solutions.
Deleting 24 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 12 of 36 new partial solutions.
Adding value "van-buren"
with keys ("ren" "ure" "bu" "va" "-")
to 12 partial solutions.
Deleting 36 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 24 of 60 new partial solutions.
Adding value "harrison"
with keys ("iso" "ris" "rri" "arr" "har")
to 24 partial solutions.
Deleting 96 equivalent partial solutions.
Removing 12 dominated solutions.
Returning 12 of 120 new partial solutions.
Adding value "polk"
with keys ("olk" "po")
to 12 partial solutions.
Deleting 12 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 12 of 24 new partial solutions.
Adding value "taylor"
with keys ("lo" "yl" "ta")
to 12 partial solutions.
Deleting 12 equivalent partial solutions.
Removing 12 dominated solutions.
Returning 12 of 36 new partial solutions.
Adding value "pierce"
with keys ("ier" "pie" "ce" "rc")
to 12 partial solutions.
Deleting 36 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 12 of 48 new partial solutions.
Adding value "buchanan"
with keys ("na" "ch" "uc" "bu")
to 12 partial solutions.
Deleting 39 equivalent partial solutions.
Removing 3 dominated solutions.
Returning 6 of 48 new partial solutions.
Adding value "lincoln"
with keys ("inco" "col" "ln" "li")
to 6 partial solutions.
Deleting 15 equivalent partial solutions.
Removing 6 dominated solutions.
Returning 3 of 24 new partial solutions.
Adding value "grant"
with keys ("ant" "ra")
to 3 partial solutions.
Deleting 3 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 3 of 6 new partial solutions.
Adding value "hayes"
with keys ("hay" "ye")
to 3 partial solutions.
Deleting 3 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 3 of 6 new partial solutions.
Adding value "garfield"
with keys ("eld" "iel" "fi" "rf" "ga")
to 3 partial solutions.
Deleting 9 equivalent partial solutions.
Removing 3 dominated solutions.
Returning 3 of 15 new partial solutions.
Adding value "cleveland"
with keys ("ela" "vel" "lev")
to 3 partial solutions.
Deleting 3 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 9 new partial solutions.
Adding value "mckinley"
with keys ("mck" "nl")
to 6 partial solutions.
Deleting 6 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 12 new partial solutions.
Adding value "roosevelt"
with keys ("vel" "sev" "lt" "os" "oo")
to 6 partial solutions.
Deleting 24 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 30 new partial solutions.
Adding value "taft"
with keys ("ft" "af" "ta")
to 6 partial solutions.
Deleting 12 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 18 new partial solutions.
Adding value "wilson"
with keys ("ls")
to 6 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 6 new partial solutions.
Adding value "harding"
with keys ("ard" "har" "di")
to 6 partial solutions.
Deleting 12 equivalent partial solutions.
Removing 2 dominated solutions.
Returning 4 of 18 new partial solutions.
Adding value "coolidge"
with keys ("li" "oo")
to 4 partial solutions.
Deleting 4 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 4 of 8 new partial solutions.
Adding value "hoover"
with keys ("oo" "ho")
to 4 partial solutions.
Deleting 4 equivalent partial solutions.
Removing 2 dominated solutions.
Returning 2 of 8 new partial solutions.
Adding value "truman"
with keys ("ma" "ru" "tr")
to 2 partial solutions.
Deleting 4 equivalent partial solutions.
Removing 1 dominated solutions.
Returning 1 of 6 new partial solutions.
Adding value "eisenhower"
with keys ("wer" "sen" "ise" "ow" "ho" "nh" "ei")
to 1 partial solutions.
Deleting 6 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 7 new partial solutions.
Adding value "kennedy"
with keys ("ken" "dy" "ed" "nn")
to 1 partial solutions.
Deleting 3 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 4 new partial solutions.
Adding value "johnson"
with keys ("hn" "oh" "j")
to 1 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 3 new partial solutions.
Adding value "nixon"
with keys ("xo" "ix" "ni")
to 1 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 3 new partial solutions.
Adding value "carter"
with keys ("car" "rt")
to 1 partial solutions.
Deleting 1 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 2 new partial solutions.
Adding value "reagan"
with keys ("ga" "ag" "ea")
to 1 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 3 new partial solutions.
Adding value "bush"
with keys ("sh" "us" "bu")
to 1 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 3 new partial solutions.
Adding value "clinton"
with keys ("int" "to" "li")
to 1 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 3 new partial solutions.
Adding value "obama"
with keys ("ma" "am" "ba" "ob")
to 1 partial solutions.
Deleting 3 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 4 new partial solutions.
;Value 47: (("rt" "ni" "nn" "ho" "ls" "nl" "vel" "ga" "ye" "ra" "li" "rc" "ta" "po" "har" "bu" "oe" "ma" "j" "ad" "sh"))</pre>The <code>cover</code> procedure takes a table that maps values to the keys that cover them. If we can reduce the size of that table without changing the solution, we'll run faster. If there are two entries in the table such that the keys of one are a superset of the keys of the other, we can discard the superset: the smaller of the two entries will be in the solution, and any key that matches the smaller one will automatically match the larger one as well.
Also, if two values have the same set of keys that match them, we need only include one of the values in the table.
<pre>(define (delete-dominated-values v-k-table)
(let ((size-before (length v-k-table)))
(define (dominated-value? entry)
(let ((entry-value (car entry))
(entry-keylist (cdr entry)))
(there-exists? v-k-table
(lambda (other-entry)
(and (not (eq? entry other-entry))
(let ((other-value (car other-entry))
(other-keylist (cdr other-entry)))
(and (superset? entry-keylist other-keylist)
(not (equal? other-keylist entry-keylist)))))))))
(define (equivalent-value-in-answer? answer entry)
(let ((entry-value (car entry))
(entry-keylist (cdr entry)))
(there-exists? answer
(lambda (other-entry)
(let ((other-value (car other-entry))
(other-keylist (cdr other-entry)))
(equal? entry-keylist other-keylist))))))
(define (add-entry answer entry)
(if (or (equivalent-value-in-answer? answer entry)
(dominated-value? entry))
answer
(cons entry answer)))
(let ((answer (fold-left add-entry '() v-k-table)))
(write-string "Removed ") (write (- size-before (length answer)))
(write-string " dominated and equivalent values.")
(newline)
answer)))
(define (superset? bigger smaller)
(for-all? smaller (lambda (s) (member s bigger))))
(define (mini-golf2 winners losers)
(cover1
(delete-dominated-values
(make-dominant-ngram-table winners (delete-losing-superstrings winners losers)))
lowest-scoring))
;;;;;;;;
;; Delete dominated keys from the keylists.
(define (mini-golf3 winners losers)
(cover1
(delete-dominated-keys-and-values
(make-dominant-ngram-table winners (delete-losing-superstrings winners losers))
(lambda (left right)
(or (< (string-length left) (string-length right))
(and (= (string-length left) (string-length right))
(string<? left right)))))
lowest-scoring))
(define (delete-dominated-keys-and-values v-k-table better-key)
(let ((before-size (fold-left * 1 (map length v-k-table))))
(let ((new-table (delete-dominated-values
(delete-dominated-keys v-k-table better-key))))
(let ((after-size (fold-left * 1 (map length new-table))))
(if (= before-size after-size)
v-k-table
(delete-dominated-keys-and-values new-table better-key))))))
(define (delete-dominated-keys v-k-table better-key)
(let ((all-keys (get-all-keys v-k-table)))
(define (lookup-key key)
(cons key
(map car
(keep-matching-items v-k-table
(lambda (v-k-entry)
(member key (cdr v-k-entry)))))))
(let ((k-v-table (map lookup-key all-keys)))
(define (dominated-key? key)
(let ((values (cdr (assoc key k-v-table))))
(there-exists? k-v-table
(lambda (entry)
(let ((entry-key (car entry))
(entry-values (cdr entry)))
(and (superset? entry-values values)
(not (equal? values entry-values))
(or (< (string-length entry-key) (string-length key))
(and (= (string-length entry-key) (string-length key))
(string<? entry-key key)))))))))
(define (equivalent-key-in-answer? answer key)
(let ((values (cdr (assoc key k-v-table))))
(there-exists? answer
(lambda (entry-key)
(let ((entry-values (cdr (lookup-key entry-key))))
(equal? values entry-values))))))
(define (add-keys answer key)
(if (or (dominated-key? key)
(equivalent-key-in-answer? answer key))
answer
(cons key answer)))
(let ((good-keys (fold-left add-keys '() (sort all-keys better-key))))
(write-string "Removed ") (write (- (length all-keys) (length good-keys)))
(write-string " of ") (write (length all-keys)) (write-string " keys.")(newline)
(map (lambda (entry)
(cons (car entry)
(keep-matching-items (cdr entry) (lambda (key) (member key good-keys)))))
v-k-table)))))
(define (get-all-keys v-k-table)
(fold-left (lambda (answer entry)
(fold-left (lambda (answer key)
(lset-adjoin equal? answer key))
answer
(cdr entry)))
'()
v-k-table))</pre>Trimming the table this way helps a lot. We can now compute the dogs vs. cats.
<pre>1 ]=> (mini-golf3 dogs cats)
Removed 294 of 405 keys.
Removed 44 dominated and equivalent values.
Removed 25 of 93 keys.
Removed 15 dominated and equivalent values.
Removed 7 of 62 keys.
Removed 0 dominated and equivalent values.
Removed 0 of 55 keys.
Removed 0 dominated and equivalent values.
Adding value "BORZOIS"
with keys ("OIS" "BOR" "RZ")
to 1 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 3 of 3 new partial solutions.
Adding value "GIANT SCHNAUZERS"
with keys ("SCH" "HN")
to 3 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 6 new partial solutions.
Adding value "BASENJIS"
with keys ("JI")
to 6 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 6 new partial solutions.
Adding value "ENGLISH SETTERS"
with keys ("TERS" "ETT")
to 6 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 12 of 12 new partial solutions.
Adding value "JAPANESE CHIN"
with keys ("CHI")
to 12 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 12 of 12 new partial solutions.
Adding value "BOUVIERS DES FLANDRES"
with keys ("S F" "DES" " DE" "IER" "FL" "VI")
to 12 partial solutions.
Deleting 8 equivalent partial solutions.
Removing 8 dominated solutions.
Returning 56 of 72 new partial solutions.
Adding value "PEKINGESE"
with keys ("EKI")
to 56 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 56 of 56 new partial solutions.
Adding value "BELGIAN MALINOIS"
with keys (" MAL" "OIS" "LG")
to 56 partial solutions.
Deleting 96 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 72 of 168 new partial solutions.
Adding value "GERMAN WIREHAIRED POINTERS"
with keys ("TERS" "D P")
to 72 partial solutions.
Deleting 108 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 36 of 144 new partial solutions.
Adding value "CHOW CHOWS"
with keys ("W ")
to 36 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 36 of 36 new partial solutions.
Adding value "SAMOYEDS"
with keys ("DS")
to 36 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 36 of 36 new partial solutions.
Adding value "DOGUES DE BORDEAUX"
with keys ("BOR" " DE" "GU")
to 36 partial solutions.
Deleting 88 equivalent partial solutions.
Removing 2 dominated solutions.
Returning 18 of 108 new partial solutions.
Adding value "DALMATIANS"
with keys ("ANS" "LM")
to 18 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 36 of 36 new partial solutions.
Adding value "LHASA APSOS"
with keys ("LH")
to 36 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 36 of 36 new partial solutions.
Adding value "CANE CORSO"
with keys (" COR" "ORS")
to 36 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 72 of 72 new partial solutions.
Adding value "ALASKAN MALAMUTES"
with keys (" MAL" "TES" "LAS" "KA")
to 72 partial solutions.
Deleting 184 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 104 of 288 new partial solutions.
Adding value "WHIPPETS"
with keys ("IP")
to 104 partial solutions.
Deleting 0 equivalent partial solutions.
;GC #199: took: 0.20 (1%) CPU time, 0.10 (1%) real time; free: 16754359
Removing 0 dominated solutions.
Returning 104 of 104 new partial solutions.
Adding value "SHIBA INU"
with keys ("SHI" " I")
to 104 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 208 of 208 new partial solutions.
Adding value "AKITAS"
with keys ("AK")
to 208 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 208 of 208 new partial solutions.
Adding value "RHODESIAN RIDGEBACKS"
with keys ("DES" "DG" "OD")
to 208 partial solutions.
Deleting 304 equivalent partial solutions.
Removing 144 dominated solutions.
Returning 176 of 624 new partial solutions.
Adding value "BICHONS FRISES"
with keys ("S F" "FR")
to 176 partial solutions.
Deleting 224 equivalent partial solutions.
Removing 16 dominated solutions.
Returning 112 of 352 new partial solutions.
Adding value "PAPILLONS"
with keys ("API")
to 112 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 112 of 112 new partial solutions.
Adding value "COLLIES"
with keys ("IES")
to 112 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 112 of 112 new partial solutions.
Adding value "VIZSLAS"
with keys ("LAS" "IZ" "VI")
to 112 partial solutions.
;GC #200: took: 0.10 (0%) CPU time, 0.10 (1%) real time; free: 16757322
Deleting 272 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 64 of 336 new partial solutions.
Adding value "BRITTANYS"
with keys ("ITT")
to 64 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 64 of 64 new partial solutions.
Adding value "PUGS"
with keys ("GS")
to 64 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 64 of 64 new partial solutions.
Adding value "HAVANESE"
with keys ("HAVANE")
to 64 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 64 of 64 new partial solutions.
Adding value "COCKER SPANIELS"
with keys ("ANI" "LS")
to 64 partial solutions.
Deleting 80 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 48 of 128 new partial solutions.
Adding value "MASTIFFS"
with keys ("FS")
to 48 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 48 of 48 new partial solutions.
Adding value "MALTESE"
with keys ("TES" "LT")
to 48 partial solutions.
Deleting 72 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 24 of 96 new partial solutions.
Adding value "PEMBROKE WELSH CORGIS"
with keys (" COR" "LS")
to 24 partial solutions.
Deleting 32 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 16 of 48 new partial solutions.
Adding value "BOSTON TERRIERS"
with keys ("IER" " T")
to 16 partial solutions.
Deleting 24 equivalent partial solutions.
Removing 4 dominated solutions.
Returning 4 of 32 new partial solutions.
Adding value "POMERANIANS"
with keys ("ANS" "ANI")
to 4 partial solutions.
Deleting 6 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 2 of 8 new partial solutions.
Adding value "GREAT DANES"
with keys ("GR")
to 2 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 2 of 2 new partial solutions.
Adding value "DOBERMAN PINSCHERS"
with keys ("SCH" " PI")
to 2 partial solutions.
Deleting 3 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 4 new partial solutions.
Adding value "SHIH TZU"
with keys ("SHI" " T")
to 1 partial solutions.
Deleting 1 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 2 new partial solutions.
Adding value "ROTTWEILERS"
with keys ("EI")
to 1 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 1 new partial solutions.
Adding value "POODLES"
with keys ("DL" "OD")
to 1 partial solutions.
Deleting 1 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 2 new partial solutions.
Adding value "BOXERS"
with keys ("OX")
to 1 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 1 new partial solutions.
Adding value "BEAGLES"
with keys ("AGL")
to 1 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 1 new partial solutions.
Adding value "LABRADOR RETRIEVERS"
with keys ("VE")
to 1 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 1 new partial solutions.
;Value 50: (("VE" "AGL" "OX" "EI" "GR" " T" "FS" "LS" "HAVANE" "GS" "ITT" "IES" "API" "FR" "OD" "AK" " I" "IP" "TES" "ORS" "LH" "ANS" "GU" "DS" "W " "EKI" "VI" "CHI" "TERS" "JI" "SCH" "OIS"))</pre>We appear to have the substring version of regex golf under control. Can we extend it to actual regular expressions? Of course we can. In the next installment...<br />
Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-62452715224708931192013-10-18T17:58:00.000-07:002013-10-18T17:58:53.542-07:00Unexciting codeSource code control isn't supposed to be interesting or exciting for the user. Blogging about source code control isn't going to be interesting or surprising for the reader, either. I'll spare you the walkthrough. You're old enough to figure it out without me holding your hand. It'll be more interesting if I blog about the problems and bugs we encountered.<br />
<br />
Conman was the configuration manager built on the core ChangeSafe code. The <em>master catalog</em> in conman holds the highest level objects being managed.<pre>(defclass master-catalog ()
;; subsystems and products (PC's) in this repository
((products :initform nil
:version-technique :composite-set
:accessor master-catalog/products)
(subsystems :initform nil
:version-technique :composite-set
:accessor master-catalog/subsystems)
;; All satellite repositories known to this master. We don't allow
;; satellite repository names to change, though the projects they
;; contain may be renamed. **WARNING** BE CAREFUL IF YOU OPERATE
;; in non-latest metaversion views of this list or you might try to
;; create a satellite which already exists. Only update this list
;; using the latest metaversion.
(satellite-repositories :initform nil :version-technique :composite-set)
;; Projects (a.k.a. ChangeSafe "classes") contained in satellite
;; repositories. The descriptors contain the key mapping from a
;; project name to a satellite-repository-name. We could almost
;; make this just a (project-name . satellite-name) mapping, but we
;; need to version project-name, and we also want to cache in the
;; master some information in the satellites so that we don't
;; always have to examine the satellites for often accessed
;; information.
(classes :accessor master-catalog/classes
:initform nil
:version-technique :composite-set)
;; Cset-relationship-tuples is conceptually a list of sublists,
;; where each sublist is a tuple. For every master cid which
;; results in the creation of satellite cids, a tuple is added
;; which enumerates the master cid and the satellite cids which it
;; caused to be created. e.g. '((master.cid.1 satellite-1.cid.1
;; satellite-2.cid.1)) Because we want portable references, blah
;; blah blah, we actually reference DIDS of CHANGE-SET objects
;; rather than the cids. We may instead wish to store CID-OBJECT
;; references. TBD.
;; Right now, this information is maintained only for change
;; transactions which arise from WITH-CM-MASTER-TXN and
;; WITH-CM-SATELLITE-TXN. This is ok, since those are the
;; interesting txns which manipulate satellite databases.
;; Note that because of the high volume of csets we expect to
;; track, we actually represent this information as a vector of
;; vectors to achieve space compaction.
(cset-relationship-tuples :initform (make-instance 'persistent-vector
:initial-element nil
:size 1)
:version-technique :nonversioned)
(cset-rel-tuples-index :initform (make-instance 'persistent-vector
:initial-element -1
:size 1)
:version-technique :nonversioned)
;; BOTH these slots are updated ONLY by vm-txn-note-change-set,
;; except for schema upgrading.
;; The cset-rel-tuples-index slot is a conceptual hash table into the
;; cset-relationship-tuples slot. This is used by
;; master-catalog-lookup-cset-relationship
;; to avoid an extremely costly linear search of cset-relationship-tuples.
;; This is very important for cset_add, cset_remove, and csets_from_file.
;; The idea is that the did-string of the master-cid's did is hashed.
;; Reducing that hash modulo the number of entries in cset-rel-tuples-index,
;; finds a "home" index of cset-rel-tuples-index. Using the sb32 value
;; in that element, we either have a -1 (so the entry is not in the
;; hash table) or we get an index into cset_relationship_tuples.
;; If there is no hash collision, that element of cset_relationship_tuples
;; will contain the desired master-cid did we are looking for. If it
;; isn't the one we want, we have had a hash collision, and we resolve it
;; by linear probing in the next-door (circularly) element of
;; cset-rel-tuples-index.
;; The number of elements of cset-rel-tuples-index is always a prime number,
;; and is also maintained to be more than twice as large as the number of
;; entries in cset-relationship-tuples. That is important, to prevent
;; clustering and slow searching. So when it grows, cset-rel-tuples-index
;; grows by a reasonable factor (about 2) so that it always contains
;; at least half "holes", that is, -1. Further, we want to avoid frequent
;; growth, because growing requires computing every entry in the hash table
;; again. That makes for a big transaction, as every element of the
;; cid-relationship-tuple vector has to be mapped in, and rehashed with
;; the new size of cset-rel-tuples-index.
;; Space considerations: In Jack's db, there are roughly 40,000 elements
;; currently in the cset-relationship-tuples. Suppose we had 100,000
;; elements. In Jack's db, it appears that the tuples are about 2 elements
;; each, average. Suppose it were 9. Then the tuples would take 4*(1+9)=40
;; bytes each, so 40*100,000 = 4Mb total (plus another 400,000 for the
;; cset-relationship-tuples vector itself). This is large, but not likely
;; to be a cause of breakage anytime soon.
;; The cset-name-hashtable maps cset names to csets. While the HP
;; model of ChangeSafe doesn't allow changing the name of a cset,
;; we allow this in general. So this hash table is keyed by cset
;; name, and valued by all csets which EVER bore that name in their
;; versioned name component. The hash value is therefore a list.
;; In the case of system augmented names (by
;; change_create/master_change), there shouldn't be any collisions.
;; We also use this slot to hash unaugmented user names to csets,
;; and those are far more likely to have collisions (one key ->
;; multiple csets). In the case of un-augmented names, this is
;; expected. In the case of augmented names, this is an error.
(cset-name-hashtable :version-technique nil
:initform (make-instance 'persistent-hash-table :size 1023)
:reader master-catalog/cset-name-hashtable)
)
(:documentation "Catalog/hierarchy-root of versioned information maintained in the master repository.")
(:metaclass versioned-standard-class)
(:schema-version 0))</pre>Pretty straightforward, no? No? Let's list the <em>products</em> in the master catalog:<br />
<pre>(defun cmctl/list-products (conman-request)
"cheesy hack to list the products"
(let ((master-repository-name (conman-request/repository-dbpath conman-request))
(reason "list products")
(userid (conman-request/user-name conman-request)))
(call-with-master-catalog-transaction
master-repository-name
userid
:master-metaversion :latest-metaversion
:version :latest-version
:reason reason
:transaction-type :read-only
:receiver (lambda (master-repository master-transaction master-catalog)
(declare (ignore master-repository master-transaction))
(collect 'list
(map-fn 't (lambda (product)
(list (distributed-object-identifier product)
(named-object/name product)
(described-object/description product)))
(master-catalog/scan-products master-catalog)))))))</pre>That isn't very edifying.<br />
<br />
Start from the end:<pre>(master-catalog/scan-products master-catalog)</pre>defined as<pre>(defun master-catalog/scan-products (master-catalog)
(declare (optimizable-series-function))
(versioned-object/scan-composite-versioned-slot master-catalog 'products))</pre>The <code>optimizable-series-function</code> declaration indicates that we are using Richard Waters's <a href="series.sourceforge.net/">series</a> package. This allows us to write functions that can be assembled into an efficient pipeline for iterating over a collection of objects. This code:<pre>(collect 'list
(map-fn 't (lambda (product)
(list (distributed-object-identifier product)
(named-object/name product)
(described-object/description product)))
(master-catalog/scan-products master-catalog)))</pre>takes each product in turn, creates a three element list of the identifier, the project name, and the product description, and finally collects the three-tuples in a list to be returned to the caller. Here is what it looks like macroexpanded:<pre>(COMMON-LISP:LET* ((#:OUT-917 MASTER-CATALOG))
(COMMON-LISP:LET (#:OUT-914)
(SETQ #:OUT-914 'PRODUCTS)
(COMMON-LISP:LET (#:OUT-913 #:OUT-912)
(SETQ #:OUT-913 (SLOT-VALUE-UNVERSIONED #:OUT-917 #:OUT-914))
(SETQ #:OUT-912
(IF *VERSIONED-VALUE-CID-SET-OVERRIDE*
(PROGN
(DEBUG-MESSAGE 4 "Using override cid-set to scan slot ~s"
#:OUT-914)
*VERSIONED-VALUE-CID-SET-OVERRIDE*)
(TRANSACTION/CID-SET *TRANSACTION*)))
(COMMON-LISP:LET (#:OUT-911 #:OUT-910 #:OUT-909)
(MULTIPLE-VALUE-SETQ (#:OUT-911 #:OUT-910 #:OUT-909)
(CVI-GET-ION-VECTOR-AND-INDEX #:OUT-913 #:OUT-912))
(IF #:OUT-911
NIL
(PROGN
(IF (COMMON-LISP:LET ((#:G717-902 #:OUT-910))
(IF #:G717-902
#:G717-902
(THE T (CVI/DEFAULT-ALLOWED #:OUT-913))))
NIL
(PROGN
(SLOT-UNBOUND (CLASS-OF #:OUT-917) #:OUT-917 #:OUT-914)))))
(DEBUG-MESSAGE 5 "Active ion vector for retrieval: ~s" #:OUT-911)
(COMMON-LISP:LET (#:OUT-908)
(SETQ #:OUT-908
(IF #:OUT-911
#:OUT-911
(THE T #())))
(COMMON-LISP:LET (#:ELEMENTS-907
(#:LIMIT-905 (ARRAY-TOTAL-SIZE #:OUT-908))
(#:INDEX-904 -1)
(#:INDEX-903 (- -1 1))
#:ITEMS-915
#:ITEMS-900
(#:LASTCONS-897 (LIST NIL))
#:LST-898)
(DECLARE (TYPE SERIES::VECTOR-INDEX+ #:LIMIT-905)
(TYPE SERIES::-VECTOR-INDEX+ #:INDEX-904)
(TYPE INTEGER #:INDEX-903)
(TYPE CONS #:LASTCONS-897)
(TYPE LIST #:LST-898))
(SETQ #:LST-898 #:LASTCONS-897)
(TAGBODY
#:LL-918
(INCF #:INDEX-904)
(LOCALLY
(DECLARE (TYPE SERIES::VECTOR-INDEX+ #:INDEX-904))
(IF (= #:INDEX-904 #:LIMIT-905)
(GO SERIES::END))
(SETQ #:ELEMENTS-907
(ROW-MAJOR-AREF #:OUT-908
(THE SERIES::VECTOR-INDEX
#:INDEX-904))))
(INCF #:INDEX-903)
(IF (MINUSP #:INDEX-903)
(GO #:LL-918))
(SETQ #:ITEMS-915
((LAMBDA (ION-SOUGHT)
(CVI-INSERTION-RECORD/GET-VALUE-FOR-ION
(SVREF #:OUT-909 ION-SOUGHT) ION-SOUGHT))
#:ELEMENTS-907))
(SETQ #:ITEMS-900
((LAMBDA (PRODUCT)
(LIST (DISTRIBUTED-OBJECT-IDENTIFIER PRODUCT)
(NAMED-OBJECT/NAME PRODUCT)
(DESCRIBED-OBJECT/DESCRIPTION PRODUCT)))
#:ITEMS-915))
(SETQ #:LASTCONS-897
(SETF (CDR #:LASTCONS-897) (CONS #:ITEMS-900 NIL)))
(GO #:LL-918)
SERIES::END)
(CDR #:LST-898)))))))</pre>To be continued...Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-45892419448730589392013-10-13T17:26:00.001-07:002013-10-13T17:26:37.172-07:00Next stepWe build a simple project/branch/version hierarchical abstraction, and we implement it on top of the core layer. <br />
<br />
<ul><li>a <em>project</em> is a collection of branches that represent alternative ways the state evolves. Every project has a main branch.</li>
<li>A <em>branch</em> is a collection of versions that represent the evolution of the branch state over time.</li>
<li>A <em>version</em> is a collection of change-sets with some trappings.<br />
</li>
<li>A <em>change-set</em> is our unit of change.</li>
</ul>When we do a read/write transaction, we'll add a new change set to the repository. Read-only transactions will specify a version for reading the core objects. Or two versions for generating diffs. Or maybe even three versions for a three-way merge.<br />
<br />
Under this development model, developers will sync their workspace with the most chronologically recent version of the main branch. Their new change sets will be visible only to them, unless (until, we hope) they are "promoted" into the development branch. <br />
<br />
We wanted to encourage frequent incremental check-ins. We wanted hook this up to Emacs autosave. Frequent check-ins of much smaller diffs breaks even.<br />
<br />
Once you get to the point where your code passes all the tests, you promote it into the branch so everyone can use it. Every now and then the admins will "fork a relase", do some "cherrypicks" and make a release branch in the project.<br />
<br />
Once you get to the point where your code passes all the tests, you promote it into the branch so everyone can use it. Every now and then the admins will "fork a relase", do some "cherrypicks" and make a release branch in the project.<br />
<br />
The core code does all the heavy lifting, this is window dressing. The style: minimalist. Skin the change model.<br />
<br />
This is turning into the code equivalent of a power-point lecture. You can read it yourself, I'm not going to walk you through it.<br />
<br />
So that's it? Is that all there is? Many source code or change management systems do something more or less similar to this with files and directory trees instead of CLOS objects. Cute hack, but why all the fuss? Hasn't this been done?<br />
<br />
If it seems obvious to you how we'd implement some of the usual source code control operations, good. We don't have to train you.<br />
<br />
I wont insult your intelligence explaining in detail how to do a rollback by removing a change-set from a version. Use SETF, of course.<br />
<br />
I hope nobody notices the 300 lb chicken sitting next to that shady-looking egg in the corner.<br />
<br />
And something for philosopher/implementors to worry about: If I demote the change-set that instantiates an object, where does the object go? Is it possible create a reference to an uninstantiated object? What happens if you try to follow it?<br />
<br />
<br />
Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-71859829076408543942013-09-23T11:14:00.000-07:002013-09-23T11:14:06.247-07:00Putting it all togetherA ChangeSafe repository is implemented as a transient wrapper object around a persistent object. The wrapper object caches some immutable metadata. You'd hate to have to run a transaction in the middle of the print function in order to print the repository name. The wrapper also contains metadata associated with the backing store that the repository is using.<br />
<br />
Oh yeah, there <i>is</i> something interesting going on in the wrapper, We keep track of the ongoing transactions by mapping the <code>user-id</code> to a list of transaction contexts (every nested transaction by a user "pushes" a new <code>txn-context</code>). <br />
<br />
Anyway, it's the <code>repository-persistent-information</code> that has the interesting stuff:<pre>(defclass repository-persistent-information ()
(
(type :initarg :type
:initform (error "Required initarg :type omitted.")
:reader repository-persistent-information/type
:type repository-type)
;; Database parent is the root extent for an extent database, or the master database for a satellite.
;; Root extents or master repositories won't have a parent
(parent-repository :initarg :parent-repository
:initform nil
:reader repository-persistent-information/parent
:type (optional relative-pathname))
;; Satellite repositories is non-nil only for master repositories.
(satellite-repositories :initform nil
:initarg :satellite-repositories
:accessor repository-persistent-information/satellite-repositories)
(canonical-class-dictionary :initform (make-instance 'canonical-class-dictionary)
:reader repository-persistent-information/canonical-class-dictionary)
(cid-master-table :initform (make-instance 'cid-master-table)
:reader repository-persistent-information/cid-master-table)
(root-mapper :initarg :root-mapper
:initform (error "Required initarg :root-mapper omitted.")
:reader repository-persistent-information/root-mapper)
(cid-mapper :initarg :cid-mapper
:initform (error "Required initarg :cid-mapper omitted.")
:reader repository-persistent-information/cid-mapper)
(local-mapper :initarg :local-mapper
:initform (error "Required initarg :local-mapper omitted.")
:reader repository-persistent-information/local-mapper)
(locally-named-roots :initarg :locally-named-roots
:initform (error "Required initarg :locally-named-roots omitted.")
:reader repository-persistent-information/locally-named-roots)
(anonymous-user :initarg :anonymous-user
:initform nil
:reader repository-persistent-information/anonymous-user))
(:default-initargs :node-id +object-id-of-root+) ;; force this to always be the root object.
(:documentation "Persistent information describing a repositiory, and stored in the repository")
(:metaclass persistent-standard-class)
(:schema-version 0))</pre><br />
The <code>repository-type</code> is just a keyword:<pre>(defconstant *repository-types* '(:basic :master :satellite :transport :extent :workspace)
"Type of repositories. Note that all but :EXTENT types of repositories
serve as root extents for databases which have multiple extents, and therefore imply extent.")</pre>The <code>parent-repository</code> and the<br />
<code>satellite-repositories</code> are for juggling multiple "satellite" repositories for holding particular subsets of changes (for, say, geographically distributing the servers for different product groups).<br />
<br />
The <code>canonical-class-dictionary</code> is an intern table for objects.<br />
<br />
The <code>cid-master-table</code> is (logically) the collection of audit-records. A CID (after <i>change id</i>) is represented as an integer index into the master table.<br />
<br />
The <code>root-mapper</code> is a mapping table from distributed identifiers to objects. <br />
<br />
The <code>cid-mapper</code> is a mapping table from the distributed identifier that represents the CID to the integer index of that CID in the master table. It is a subtable of the local mapper.<br />
<br />
The <code>local-mapper</code> is submapping of the <code>root-mapping</code>, but a supermapping of the <code>cid-mapper</code>.<br />
<br />
The <code>locally-named-roots</code>is a hash table for storing the root objects of the repository.<br />
<br />
Finally, there is the <code>anonymous-user</code> slot, which is the user id assigned for bootstrapping.<br />
<br />
And all this crap is in support of this procedure:<pre>(defun call-with-repository-transaction (&key repository
transaction-type
user-id-specifier
reason
;; generally, you only want to specify these two
meta-cid-set-specifier
cid-set-specifier
;; but if you are doing a comparison,
;; specify these as well
aux-meta-cid-set-specifier
aux-cid-set-specifier
receiver)
(check-type user-id-specifier (or keyword distributed-identifier))
(check-type transaction-type repository-transaction-type)
(check-type reason string)
<i>;; implementation omitted for brevity, ha ha</i>
)</pre><br />
Naturally we need to specify the <code>:repository</code>, the <code>:transaction-type</code> is one of <pre>(defconstant *repository-transaction-types* '(:read-only
:read-write
:read-cons
:read-only-compare
:read-cons-nonversioned
:read-only-nonversioned
:read-write-nonversioned))</pre>The <code>:user-id-specifier</code> should be a <code>distributed-identifier</code> of a <code>core-user</code> instance.<br />
<br />
The <code>:reason</code> is a human readable string describing the transaction. <br />
<br />
The <code>:meta-cid-set-specifier</code> is mumble, mumble... just a sec...<br />
<br />
The <code>:cid-set-specifier</code> is how you specify which CIDs will form the basis view for the transaction. We allow this to be a procedure that returns a <code>cid-set</code> object, and we will call this procedure as we are setting up the transaction and use the <code>:meta-cid-set-specifier</code> to specify the CIDs to form the versioned view the procedure will see. <br />
<br />
The <code>:meta-cid-set-specifier</code> can be the symbol <code>:latest-metaversion</code>, a timestamp, or a <code>cid-set</code>. <code>:latest-metaversion</code> means to use all CIDS while resolving the <code>:cid-set-specifier</code>, a timestamp is useful for rewinding the world, and the main use for using an explicit cid-set is for synchronizing views between master and satellite repositories.<br />
<br />
The <code>:receiver</code> is invoked within the dynamic extent of a transaction. It is passed a <code>core-txn</code> object that contains the metadata associated with the transaction.<br />
<br />
The ChangeSafe core components are the repository that holds changes and associated meta-information, and simple versioned CLOS objects. It is only useful as a foundation layer, though.<br />
<br />
<i>Next up, another level of abstraction...</i>Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-79150068237207950222013-09-10T16:36:00.001-07:002013-09-10T16:36:24.651-07:00Mix in a little CLOSThe obvious idea here is to make CLOS objects where the slots are implemented as versioned value objects. Then we override <code>slot-value-using-class</code>. You might consider this a stupid CLOS trick. You could just as well establish an abstraction layer through other means, but the point is to create an <i>understandable</i> abstraction model. It is easy to understand what is going to happen if we override <code>slot-value-using-class</code>.<br />
<br />
We use the MOP to create a new kind of slot so that we can compose values on the fly when the programmer calls <code>slot-value-using-class</code>. We also override <code>(setf slot-value-using-class)</code> so that it calls the "diff" computing code. Again, the point is to make it easy to understand what is happening.<br />
<br />
The end result is the <code>versioned-standard-object</code>. An instance of a <code>versioned-standard-object</code> (or any of it's inheritors, naturally), has all its slots implemented versioned value objects. The programmer should specify <code>versioned-standard-class</code> as the metaclass in the class definition.<br />
<br />
<pre>(defclass test-class ()
((nvi-slot :version-technique :nonlogged
:accessor test-class/nvi-slot)
(lnvi-slot :version-technique :logged
:accessor test-class/lnvi-slot)
(svi-slot :version-technique :scalar
:accessor test-class/svi-slot))
(:metaclass versioned-standard-class)
(:schema-version 0))</pre>In this example, the test class has some of the different kinds of versioned values that are named by the <em>version technique</em>. A <code>:nonlogged</code> slot is the "escape mechanism". It's a fancy name for "Just turn off the versioning, and use this here value."<br />
<br />
A <code>:logged</code> slot is less drastic. There's no versioning behavior, it's just a persistent slot, but we'll keep a list of the transactions that modified it.<br />
<br />
Finally, the <code>:scalar</code> version technique is one where the last chronologically participating change has the value.<br />
<br />
A versioned slot using the <code>:composite-sequence</code> uses a set of diffs to represent the versioned slot value, and these are composed as described in an earlier post.<br />
<pre>(defclass test-cvi-class ()
((cvi-slot-a :version-technique :composite-sequence
:accessor test-cvi-class/cvi-slot-a)
(cvi-slot-b :version-technique :composite-sequence)
(cvi-slot-c :version-technique :composite-sequence :initform nil))
(:metaclass versioned-standard-class)
(:schema-version 0))
</pre><br />
Once this is working, we have what we need to bootstrap the rest of the versioned storage.<br />
Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-21504148834938304792013-09-02T13:42:00.001-07:002013-09-02T13:42:20.452-07:00Putting things togetherOk, so we have audit records, a persistent store, and "diffs". Let's start putting them together. Naturally, we are going to keep the audit records in the persistent store, and we'll put the diffs in the audit records.<br />
<br />
A <i>versioned value</i> is the abstraction we're aiming for. We're going to create a versioned value by combining the information held in the audit records. If the information is a set of insertion and deletion records, we combine them as I described in the previous posts.<br />
<br />
What makes this interesting is that we can specify a subset of the audit records to participate in the construction of the value. We can extract the versioned value as it appeared at any point in time by specfying only those records that have a timestamp at or before that point. We can also synthesize interesting views by omitting some of the records.<br />
<br />
We're going to store a lot of these versioned values and we'll use many of them every time we access the store. To get any kind of coherent view of the world, we want to use a single set of audit records when we view these values. But programmers, being who they are, won't want to think about this. So here's what we'll do: we already have transactions in order to talk to the store; we'll add a field to the transaction that specifies the audit records to be used during that transaction. Pretty simple. You want to look at the world as it was on July 4th, you start the transaction with those audit records dated July 4th or earlier and use that set for every versioned value that you want to look at. It would be crazy to look at some objects as if it were July 4th, but others as if it were December. (Heh, but on occasion....)<br />
<br />
There is another reason we want to specify a set of audit records at the beginning of a transaction: we need to know the baseline that we compute our diffs against. When we do a read/write transaction we're going to modify some of our versioned values. When the transaction ends, we need to compute the diffs of the things we modified. We compute the diffs relative to the view we established at the beginning of the transaction.<br />
<br />
So we need to modify our audit records to record the <em>basis set</em> of records we use when we begin a transaction. We modify the transaction API to require the programmer to specify which basis set of records are to be used for the transaction and we use that basis set for computing the diffs at the end of read/write transactions.<br />
<br />
There is an interesting side effect of this change. Suppose we have some way of attaching a label to the transactions, and some transactions only use label 'A' and others only use label 'B'. Further transaction using label 'A' only see diffs relative to prior 'A' versions, while the 'B' transactions only see the 'B' diffs. The result is that a single versioned value can hold two completely different histories.<br />
Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-90438954317206714992013-08-16T12:11:00.000-07:002013-08-16T12:11:46.181-07:00Plus ça changeLet's start with a list:<br />
<pre>'(a b c)</pre>We change it:<br />
<pre>'(a b c d)</pre>We change the result:<br />
<pre>'(0 a b c d)</pre>etc.<pre>'(0 a b c)</pre>etc.<pre>'(a b c)</pre>and now a big change:<pre>'(-1 0 1 a a1 b b1 c d e f)</pre>At each step we compute the indels for that step. If we so desire, we can reconstruct any of the intermediate lists by starting at the beginning and applying the appropriate indels in order.<br />
<br />
But what if we skip some? We end up with a mess. Each set of indels is computed relative to the application<br />
of the prior indels. If an indel is omitted, the indices of the subsequent indels will be wrong.<br />
<br />
To solve this, we have to change the representation of our sequence (that is, we won't be modifying a list).<br />
Instead, we'll represent our sequence as an ordered set of list segments that we'll concatenate. Insertion is easy — just add a segment. Deletion is slightly harder because we don't want to cross segment boundaries. <br />
<br />
Reconstruction of an intermediate list requires more work, but we gain flexibility. We can apply a subset of the insertions and deletions and still get a meaningful result. For example, the first change we made was to create a list with the three elements '(a b c). The second change appended a 'd'. What if we apply the second change but omit the first? We append a 'd' to an empty sequence and get '(d).<br />
<br />
How about if we apply only change 3? That inserts a '0' at the beginning giving us '(0).<br />
<br />
If we apply change 2 and 3, still omitting 1, we get '(0 d).<br />
<br />
That last change is tricky. We've deleted the 'a and prepended '(-1 0 1 a a1), and deleted the 'c and inserted '(b1 c d e f). If we omit change 4 and 5 (which delete the leading 0 and trailing 'd) we'll get '(-1 0 1 a a1 0 b b1 c d e f d). We preserve the order between different insertion points, so the inserted 0 is always before the inserted 'd, but we resolve ambiguous insertions by placing the later insertions before the earlier ones if they are in the same place.<br />
<br />
Optional Exercise 7 (quite hard): Implement this as well.<br />
Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com1tag:blogger.com,1999:blog-8288194986820249216.post-63139932741596633832013-08-13T16:52:00.000-07:002013-08-13T16:52:26.870-07:00Let's start with a list:<br />
<pre>'(a b c d e f g h i j)</pre><br />
We'll add some elements:<br />
<pre>'(a b c d e f g h i j k l m n o)</pre><br />
Delete some:<br />
<pre>'(a b c d e i j k l m n o)</pre><br />
Add some more:<br />
<pre>'(x y z a b c d e i j k l m n o)</pre><br />
Maybe delete one:<br />
<pre>'(x y z b c d e i j k l m n o)</pre><br />
We want to compare the original sequence with the end sequence and<br />
summarize the difference with a set of "indels", which specify the<br />
indexes at where elements were inserted or deleted:<br />
<br />
<pre>(diff '(a b c d e f g h i j) '(x y z b c d e i j k l m n o))
(#S(INDEL
:LEFT-SUBSEQ-START 0 ;; 'a'
:LEFT-SUBSEQ-LIMIT 1
:RIGHT-SUBSEQ-START 0 ;; 'x y z'
:RIGHT-SUBSEQ-LIMIT 3)
#S(INDEL
:LEFT-SUBSEQ-START 5 ;; 'f g h'
:LEFT-SUBSEQ-LIMIT 8
:RIGHT-SUBSEQ-START 7 ;; nothing
:RIGHT-SUBSEQ-LIMIT 7)
#S(INDEL
:LEFT-SUBSEQ-START 10 ;; nothing
:LEFT-SUBSEQ-LIMIT 10
:RIGHT-SUBSEQ-START 9 ;; 'k l m n o'
:RIGHT-SUBSEQ-LIMIT 14))
</pre>Exercise 6 (hard): Implement this.<br />
<br />
For an extreme challenge, implement this in a way that is not hopelessly inefficient.<br />
Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com3tag:blogger.com,1999:blog-8288194986820249216.post-61651992960514064602013-07-29T17:08:00.001-07:002013-07-29T17:08:24.645-07:00A bit harderThe last few exercises have been easy. These will be intermediate.<br />
<br />
So far, we've been using the integer serial numbers to refer to audit records when we don't have the record itself. The record can be read and written in a transaction, but outside the transaction we need a non-persistent object to act as a name. The problem with integers is that they aren't typed: whatever the number 42 means in one context is unlikely to be relevant in another. The second problem is that we are deriving the integers from an underlying implementation artefact. The number 42 just identifies an object to most of the code, but it derives from an index into a vector. If we change how the number is generated, then any numbers we already know about would have to change as well.<br />
<br />
We need an external naming scheme so we can refer to the audit records in a repository.<br />
<div>
<br /></div>
<div>
Exercise 5:</div>
<div>
<ol>
<li>Have a required "name" argument when creating a repository. The name should be immutable and you should be able to access the name without a transaction.</li>
<li>Define an identifier object.<br />
<pre>(defstruct (distributed-identifier
(:conc-name did/)
(:constructor %make-distributed-identifier (domain repository class numeric-id))
(:copier nil)
(:predicate distributed-identifier?))
"DIDs are interned, so you can use EQ to test for equality. However, you should
never modify the slots in a DID."
(domain "" :read-only t :type string)
(repository "" :read-only t :type string)
(class nil :read-only t :type (or null keyword))
(numeric-id 0 :read-only t :type non-negative-integer))
</pre>
You can skip the domain element for now, but the other fields are needed.<br />
</li>
<li>Define a printer for a distributed-identifier so they print like this: <span style="font-family: Courier New, Courier, monospace;">[test-repository.AUDIT-RECORD.22]</span> the name, class, and numeric id are printed with periods between them.</li>
<li>Define a parser that can read the fields out of a string.</li>
<li>Hook up the parser to the reader so that programmers can use the square bracket notation as a literal in a program.</li>
<li>Intern these objects in a weak hash table. We want to be able to use EQ to test these. So typing <span style="font-family: Courier New, Courier, monospace;">(eq [test-repository.AUDIT-RECORD.22] [test-repository.AUDIT-RECORD.22])</span> will return true. If two distributed-identifiers print the same way, then they refer to the same object and they should be EQ.</li>
<li>Add a mapping table to the repository for the audit records. You want it so that when an audit record is newly created it will have a new identifier. You'll want to decouple the assignment of the audit-record serial number from the assignment of the numeric ID in the distributed-identifier. This is so we can import audit-records from other repositories.<br /><br /><br />
Consider <span style="font-family: Courier New, Courier, monospace;">[test-repository.AUDIT-RECORD.22]</span>. It is the twenty-second audit record created by the test repository, but it is not necessarily at the twenty-second offset in the repository's table of audit-records. It could be, for example, at location 165. The mapping table gives the location. If a new audit-record is created, say at location 166, a new entry in the mapping table will map <span style="font-family: Courier New, Courier, monospace;">[test-repository.AUDIT-RECORD.23]</span> to location 166.<br /><br />The mapping table will be persistent, and thus require a transaction to read.</li>
</ol>
</div>
Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-15267472503212722672013-07-28T11:02:00.000-07:002013-07-28T11:02:11.472-07:00Fairly easy The prior post had some easy exercises. This is easy, too.<br />
<br />
Every audit record in a repository has a unique serial number (relative to the repository). We want a way to represent sets of serial numbers.<br />
<br />
Exercise 4: Implement these:<br />
<br />
Constructors:<br />
<ul>
<li><span style="font-family: Courier New, Courier, monospace;">bitmap->serial-number-set <i>repository vector-1b</i></span></li>
<li><span style="font-family: Courier New, Courier, monospace;">range->serial-number-set <i>repository start end</i></span><br />Start is inclusive, end is exclusive.</li>
<li><span style="font-family: Courier New, Courier, monospace;">list->serial-number-set <i>repository list</i></span></li>
</ul>
Selectors and predicates:<br />
<div>
<ul>
<li><span style="font-family: 'Courier New', Courier, monospace;">serial-number-set? </span><i style="font-family: 'Courier New', Courier, monospace;">object</i></li>
<li><span style="font-family: Courier New, Courier, monospace;">serial-number-set/repository <i>serial-number-set</i></span></li>
<li><span style="font-family: Courier New, Courier, monospace;">serial-number-set/empty? </span><i><span style="font-family: Courier New, Courier, monospace;">serial-number-set</span></i></li>
<li><span style="font-family: Courier New, Courier, monospace;">serial-number-set/equal?<i> </i></span><i style="font-family: 'Courier New', Courier, monospace;">left-</i><i style="font-family: 'Courier New', Courier, monospace;">serial-number-set right-</i><i style="font-family: 'Courier New', Courier, monospace;">serial-number-set</i></li>
<li><span style="font-family: Courier New, Courier, monospace;">serial-number-set/member? </span><i><span style="font-family: Courier New, Courier, monospace;">serial-number-set serial-number</span></i></li>
<li><span style="font-family: 'Courier New', Courier, monospace;">serial-number-set->bitmap </span><i style="font-family: 'Courier New', Courier, monospace;">serial-number-set</i></li>
</ul>
</div>
<br />
<div>
These return new sets rather than mutate the old ones:</div>
<ul>
<li><span style="font-family: Courier New, Courier, monospace;">serial-number-set/adjoin<i> serial-number-set serial-number</i></span></li>
<li><span style="font-family: Courier New, Courier, monospace;">serial-number-set/remove<i> </i></span><i style="font-family: 'Courier New', Courier, monospace;">serial-number-set serial-number</i></li>
<li><span style="font-family: Courier New, Courier, monospace;">serial-number-set/union<i> </i></span><i style="font-family: 'Courier New', Courier, monospace;">left-</i><i style="font-family: 'Courier New', Courier, monospace;">serial-number-set right-</i><i style="font-family: 'Courier New', Courier, monospace;">serial-number-set</i></li>
<li><span style="font-family: Courier New, Courier, monospace;">serial-number-set/exclusive-or<i> </i></span><i style="font-family: 'Courier New', Courier, monospace;">left-</i><i style="font-family: 'Courier New', Courier, monospace;">serial-number-set right-</i><i style="font-family: 'Courier New', Courier, monospace;">serial-number-set</i></li>
<li><span style="font-family: Courier New, Courier, monospace;">serial-number-set/intersection<i> </i></span><i style="font-family: 'Courier New', Courier, monospace;">left-</i><i style="font-family: 'Courier New', Courier, monospace;">serial-number-set right-</i><i style="font-family: 'Courier New', Courier, monospace;">serial-number-set</i></li>
</ul>
<div>
Two more:</div>
<ul>
<li><span style="font-family: Courier New, Courier, monospace;">serial-number-set/intersaction?<i> </i></span><i style="font-family: 'Courier New', Courier, monospace;">left-serial-number-set right-serial-number-set</i><br />
Returns true if the intersection is not empty. Ought to be faster than <span style="font-family: Courier New, Courier, monospace;">(serial-number-set/empty? (serial-number/intersection left right))</span></li>
<li><span style="font-family: Courier New, Courier, monospace;">serial-number-set/last-serial-number<i> </i></span><i style="font-family: 'Courier New', Courier, monospace;">serial-number-set</i><br />
Returns the largest serial number in the set.<br />
<br />
</li>
</ul>
<div>
We'll use serial numbers and serial-number-sets so we can refer to audit-records and sets of audit-records without needing the records themselves. We can only get to the records within a transaction, but we can refer to them by serial number outside of a transaction.</div>
Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com0tag:blogger.com,1999:blog-8288194986820249216.post-79692098032809947462013-07-26T13:24:00.000-07:002013-07-26T13:24:30.286-07:00Fun with Audit Trails!Gotta change direction here. It's getting boring. Let's implement an audit trail! How hard could it be? We'll make an audit trail out of audit records. Pick your favorite language and try this.<br />
<br />
Exercise 1: Implement an audit record. Make sure it has these features:<br />
<div>
<ul>
<li>a timestamp of when the record was created</li>
<li>a user-id to tell us who created the record</li>
<li>a reason why in the form of human readable text<br />
</li>
</ul>
<ul>
<li>immutable, including transitively reachable subobjects</li>
<li>Malformed records cannot be created. Automatically set the timestamp so it cannot be forged.</li>
</ul>
<div>
Now we have to keep them somewhere safe, and get them back on occasion.</div>
</div>
<div>
<br /></div>
<div>
Exercise 2a: Make a repository with some sort of stable storage for a backing store, like a file.</div>
<div>
<br /></div>
<div>
Exercise 2b: When creating an audit record, log it to stable storage. Make the repository be a required argument for creation of an audit record. Log the record in the repository, but don't store info about the repository itself in the record. The record doesn't have to know where it lives.</div>
<div>
<br /></div>
<div>
Exercise 2c: Have a way to load all the audit records back in from stable storage. Intern the audit records so re-loading is idempotent and eq-ness is preserved.</div>
<div>
<br /></div>
<div>
Exercise 3: Implement random access to a repository's audit log (indexed by integer)</div>
<div>
<br /></div>
<div>
These are all very easy. Don't worry, I'll make it harder.</div>
<div>
<br /></div>
Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com5tag:blogger.com,1999:blog-8288194986820249216.post-31062049977491309802013-07-22T19:39:00.000-07:002013-07-22T19:39:01.048-07:00Persistent vectorsSimulating mutation by creating a new object is reasonable for CLOS objects, but it is terrible idea for persistent vectors. The large amount of copying would be bad enough, but algorithms that rely on growable vectors (by calling <span style="font-family: Courier New, Courier, monospace;">vector-push</span>) will change from linear to quadratic in space. That's a disaster.<br />
<br />
We simulate mutation by replacing an entry in the object-map. This forces the granularity of mutation to be the same as the granularity of the object-map. In other words, we cannot update just a single item in a vector because there is no object-map entry referring to just that item.<br />
<br />
So why not fix that? The obvious solution is to allocate a contiguous set of object-ids. Easy enough, but if we want to "grow" the vector we'll have a problem. The vector storage itself can easily be moved, but the range of object-ids that map to the vector cannot easily be extended.<br />
<br />
My solution was to change how object-ids and the object-map work. The object-map maps integers to persistent objects. Conceptually, the object-map itself is a vector. If we add an orthogonal index to object-map it becomes a 2-dimensional array. Now we can place vectors in the map and assign them a single primary index and map the secondary indices on to the vector elements. Vectors can grow (or shrink) without changing their primary index.<br />
<br />
Now that we have persistent vectors that we can mutate, we can create mutable persistent cons cells (just a 2-element vector) and mutable persistent hash tables.Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com4tag:blogger.com,1999:blog-8288194986820249216.post-65575472938017842612013-07-21T06:31:00.000-07:002013-07-21T06:31:30.106-07:00Faking mutability<a href="http://www.blogger.com/profile/04898791309856994697">Dan Lentz</a> said<br />
<blockquote>
I do think the term immutability is being thrown around a bit here somewhat in denial of the reality that what we are in the very process of doing is mutating the instance. Is immutability even the goal? I mean, yes the wbtree is persistent in the sense that older versions of an object are never overwritten, but that doesn't preclude us from representing changes to an object in subsequent updates to the tree and expressing its value at a given point in time as MVCC. Any given version of the object is immutable, but the object itself can model mutability without detracting from that.</blockquote>
<br />
Exactly what I was going to write about next!<br />
<br />
When we create a persistent object, we assign it a unique object id. We cannot change the object, but we <em>can</em> change mapping from ids to objects. The procedure <span style="font-family: Courier New, Courier, monospace;">remake-instance</span> does this. <span style="font-family: Courier New, Courier, monospace;">remake-instance</span> takes an persistent object and some initargs and creates a brand new object, but it updates the object-map with the old object id. We simulate slot mutation by creating a new object that differs only in that slot value and giving it the old object id.<br />
<br />
This simplifies transaction handling quite a bit. If a transaction aborts we want to restore the world to the state it was in when we started. Since we didn't actually change the original object, all we need to do is go back to using the old object map.Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com1tag:blogger.com,1999:blog-8288194986820249216.post-64843956270641607432013-07-20T08:03:00.000-07:002013-07-20T08:03:03.192-07:00You could use a monad.<div class="tr_bq">On the <a href="http://lists.racket-lang.org/users" target="_blank">Racket Users list</a> <a href="https://plus.google.com/u/0/111317890246898064306" target="_blank">Ben Duan</a> asked</div><blockquote class="tr_bq"><br />
Scenario: A piece of data is determined in the first function `f1', but is only processed in a sub-sub-sub-… function `fx'.<br />
<br />
One way is to use pass `the-data' as arguments from `f1' through `f2' all the way down to `fx':<br />
<pre>(define f1 (the-data …)
…
(f2 the-data …)
…)
(define f2 (the-data …)
…
(f3 the-data …)
…)
…
(define fx (the-data …)
… the-data …)
</pre>But in the above way, the body of `f2', `f3', `f4' and so on doesn't use `the-data'. It is only passed to the next function. And I still have to add the argument `the-data'.<br />
<br />
Are there other ways to solve this problem?</blockquote>You could use a monad. <br />
<br />
The basic gist of it is that instead of manipulating <span style="font-family: Courier New, Courier, monospace;">the-data</span>, we manipulate functions. So the inner functions f2, f3, ... will change from this:<br />
<pre>(define (f2 the-data arg arg …)
…
(f3 the-data x y …))
</pre>to this:<br />
<pre>(define (f2 arg arg …)
(lambda (the-data)
…
((f3 x y …) the-data)))
</pre>We just move <span style="font-family: Courier New, Courier, monospace;">the-data</span> to the end of the argument list and curry the functions. That makes things more complicated at first, but the inner functions that don't actually use the-data can be eta-reduced. <span style="font-family: Courier New, Courier, monospace;">(lambda (x) (f x)) => f</span><br />
<br />
So f2 eta-reduces to:<br />
<pre>(define (f2 arg arg …)
…
(f3 x y …))
</pre>and all mentions of <span style="font-family: Courier New, Courier, monospace;">the-data</span> disappear in the inner functions (pretty slick!) The innermost fx can't be reduced this way, of course, and the callers of f0 have to change to pass the initial value of <span style="font-family: Courier New, Courier, monospace;">the-data</span>.<br />
<br />
I just did this with an ad-hoc code transformation. A "monad" formalizes this. (and I skipped over a<i> lot</i> of detail.)<br />
<br />
<a href="http://www.ccs.neu.edu/home/matthias/" target="_blank">Matthias Felleisen</a> elaborated further:<br />
<blockquote class="tr_bq">Here is what Joe is saying, with fake macros; the first two parts are real:</blockquote><blockquote><pre>#lang racket
;; -----------------------------------------------------------------------------
;; an example using lexical scope
(define (f1-lexical-scope x)
(define the-data (sin x))
(define (f2 y)
`(f2 ,(f3 y)))
(define (f3 z)
`(f3 ,(f4 z)))
(define (f4 w)
`(f4 ,the-data ,w))
(f2 10))
(f1-lexical-scope pi)
;; -----------------------------------------------------------------------------
;; the same example with the 'monad' spelled out
(define (f1-monad x)
(define the-data (sin x)) ;;
((f2 10) the-data))
(define ((f2 y) the-data)
`(f2 ,((f3 y) the-data)))
(define ((f3 z) the-data)
`(f3 ,((f4 z) the-data)))
(define ((f4 w) the-data)
`(f4 ,the-data ,w))
(f1-monad pi)
;; -----------------------------------------------------------------------------
;; a sketch of how syntax would hide the monad where needed
;; the following macros are fake, because I don't have time to write them out:
;; see the HtDP language macros for #%app, which register functions too
;; defines the-data and initializes it
(define-syntax-rule (create-store x) (define the-data x))
;; registers f as a store-passer
(define-syntax-rule (define-store-passer (f x) e) (define ((f x) the-data) e))
;; this supplements #%app so that when a registered store-passer f is applied,
;; it picks up the-data in a curried application; other functions work normally
(define-syntax-rule (apply-store-passer f x ...) (old-apply (f x ...) the-data))
;; pick up the-data from secret stash
(define-syntax-rule (access-store) 42)
;; if you had these macros, the above would read like this:
(define (f1-monad.v2 x)
(create-store (sin x)) ;;
(f2 10))
(define-store-passer (f2.v2 y)
`(f2 ,(f3 y)))
(define-store-passer (f3.v2 z)
`(f3 ,(f4 z)))
(define (f4.v2 w)
`(f4 ,(access-store) ,w))
(f1-monad.v2 pi)</pre></blockquote>Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com2tag:blogger.com,1999:blog-8288194986820249216.post-59323571780319325512013-07-18T17:15:00.000-07:002013-07-19T05:33:59.571-07:00What about...<a href="http://www.blogger.com/profile/11452247999156925669">John Cowan</a> said:<br />
<blockquote>The arguments passed to `<span style="font-family: Courier New, Courier, monospace;">make-instance</span>` are the sum total of the information needed to create the instance in its initial state.<br />
<br />
Since you require that instances be transitively immutable, the initial state is the only state.<br />
<br />
If you have those arguments squirreled away, then you can create another instance in the exact same initial state.<br />
∎</blockquote>That's the basic argument. It's demonstrably false.<br />
<pre>(setq foo (make-instance 'test-class))
(test-class/name foo)
ZIPPY</pre>So the game begins. I point out a trivial technicality, you adjust for it.<br />
<blockquote>The arguments passed to `<span style="font-family: Courier New, Courier, monospace;">make-instance</span>` <b>combined with the appropriate</b><br />
<b>defaults from the initargs</b> are the sum total of the information needed<br />
to create the instance in its initial state.</blockquote><br />
Well, that's wrong, too. As <a href="http://www.blogger.com/profile/04898791309856994697">Dan Lentz</a> pointed out, "What about any stateful behavior encoded into the objects class?"<br />
<br />
I'm pretty sure you ultimately win. But by the time I run out of objections your argument is going to be a patchwork quilt of exceptions, special cases, "implementation artifacts", "things that might be <i>technically</i> wrong, but can never make a <i>real</i> difference", <i>etc. etc.</i><br />
<br />
This doesn't inspire much confidence in our proof.<br />
<br />
<a href="http://www.blogger.com/profile/04898791309856994697">Mr. Lentz</a> also asked:<br />
<i>Couldn't it just have trapped the out of date schema version in <span style="font-family: Courier New, Courier, monospace;">shared-initialize</span> and dispatched it off to <span style="font-family: Courier New, Courier, monospace;">update-instance-for-redefined-class</span>?</i><br />
<br />
I don't know. When we bring an object from the store, it isn't really an instance yet.Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com6tag:blogger.com,1999:blog-8288194986820249216.post-86743692928143702432013-07-18T08:23:00.000-07:002013-07-19T06:56:14.474-07:00Persisting CLOS objectsIn a previous post, I described how a programmer would save simple primitive objects to the persistent store. How does a programmer save a CLOS object? Very simply. Here is a class definition:<br />
<pre>(defclass test-class ()
((name :initarg :name
:initform 'zippy
:reader test-class/name)))
</pre>Here is the persistent version:<br />
<pre>(defclass test-class ()
((name :initarg :name
:initform 'zippy
:reader test-class/name))
(:metaclass persistent-standard-class)
(:schema-version 0))
</pre>And here is how you save a persistent instance to the store:<br />
<pre>(make-instance 'test-class)</pre>I'll do another one:<br />
<pre>(make-instance 'test-class :name 'griffy)</pre>Not too shabby.<br />
<br />
The point is that we can abstract away an awful lot of the persistence layer. This is really important because the versioning layer is at least as complex. Wrapping your mind around multiple versioned instances takes practice. It's a good thing that we don't have to think worry about the persistent layer at the same time.<br />
<br />
But I said that I'd describe how it works. I have several attempts at description sitting here on my computer, and they are hard to read, hard to undertand, and it simply doesn't seem like it would work correctly. I've tried to logically argue that it does work, and certainly the fact that the code was working is empirical evidence, but I'm still trying to find a clear description so that it simply makes sense that it ought to work. So rather than describe <em>why</em> it ought to work, let me describe what happens beneath the covers.<br />
<br />
The code in <a href="https://code.google.com/p/jrm-code-project/source/browse/trunk/ChangeSafe/pstore/pclass.lsp">pstore/pclass.lsp</a> has the implementation. The CLOS meta-object protocol allows you to customize the behavior of the object system by adding your own methods to the internal CLOS implementation. To create a CLOS object, you call <span style="font-family: Courier New, Courier, monospace;">make-instance</span>. Magic happens, but part of that magic involves initializing the slots of the newly created object. At this point during the object instantiation magic CLOS calls the generic function <span style="font-family: Courier New, Courier, monospace;">shared-initialize</span>. <span style="font-family: Courier New, Courier, monospace;">shared-initialize</span> is responsible for assigning values to the slots of an object and it get called on the uninitialized object, the set of slot names to fill, and an argument list. The argument list is normally the same argument list given to <span style="font-family: Courier New, Courier, monospace;">make-class</span>. The default behavior of <span style="font-family: Courier New, Courier, monospace;">shared-initialize</span> is to match up the keyword-specified initargs with the appropriate slots and stuff the values in. But we'll modify that.<br />
<pre>(defmethod clos:shared-initialize ((instance persistent-standard-object) slot-names
&rest initargs
&key persistent-store node-id node-index
&allow-other-keys)
(if (eq instance *restoring-instance*)
(call-next-method)
;; If we are being called from elsewhere,
;; we have to wrap the initargs and initforms
;; in persistent-objects and create an initializer
;; for this object.
(let* ((class (class-of instance))
(init-plist (compute-persistent-slot-initargs class
(or persistent-store *default-persistent-store*)
initargs))
(node-id (persistent-object/save
(make-initializer class
(class-schema-version class)
init-plist)
(or persistent-store *default-persistent-store*)
node-id)))
(apply #'call-next-method instance slot-names (nconc init-plist initargs))
(setf (persistent-standard-object/node-id instance) node-id)
(setf (persistent-standard-object/node-index instance) node-index)
(setf (object-map-info/%cached-value
(persistent-object/find-object-map-info
(or persistent-store *default-persistent-store*) node-id))
instance)
instance)))
</pre>First, we check if the instance we are initializing is being restored from the persistent store. When we first open a persistent store and re-instantiate the objects, we do not want the act of re-instatiation to cause the objects to be re-persisted. So in that case we simply invoke <span style="font-family: Courier New, Courier, monospace;">call-next-method</span> and let the default actions take place.<br />
<br />
But if we are creating a new object, we want it to persist. The call to <span style="font-family: Courier New, Courier, monospace;">persistent-object/save</span> does the trick, but notice that we don't pass in the instance. We call <span style="font-family: Courier New, Courier, monospace;">make-initializer</span> on the argument list and we save that instead.<br />
<br />
An initializer is a simple structure that holds the class, a "schema-version", and the argument list:<br />
<pre>(defstruct (initializer
(:conc-name initializer/)
(:constructor make-initializer (class schema-version init-plist))
(:copier nil)
(:predicate initializer?))
(class nil :read-only t :type persistent-standard-class)
(schema-version 0 :read-only t :type non-negative-fixnum)
(init-plist '() :read-only t :type list))</pre>and <span style="font-family: Courier New, Courier, monospace;">persistent-object/save</span> serializes it like this:<br />
<pre>(:method ((object initializer) stream symbol-table)
(write-byte serialization-code/initializer stream)
(write-fixnum (symbol-table/intern-symbol symbol-table (class-name (initializer/class object))) stream)
(write-fixnum (initializer/schema-version object) stream)
(write-fixnum (length (initializer/init-plist object)) stream)
(iterate (((key value) (scan-plist (initializer/init-plist object))))
(write-fixnum (symbol-table/intern-symbol symbol-table key) stream)
(serialize value stream symbol-table)))
</pre><i><span style="font-size: x-small;">(I'm skipping over an important detail, but I'll get to it...)</span></i><br />
<br />
Something unusual is going on here. The persistent object itself is not placed in the store. The argument list passed to <span style="font-family: Courier New, Courier, monospace;">make-instance</span> is stored instead. Because the persistent object is immutable, all the information needed to reconstruct the object is present in the initargs, so we don't need the resulting object.<br />
<br />
Why would we do this? The object itself has structure. Instantiating the object imposes this structure on the values stored within. The structure of the objects in the store are collectively known as the <i>schema</i>. Persistent stores are intended to hold objects for a long time. We expect the code that manipulates the objects to change over time, and it is likely that we will want to change the object representation on occasion. When we change the object representation, we need to consider the legacy objects that were constructed under the old representation. This is called <i>schema evolution</i> and it is one of the most painful tasks in maintaining an object-oriented database. At its worst, the persistent schema is so different from the code schema that you have only one way to handle the schema change: dump the entire database into a neutral format (like a file full of strings!), create a new, empty database and read it all back in. My experience with other object oriented database is that the worst case is the common case.<br />
<br />
If we store only the information needed to reconstruct the object, we no longer need to worry about the object layout. This finesses the problem of schema evolution.<br />
<br />
But there is a <span style="font-family: Courier New, Courier, monospace;">:schema-version</span> specified in the class definition, and that is most definitely stored. There are two kinds of information in the initargs: the values themselves are obvious, but the <em>interpretation</em> of the values is not. An example should illustrate this.<br />
<br />
Suppose we start out a project where we are going to save named objects in the store. At some point in the code we invoke <span style="font-family: Courier New, Courier, monospace;">(make-instance 'foo :name "Joe")</span> and so there is an initializer in the store something like <span style="font-family: Courier New, Courier, monospace;">[foo (:name "Joe")]</span>.<br />
<br />
Now suppose that we extend our application. We are going to store family names as well. So we start storing initializers with more data: <span style="font-family: Courier New, Courier, monospace;">[foo (:name "John" :family "Smith")]</span> What do we do about the legacy <span style="font-family: Courier New, Courier, monospace;">[foo (:name "Joe")]</span>? Let us suppose we decided that we'll just default the missing last name to "Unknown". Everything is cool. Old and new objects live together.<br />
<br />
But now we want to extend our application to handle people like Cher and Madonna. We want it to be the case that we can deliberately omit the family name for some people. The initializers will look like <span style="font-family: Courier New, Courier, monospace;">[foo (:name "Cher")]</span>. But now we have an ambiguity. We don't know if the family name is omitted on purpose, or whether the object was stored before the family name became important. Do we default the last name to "Unknown" or not?<br />
<br />
The <span style="font-family: Courier New, Courier, monospace;">:schema-version</span> argument in the class definition is used to disambiguate these cases. When the objects are recovered from the store, the constructor can use this value to decide how to interpret the remainder of the initargs.<br />
<br />
Admittedly, this is a bit klunky. But it doesn't complicate things too much. Programmers will have to do two things when changing a persistent class definition: bump the <span style="font-family: Courier New, Courier, monospace;">:schema-version</span>, and decide how to reconstruct objects that were stored under the legacy expectations. (Actually, you can punt on these if you can prove that no ambiguous cases will arise.)<br />
<br />
Now about that important detail. The initializers we store aren't exactly what we said. Instead, when the persistent class is defined a set of "hidden slots" is created in parallel with the declared slots. The initargs of the hidden slots are not persistent objects, but the persistent object ids of the initargs. We don't store <span style="font-family: Courier New, Courier, monospace;">[foo (:name "Joe")]</span>, we store<span style="font-family: Courier New, Courier, monospace;"> [foo (:persistent-initarg-for-name 33)]</span> where 33 is the persistent object id of the persistent string <span style="font-family: Courier New, Courier, monospace;">"Joe"</span>. I could write a few pages explaining why, but it would be deadly boring. I'm sure you can imagine uses for an extra hidden level of indirection (think multi-value concurrency). (By the way, notice call to <span style="font-family: Courier New, Courier, monospace;">(apply #'call-next-method ...) </span><span style="font-family: inherit;"> uses </span><span style="font-family: Courier New, Courier, monospace;">nconc</span><span style="font-family: inherit;"> to paste the hidden arguments on the front of the argument list like I mentioned in the previous post.)</span><br />
<br />
Does it work? Mostly. If you look at the code in <a href="https://code.google.com/p/jrm-code-project/source/browse/trunk/ChangeSafe/conman/workspace.lsp">conman/workspace.lsp</a> you'll find a class with a schema-version of 1 and this method:<br />
<pre>(defmethod pstore::restore-instance ((class (eql (find-class 'workspace))) (schema (eql 0))
persistent-store node-id node-index init-plist)
(debug-message 2 "Upgrading schema for workspace.")
;; This needs work. The zeros are the OID of NIL.
(pstore::restore-instance class 1 persistent-store node-id node-index
(list* :added-master-csets 0
:removed-master-csets 0
:transitional-added-master-csets 0
:transitional-removed-master-csets 0
init-plist)))</pre>I added four slots to workspace objects. When resoring a workspace from the store, if it was a workspace created before these slots existed, this method overrides the usual restore method. It simply adds the new slots to the front of the init-plist before proceeding with the normal <span style="font-family: Courier New, Courier, monospace;">restore-instance</span>. (The use of the number 0 instead of <span style="font-family: Courier New, Courier, monospace;">NIL</span> is an implementation defect that I'm too lazy to fix at the moment.)<br />
<hr />The problem in explaining this? I don't know an easy <em>proof</em> that storing initializers rather than objects is sufficient in all cases. It's not obvious that this even helps with schema evolution, and it took me a while before I was persuaded that there aren't lurking edge cases. In personal discussions, it takes a while to persuade people that this is in fact a solution to a problem. I'd love to hear a better argument.Joe Marshallhttps://plus.google.com/113159534857760391182noreply@blogger.com8