Machine Learning for Hackers, Chapter 7

The title of this chapter, “Optimization: Breaking Codes”, sounds more exciting than it really is. The code breaking is attempted via an optimization algorithm called the Metropolis method. The authors show you how to encrypt text using a simple Caesar cipher and then how to implement the Metropolis method to attempt to decipher the message. Their test message is “here is some sample text”. They encrypt it using the Caesar cipher, so it looks like “ifsf jt tpnf tbnqmf ufyu “. The encryption rule is simply replacing every letter with the following letter in the alphabet. The letter a is replaced with b, b with c, and so on until you replace z with a. They manage to decipher their test text through the course of 50,000 Metropolis steps (on the 45,609th attempt). I suppose this is a fun demonstration of optimization. It just seemed a little outdated. I guess I was expecting to see them crack an actual message from World War II or something.

They start the chapter by demonstrating how to use the R optim function. For some reason this function has always confused me. Probably because I don’t use it that often. But I have to say I feel much better about it now. They give clear demonstrations of its use and visually show you how it works. Their example involves the height and weight dataset from chapter 2. First we do linear regression and find coefficients for the slope and intercept. Then we find the same coefficients using optim. The function minimized in the call to optim is a hand-built sum of squared errors function. I thought this was a great way to demonstrate how optim works. But after this, they do something puzzling: they continue to demonstrate how to use optim with Ridge Regression. Now Ridge Regression is a type of regression that helps remedy multicollinearity. If you have predictor variables that are highly correlated, you may get some weird coefficients in your linear modelling results. I’m talking about getting a negative value where you would expect a positive value, that sort of thing. Ridge regression helps address this by “modifying the method of least squares to allow biased estimators of the regression coefficients” (Kutner, et al., p. 432). Knowing that, it doesn’t make much sense to demonstrate Ridge regression on height and weight data. There’s only one predictor. I thought maybe they were going to do more with Ridge regression in this chapter, but they don’t. They move on to code breaking. It just seemed out of place to bring up Ridge regression in the context of single predictor regression.

I won’t go into the code breaking program. The code is available on the book’s git hub page. What I wanted to review for my own benefit is their function for encrypting text using the Caesar cipher. I thought it was a good example of using R’s indexing capabilities with list objects. First we need a vector with the English alphabet and two empty list objects for the ciphers: one for encrypting text and another for deciphering text:

english.letters <- letters
caesar.cipher <- list()
inverse.caesar.cipher <- list()


Now we create the ciphers:

# need to use index %% 26 + 1 to handle letter "z" at end
# 26 %% 26 + 1 = 0 + 1 = 1
for (index in 1:length(english.letters))
 caesar.cipher[[english.letters[index]]] <- english.letters[index %% 26 + 1]
 inverse.caesar.cipher[[english.letters[index %% 26 + 1]]] <- english.letters[index]


The "index %% 26 + 1" confused me for a couple of reasons. First off, the %% operator means modulus and I'm used to see something like 12 %% 5, where the first number is bigger than the second. The answer to that is of course 2. If you divide 12 by 5 you get remainder 2, and that's what the %% operator returns. But in the code above, you'll notice the "index" value is less than 26 on every iteration except the last. How do you calculate something like 2 %% 26? I wasn't sure so I had to google around and figure it out. It turns out the modulus operation a %% n is defined as follows:

a \, mod \, n = a - \lfloor \frac{a}{n} \rfloor \times n

So, 2 %% 26 is calculated like so:

2 \, mod \, 26 = 2 - \lfloor \frac{2}{26} \rfloor \times 26 = 2 - \lfloor 0.0769 \rfloor \times 26 = 2 - (0 \times 26) = 2

Once I was clear on that, I was confused why we needed to use the %% operator at all. Again if you look at the code above, you'll see "index %% 26 + 1". When index = 2, "index %% 26" gives 2. When index = 3, "index %% 26" gives 3. And so on. Why not just use "index + 1"? Well, when we get to the letter "z", we need to substitute "a". The index number for "a" is 1. But if we do "index + 1", then we would get 27 at the letter "z" since its index number is 26. That would not index anything in the 26 element "english.letters" vector and produce an error. Therefore we need to use the %% operator to allows us to recycle through the numbers 1 - 26. When index = 26 (i.e., when we get to the letter "z"), we do 26 %% 26 + 1, which equals 1 since 26/26 has remainder equal 0.

Now that I understand how the ciphers are generated, I can print them and have a look if I like:

print(caesar.cipher) # encrypt text
print(inverse.caesar.cipher) # decipher text


Since these are lists, we investigate with indices using double brackets. For example, to see what we substitute for letter "h", we do the following:



That returns "i". It seems like a fun extension to this would be to create a general function that allowed you to specify the substitution index. In the Caesar cipher, this is 1. It would be neat to specify, say, 13 so "a" becomes "n", "b" becomes "o", and so on. Anyway, moving on...

Now we use the ciphers we just made to encrypt text. To do this the authors build two functions. The first encrypts text letter-by-letter using the cipher you give it. The second function encrypts words using the first function. Let's have a look at these: <- function(string, cipher)
 output <- ''
 for (i in 1:nchar(string))
 output <- paste(output, cipher[[substr(string, i, i)]], sep='')
} <- function(text, cipher)
 output <- c()
 for (string in text)
 output <- c(output,, cipher))


The first function loops through each letter of a word (or "string"). Using the substr function, we extract each letter, use that letter to look up the substitution in the cipher, and "paste" the substituted letter to the previously substituted letter. At the conclusion of the loop we have encrypted the word using our cipher. The second function uses this first function to encrypt whole phrases. Here is how the authors encrypt the message "here is some sample text" using the Caesar cipher:'here', 'is', 'some', 'sample','text'), caesar.cipher)


That gives us "ifsf" "jt" "tpnf" "tbnqmf" "ufyu" . If we wanted to decipher the text, we would use the inverse cipher as follows:'ifsf', 'jt', 'tpnf', 'tbnqmf', 'ufyu'), inverse.caesar.cipher)


Again it would be a good exercise to generalize these functions to accept a whole text file and return an encrypted text file. Not that I'm going to do that, but it sounds like it would make a good project assignment in some programming class.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.