lab 3

lab 3 - vigenere and frequency analysis (feb 14)

so that you do not have to code everything yourself, i have written some helper code and auxiliary data in this file codes.py. you can either cut and paste the necessary code into your program window/file, or load the whole module with the command import codes (after saving the file in your working directory). if you do import my code, functions can be called by codes.functionname(inputs). for example, if you want to use the slice function, the usage would be codes.slice('abcdefg', 3). alternatively, if you type from codes import slice you can call the function without the codes. prefix. here is a summary of the functions included in my code--not all of them are necessarily help for this lab:

shift(s,k) - the caesar shift function as in lab 2
onlyalpha(s) - strips nonalphabetic characters from a string s
slice(s,n) - slices a string s into n substrings---useful for vigenere and columnar transposition ciphers
freq_count(s) - performs a frequency count on the letters in s
freq_dist(s) - computes the a frequency distribution of the letters in s
digrams(s) - count repeated digrams in a string s
trigrams(s) - count repeated trigrams in a string s
subst(C,sub) - makes substitutions in a ciphertext to help you break substitution ciphers (may be useful for homework 5)

here are also some pre-defined objects for you:

letter - letter[i] is the i-th lower-case letter (i between 0 and 25)
Letter - letter[i] is the i-th upper-case letter (i between 0 and 25)
EF - a list of relative frequencies of the English letters
C1 - ciphertext encoded with caesar shift
C2 - ciphertext encoded with vigener cipher
C32 - substitution ciphertext from homework 5

task 1: write a function splice(l), where l is a list of strings. this should be the inverse function of slice. namely, if you write each string vertically as a column, splice should return the string you get from reading through the rows. test this to make sure this gives you the inverse of the slice function, e.g., splice(slice('abcdefgh', 3)) should result in 'abcdefgh'.

task 2: write a function vigenere(s,key, decode=False) that performs a vigenere encryption (or decryption if decode is set to True) on the string s with key key. (here key should also be a string, e.g., 'vigenere'.) test out encoding and decoding with a couple of different messages and keys.

task 3: write a function MSE(F1,F2) which given a pair of frequency distributions F1 and F2, returns the mean square error between them, i.e., the sum of the squares of the differences F2[i]-F1[i] over all list positions. you may assume F1 and F2 are lists of numbers of the same length, but do not assume their length is 26.

task 4: write a function test_shifts(s) that takes in a string s, computes the frequency distribution, and calculates the mean square error of all cyclic shifts of the frequency distribution with the standard English frequency distribution (stored in the list EF). this function should return a list of tuples (k,err), where k is the shift amount (between 0 and 25) and err is the corresponding MSE for that shift. return the list sorted by MSE in ascending order (smaller MSE's first). you can make a sorted list of tuples, sorted by the i-th tuple entry, in python with the function sorted(yourlist, key=lambda item: item[i]) where yourlist is the name of your list. this returns a new sorted list. here item is just a dummy name representing an item in the list. see also the python 3 sorting how-to for more information. test this function on the loremipsum string.

task 5: write a function breakcaesar(s), which outputs (prints or returns, your choice) all possible shift keys and associated Caesar shifts of the string s, which are ordered from mostly likely English plaintext to least likely by making use of your test_shifts function. test this function on the included ciphtertext C1, which was encrypted with a Caesar cipher.

task 6: write a function kasiski(s,n), which implements the kasiski test as follows. for each substring t of s of length n, count how many times it appears in s. make and return a list repeated of tuples (t,diff) where diff is the difference in relative positions of 2 distinct occurrences of t. test your function on the string s='abcdeabcdabc' for n=3,4,5. in case it helps, you can access the substring of s of length n starting at the i-th position with s[i:i+n], and you can test for equality of two strings with s1 == s2.

task 7: using a combination of the above, determine the key length, key and plaintext for the included ciphertext C2, which was encoded using a vigenere cipher. if you like, you can call from math import gcd to use python's gcd function.

lab 3 homework: complete the above tasks (due fri feb 21)

labs page
course home