\n", "> Athens University of Economics and Business" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The Schulze Method\n", "\n", "To start with the Schulze method, we need a way to input ballots.\n", "\n", "We assume that the ballots are saved in a file, one ballot per line. In each line, that is, ballot, the candidates are listed in decreasing preference.\n", "\n", "We'll use the file [ballots.csv](ballots.csv) as an example. The file is in [Comma-separated Values (CSV)](https://en.wikipedia.org/wiki/Comma-separated_values) format.\n", "\n", "So, the first line is:\n", "```\n", "D,B,A,C\n", "```\n", "which means that the first preference of the voter is candidate D, then B, then A, then C." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Although seemingly simple, CSV is a treacherous format.\n", "\n", "There are many details than one would think at first sight; for example, what happens if a field in the line contains a comma, could we have different delimiters, etc.\n", "\n", "For that reason, you should always use a ready-made library for handling CVS files.\n", "\n", "Our ballots file is simple, but there is no reason not to use [Python's CSV library](https://docs.python.org/3/library/csv.html) anyway.\n", "\n", "We'll get all the ballots and we'll put them into a list." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[['D', 'B', 'A', 'C'],\n", " ['A', 'C', 'D', 'B'],\n", " ['C', 'D', 'B', 'A'],\n", " ['C', 'D', 'B', 'A'],\n", " ['D', 'C', 'B', 'A'],\n", " ['D', 'B', 'A', 'C'],\n", " ['B', 'A', 'D', 'C'],\n", " ['D', 'C', 'B', 'A'],\n", " ['A', 'C', 'D', 'B'],\n", " ['B', 'A', 'D', 'C'],\n", " ['D', 'C', 'B', 'A'],\n", " ['D', 'B', 'A', 'C'],\n", " ['A', 'C', 'D', 'B'],\n", " ['B', 'A', 'D', 'C'],\n", " ['D', 'C', 'B', 'A'],\n", " ['D', 'B', 'A', 'C'],\n", " ['A', 'C', 'D', 'B'],\n", " ['B', 'A', 'D', 'C'],\n", " ['A', 'C', 'D', 'B'],\n", " ['C', 'D', 'B', 'A'],\n", " ['A', 'C', 'D', 'B']]\n" ] } ], "source": [ "import csv\n", "import pprint\n", "\n", "\n", "with open('ballots.csv') as ballots_file:\n", " reader = csv.reader(ballots_file)\n", " ballots = list(reader)\n", " \n", "pprint.pprint(ballots, width=30)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The first step in the Schulze method is to calculate the pairwise preferences of the voters regarding the candidates. \n", "\n", "That is an array $P$, such that element $P[c_j, c_k]$ shows how many voters prefer candidate $c_j$ to candidate $c_k$.\n", "\n", "As our candidates are given by characters, we'll assign a number, starting from zero, to each of the candidates, so that we'll be able to use integer-based indices." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0, 6, 14, 10],\n", " [15, 0, 8, 4],\n", " [7, 13, 0, 9],\n", " [11, 17, 12, 0]]\n" ] } ], "source": [ "from collections import defaultdict\n", "\n", "candidates = {\n", " 'A': 0,\n", " 'B': 1,\n", " 'C': 2,\n", " 'D': 3\n", "}\n", "\n", "def calc_pairwise_prefs(ballots, candidates):\n", " # Initialize p to 0.\n", " p = [ [0 for j in candidates.keys() ] for i in candidates.keys() ]\n", " # Take each ballot in turn.\n", " for ballot in ballots:\n", " # Take each candidate in the ballot.\n", " for i, c_i in enumerate(ballot):\n", " # Take all following candidates in the ballot.\n", " for c_j in ballot[i+1:]:\n", " # Add to the score of c_i vs c_j.\n", " p[candidates[c_i]][candidates[c_j]] += 1\n", " return p\n", "\n", "p = calc_pairwise_prefs(ballots, candidates)\n", "pprint.pprint(p, width=20)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The second step in the Schulze method is to create an election graph.\n", "\n", "This will be represented by an adjacency matrix.\n", "\n", "If for two candidates $c_i$ and $c_j$ the number $P[c_i, c_j]$ of voters that prefer $c_i$ over $c_j$ is greater than the number of voters $P[c_j, c_i]$ that prefer $c_j$ over $c_i$, we add the link $c_i \\rightarrow c_j$ and we assign the number $P[c_i, c_j] - P[c_j, c_i]$ as the weight of the link $c_i \\rightarrow c_j$.\n", "\n", "We'll assign the value $-1$ for all other pairs (or $-\\infty$, but as $-1$ is not a valid weight, it will also do)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def create_election_graph(p):\n", " n = len(p)\n", " g = [ [-1 for j in range(n) ] for i in range(n) ]\n", " for i in range(n):\n", " for j in range(n):\n", " if p[i][j] > p[j][i]:\n", " g[i][j] = p[i][j] - p[j][i]\n", " return g" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can then see the adjacency matrix for our election example:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-1, -1, 7, -1],\n", " [9, -1, -1, -1],\n", " [-1, 5, -1, -1],\n", " [1, 13, 3, -1]]\n" ] } ], "source": [ "g = create_election_graph(p)\n", "pprint.pprint(g, width=20)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "With the adjacency matrix available, we can implement the calculation of the strongest paths.\n", "\n", "The function `calc_strongest_paths(p, candidates)` will take as input the adjacency matrix and the candidates and will return:\n", " * `s`, a matrix of size $n \\times n$ such that `s[i][j]` is the strongest path between nodes `i` and `j`.\n", " * `pred`, a matrix of size $n \\times n$ such that `pred[i][j]` is the predecessor of node `i` in the strongest path to node `j`.\n", " \n", "The algorithm finds the strongest paths iteratively, by allowing to use one additional node as intermediate node in the paths in each iteration." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def calc_strongest_paths(p):\n", " n = len(p)\n", " # Initialize strongest paths array.\n", " s = [ [ -1 for j in range(n) ] for i in range(n) ]\n", " # Initialize predecessors array.\n", " pred = [ [ -1 for j in range(n) ] for i in range(n) ]\n", " \n", " # Initially the strength of the path s[i][j] is simply\n", " # the difference in the weights between p[i][j] \n", " # and p[j][i].\n", " for i in range(n):\n", " for j in range(n):\n", " if p[i][j] > p[j][i]:\n", " s[i][j] = p[i][j] - p[j][i]\n", " pred[i][j] = i\n", " \n", " # For each k, i, j, such that the path from i to j\n", " # can be strengthened by taking the detour from i to k\n", " # and k to j adjust the path and the predecessor.\n", " # This can happen at most n times.\n", " for k in range(n):\n", " for i in range(n):\n", " if i != k:\n", " for j in range(n):\n", " if j != i:\n", " if s[i][j] < min(s[i][k], s[k][j]):\n", " s[i][j] = min(s[i][k], s[k][j])\n", " pred[i][j] = pred[k][j]\n", " \n", " return (s, pred)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We now apply `calc_strongest_paths(p)` to our example:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "strongest paths\n", "[[-1, 5, 7, -1],\n", " [9, -1, 7, -1],\n", " [5, 5, -1, -1],\n", " [9, 13, 7, -1]]\n", "predecessors\n", "[[-1, 2, 0, -1],\n", " [1, -1, 0, -1],\n", " [1, 2, -1, -1],\n", " [1, 3, 0, -1]]\n" ] } ], "source": [ "s, pred = calc_strongest_paths(p)\n", "print('strongest paths')\n", "pprint.pprint(s, width=30)\n", "print('predecessors')\n", "pprint.pprint(pred, width=30)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The final step in the Schulze algorithm is finding, for each candidate the candidates that are less popular.\n", "\n", "That is a matter of comparing `s[i][j]` and `s[j][i]`.\n", "\n", "We implement the logic in `calc_results(s)`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def calc_results(s):\n", " n = len(s)\n", " wins = [ [] for i in range(n) ]\n", " for i in range(n):\n", " for j in range(n):\n", " if i != j:\n", " if s[i][j] > s[j][i]:\n", " wins[i].append(j)\n", " return wins" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Finally, we can find the winner of the election:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[2], [0, 2], [], [0, 1, 2]]\n" ] } ], "source": [ "wins = calc_results(s)\n", "print(wins)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Candidate `A` wins over `C`.\n", "* Candidate `B` wins over `A`, `C`.\n", "* Candidate `D` wins over `A`, `B`, `C`.\n", "* Candidate `D` wins the election." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The Schulze Method: An Alternative\n", "\n", "We can implement the Schulze method with an alternative implementation, in which instead of an adjacency matrix we use a dictionary to represent the preferences.\n", "\n", "The logic is entirely the same." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We implement `calc_pairwise_prefs(ballots)` to return a dictionary `p` such that `p[(c_i, c_j)]` shows how many voters prefer candidate `c_i` to candidate `c_j`.\n", "\n", "The keys to the dictionary are the tuples `(c_i, c_j)`.\n", "\n", "Note that we do not need to work with indices instead of the actual voters.\n", "\n", "We use a `defaultdict(int)`, so the dictionary will return 0 if `(c_i, c_j)` is not a key.\n", "\n", "Essentially this is like initializing the preferences matrix to zero." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "defaultdict(