Solving the Travelling Salesman Problem Using a Genetic Algorithm
An exploration with Python
You can view the notebook for this project here.
This article is also available to read on towardsdatascience.com
Travelling Salesman Problem
The Travelling Salesman Problem, TSP, describes a scenario where a salesman wishes to visit a number of cities, while taking the shortest possible route, before returning home to the start point. While it may appear simple, this problem not only has no known polynomial time solution, but there is also no time-efficient way to prove that a given solution is in fact optimal.
Instead, we often use heuristic algorithms to give an approximate solution that is sufficient for many practical uses. In this article, we will explore a different approach to generating a ‘good’ solution using a Genetic Algorithm. For a more in-depth discussion of the difficulties of the TSP, as well as a summary of some of the heuristic methods used to solve it, check out this article.
Genetic Algorithm
A Genetic Algorithm, GA, is a machine learning technique that is modelled on biological evolution. It is a guided random search algorithm that can explore a large solution space very efficiently. A GA involves building a population of ‘chromosomes’ which act as potential solutions to the problem we are attempting to solve. We then perform operations on these chromosomes to drive incremental improvement over many generations. The rest of this article assumes a basic understanding of the steps involved in a GA, to see a more detailed summary of the process, see this article.
A Quick Detour into Graph Theory
A mathematical graph is an object consisting of nodes (sometimes called vertices) which are connected by edges. The labels attached to the edges indicates the weight, or distance between nodes. Applied to the TSP, this means we will use nodes to represent cities we wish to visit, and the weights on the edges tell us the distance between the cities.
We can describe a graph using a distance matrix, where the values in the matrix represent the weights of the corresponding edges. For example, the value in coordinate (3, 4) represents the weight of the edge between vertex 3 and vertex 4. Notice that the matrix is symmetrical along the diagonal, so (3, 4) is the same as (4, 3). Intuitively, this means that the distance from A to B is the same as the distance from B to A. There are versions of the TSP where this criteria is relaxed, but they will not be discussed here.
# for a graph with 5 nodes, we require a 5x5 matrix
# each column and row corresopnds to one of the nodes
[ [ 0, 28, 6, 23, 22 ],
[ 28, 0, 12, 12, 16 ],
[ 6, 12, 0, 29, 4 ],
[ 23, 12, 29, 0, 15 ],
[ 22, 16, 4, 15, 0 ]]
# element in index (3, 4) = 15 = (4, 3)
# this represents the edge between node D (3) and E (4) in the graph above
Since there is no edge connecting a node to itself, we see that all of the diagonal entries ( [0, 0], [1, 1], etc… ) of this matrix are 0.
Modelling the TSP
As the number of nodes in a graph increases, the number of possible routes increases exponentially. For this example, we’ll use a graph with 10 nodes. This will give us 181,440 = (10-1)! / 2
possible routes to consider. This is definitely a suitably large search space to begin with.
If the search space is too small, the optimal solution will likely be found in the first couple of iterations of the algorithm, making this approach redundant. (Try running the code on a graph with 6 vertices to see an example of this)
To solve the TSP, we need to find a cycle (closed loop) that passes through every node exactly once. As a consequence, we cannot use an edge more than once per cycle. We need to find such a cycle that is of minimal weight (the sum of the weights of the edges involved) in order to produce an optimal solution to the problem.
To build our population, we need to know what format our solution should take. For a graph on 5 nodes, our solution could look something like A-B-C-D-E-A
, however, since we always start and finish at A, we do not need to include this in our encoding. This leaves us with a solution of B-C-D-E. Therefore, a chromosome in our population will be some permutation of these 4 nodes. We’ll use numbers (starting at 0) instead of the letter names of the nodes for programmatic simplicity. The number of elements in our population is one of the parameters we can alter in order to improve our results, so we’ll leave this as a variable (pop_size).
def generate_initial_population(num_nodes, pop_size):
nodes = [i for i in range(1, num_nodes)] # excluding start node (A)
population = []
for i in range(pop_size):
random.shuffle(nodes)
population.append(nodes.copy())
return population
This is a sample population (size 10) for a graph on 10 nodes:
[[3, 7, 4, 2, 9, 5, 6, 8, 1],
[5, 4, 9, 2, 3, 6, 8, 7, 1],
[8, 7, 5, 2, 3, 4, 1, 9, 6],
[9, 3, 2, 1, 5, 4, 7, 8, 6],
[9, 7, 5, 3, 4, 6, 1, 8, 2],
[1, 3, 6, 9, 7, 5, 4, 2, 8],
[6, 8, 3, 2, 7, 5, 9, 1, 4],
[9, 6, 5, 3, 2, 7, 8, 4, 1],
[5, 2, 7, 1, 8, 3, 6, 9, 4],
[1, 7, 3, 8, 5, 4, 9, 2, 6]]
Now we need to define a fitness function. This will be used to evaluate the success of our chromosomes. Since our aim is to minimise the weight of a cycle through every node, it makes sense for our fitness function to calculate this value. For example, the chromosome B-C-D-E would have a fitness score calculated by AB + BC + CD + DE + EA.
def fitness(chromosome, m):
# distance matrix, m
score = 0
for i, gene in enumerate(chromosome):
if i == 0:
score += m[0, gene] # edge between start node and first in chromosome
else:
score += m[gene, chromosome[i-1]] # edge between any other pair of nodes
score += m[0, chromosome[-1]] # edge connecting back to start node
return 100/score # take the reciprocol and scale
It also makes sense to scale, and take the reciprocal of the fitness scores. This is due to the fact that a lower numerical cost route is a better solution, but it is helpful if a better route is represented by a higher value.
The last step to complete our setup is to sort our population by their fitness score, putting the most fit individuals at the top. This is an optional step, but it will make our life easier. The following code brings together all of the setup so far.
node_labels = {0: "A", 1: "B", 2: "C", 3: "D", 4: "E", 5: "F", 6: "G", 7: "H", 8: "I", 9: "J"}
# generate the graph (distance matrix)
m = generate_graph(num_nodes=10)
# build our initial population of chromosomes
# population consisting of 20 random individuals
population = generate_initial_population(num_nodes=10, pop_size=20)
# evaluate the fitness of our population (using parallel array)
fitness_scores = [ fitness(i, m) for i in population]
# sort population by their fitness score
order = np.array(sorted([*enumerate(fitness_scores)], key=lambda x: x[1], reverse=True), dtype=int)[:, 0]
population = [population[i] for i in order]
fitness_scores = sorted(fitness_scores, reverse=True)
# Initial population after sorting:
Chromosome: [4, 1, 7, 5, 6, 8, 2, 9, 3] Fitness: 0.806
Chromosome: [4, 8, 9, 7, 3, 5, 1, 2, 6] Fitness: 0.8
Chromosome: [6, 7, 8, 9, 5, 3, 1, 2, 4] Fitness: 0.746
Chromosome: [8, 4, 2, 7, 6, 1, 5, 3, 9] Fitness: 0.735
Chromosome: [6, 9, 3, 8, 5, 7, 1, 2, 4] Fitness: 0.725
Chromosome: [1, 9, 3, 5, 6, 4, 2, 8, 7] Fitness: 0.704
Chromosome: [8, 4, 1, 5, 3, 2, 6, 7, 9] Fitness: 0.694
Chromosome: [1, 8, 4, 2, 6, 5, 9, 3, 7] Fitness: 0.671
Chromosome: [4, 5, 7, 8, 3, 1, 9, 2, 6] Fitness: 0.658
Chromosome: [7, 8, 5, 6, 9, 3, 4, 1, 2] Fitness: 0.654
Chromosome: [3, 7, 1, 8, 2, 9, 6, 4, 5] Fitness: 0.637
Chromosome: [8, 1, 6, 9, 4, 2, 3, 5, 7] Fitness: 0.629
Chromosome: [1, 4, 6, 3, 5, 7, 2, 8, 9] Fitness: 0.621
Chromosome: [8, 7, 2, 5, 6, 1, 9, 4, 3] Fitness: 0.61
Chromosome: [8, 7, 2, 9, 6, 3, 4, 1, 5] Fitness: 0.588
Chromosome: [2, 4, 5, 6, 8, 1, 3, 7, 9] Fitness: 0.575
Chromosome: [8, 3, 2, 5, 4, 7, 9, 1, 6] Fitness: 0.556
Chromosome: [3, 8, 7, 5, 9, 4, 2, 1, 6] Fitness: 0.552
Chromosome: [9, 7, 6, 1, 3, 8, 5, 4, 2] Fitness: 0.535
Chromosome: [3, 8, 9, 5, 4, 6, 1, 7, 2] Fitness: 0.498
So far, we have built our graph, generated our initial population, and written a function to evaluate the fitness of our chromosomes. Now we just need to improve our solutions.
Selection
There are several repeated steps involved in the GA. The first of which we will address is selection. There are various different selection algorithms we can choose from, but we want one that will select more fit individuals more often. For this we use what’s known as ‘Roulette Wheel Selection’ which is a weighted random selection based on an individual’s fitness score. We’ll choose a sample size of 2, to select 2 parent chromosomes.
# roulette wheel selection
# weighted random selection based on fitness score
def roulette_wheel_selection(pop_size, fitness_scores):
return(random.choices(range(pop_size), weights=fitness_scores, k=2))
Crossover
The next step is to create our crossover function, which acts as the ‘reproduction’ stage. The idea here is that given our two parent chromosomes, we perform some operation to produce 2 offspring that are a mix of both parent’s genes.
Since our chromosomes represent a permutation, or order of visiting nodes, we can’t just take half of each parent, as the child chromosomes may not contain the right nodes, or may have duplicates, and therefore be invalid solutions. Instead, we will implement what’s known as order based crossover.
We start by randomly selecting 3 nodes. We then identify the order in which they appear in each chromosome. We rearrange those values, in each chromosome, to match the order in which they appear in the other parent. This gives us two different offspring.
randomly selected values: [1, 5, 9]
Parent 1: [4, *1*, 7, *5*, 6, 8, 2, *9*, 3]
Parent 2: [4, 8, *9*, 7, 3, *5*, *1*, 2, 6]
order in which they appear in parent 1: [1, 5, 9]
order in which they appear in parent 2: [9, 5, 1]
rearrange the values in parent 1 to match the order in parent 2:
child 1: [4, *9*, 7, *5*, 6, 8, 2, *1*, 3]
do the same with parent 2 to match parent 1:
child 2: [4, 8, *1*, 7, 3, *5*, *9*, 2, 6]
The two children are now combinations of their parent’s genes.
There are many different ways that chromosomes can be crossed over, largely depending on how the chromosomes are encoded. The goal of crossover is to combine parts of two good solutions to hopefully generate an even better one, so any method that mixes them in a valid way is worth investigating. There’s no reason we have to stick with 3 values as we have with this example either, we could do the same with 4 or 5. This is another variable you could alter to tweak the performance of your algorithm. Perhaps choosing 4 would give you better mixing, and therefore better results, but you’d need to test to find out.
Mutation
The last main step of the algorithm is the mutation step. It is important to introduce some random mutations to our chromosomes to allow the algorithm to explore more possibilities in our search space. It can also prevent the gene pool from becoming too small causing the algorithm to converging to a non-optimal solution.
We introduce two different mutation algorithms here so that we can try both to see which works best. The first is a swapping of two nodes within a chromosome:
# swap mutation - swap 2 values in a chromosome
def mutate1(x):
# random choice
index = np.random.choice(range(len(x)), size=2, replace=False)
# swap values
temp = x[index[0]]
x[index[0]] = x[index[1]]
x[index[1]] = temp
return x
The second mutation algorithm we will try is a random re-insertion of a value. We randomly select and remove a value, then insert it back in a random position.
# random insertion - remove a value and re-insert it in a random
def mutate2(x):
# random choice of value
index = random.choice(range(len(x)))
# remove value
temp = x[index]
x.remove(temp)
# randomly re-insert value
x.insert(random.choice(range(len(x))), temp)
return x
Structure of the Genetic Algorithm and its Parameters
Let’s first take a look at the pseudocode for a GA to get an idea of what’s going on. We’ll then implement it in Python.
# generate initial population and evaluate its fitness
# sort population by fitness
# REPEAT (for number of generations/iterations or until stop criteria met)
# REPEAT: build new population
# select 2 parents
# crossover of parents to produce offspring
# mutate offspring
# add them to new population
# evaluate new population using fitness function
# sort new population by fitness scores
This is a fairly simple procedure, but there are a few more parameters we need to consider. These are factors that need to be considered in order to maximise performance of the algorithm.
- Crossover Threshold: crossover can help produce new better solutions, but it can also break good existing chromosomes. For the first few generations we may want a lot of crossover to quickly find better solutions, later on however, we may want less crossover destruction as we fine-tune our solutions. The crossover threshold is a trade off to help us balance these two goals. When we reach the crossover step, we generate a random value. If the value is below the threshold, we crossover, otherwise the offspring are identical copies of the parents and no crossover occurs.
- Mutation Threshold: the mutation threshold works similarly to the crossover threshold, except it is for the mutation step. Mutation allows us to explore the area around our solutions. If we mutate every single chromosome, we may modify them too much and miss some good solutions. If we don’t mutate at all, our population will quickly converge to a sub-optimal solution (due to the weighted selection step).
- Elitism: Sometimes, by chance, we may happen to find a good solution in an early generation. Elitism allows us to copy the best chromosome from one generation to the next, allowing good solutions to be preserved. The elitism parameter determines how many of the best solutions are copied to the next generation. This value is usually set to 1. Increasing this parameter is risky as it can lead to a suboptimal solution, however, a higher value will result in faster convergence.
- Population Size: Having a larger population size allows the algorithm to explore more possibilities at each generation. This can allow you to find a better overall solution, and not converge on a worse one.
- Iterations: the number of iterations is the number of times the algorithm will complete its cycle. More iterations will likely produce a better result, but will take longer to compute. The same is true for population size, therefore there is a trade-off when setting these two parameters.
Python Implementation of the Algorithm
We also introduce a mean_fitness_scores`
and a best_fitness_scores`
array in order to track the progress of our algorithm at each generation. This will allow us to visualise the progress of the algorithm, and help us to optimise the parameters later on.
# main genetic algorithm function
# we pass our selection, crossover and mutation functions as parameters so they
# can be swapped in and out
# we also pass the number of iterations, elitism constant, and our thresholds
def genetic_algorithm(population, iterations, selection, crossover, crossover_threshold, mutation, mutation_threshold, elitism):
pop_size = len(population)
mean_fitness_scores = []
best_fitness_scores = []
### EVALUATE INITIAL POPULATION ###
fitness_scores = [ fitness(i, m) for i in population]
### SORT INITIAL POPULATION ###
order = np.array(sorted([*enumerate(fitness_scores)], key=lambda x: x[1], reverse=True), dtype=int)[:, 0]
population = [population[i] for i in order]
fitness_scores = sorted(fitness_scores, reverse=True)
### GENERATION LOOP ###
for iter in range(iterations):
new_pop = copy.deepcopy(population)
replaced = elitism
# build new population
while(replaced < pop_size):
# selection
index = selection(pop_size, fitness_scores)
# crossover
if random.random() < crossover_threshold:
child1, child2 = crossover(population[index[0]].copy(), population[index[1]].copy())
else:
child1, child2 = population[index[0]].copy(), population[index[1]].copy()
# mutation
if random.random() < mutation_threshold:
child1 = mutation(child1)
if random.random() < mutation_threshold:
child2 = mutation(child2)
# replacement
new_pop[replaced] = child1
replaced += 1
if replaced < pop_size:
new_pop[replaced] = child2
replaced += 1
# evaluation
population = new_pop
fitness_scores = [fitness(i, m) for i in population]
# sort population by fitness
order = np.array(sorted([*enumerate(fitness_scores)], key=lambda x: x[1], reverse=True), dtype=int)[:, 0]
population = [population[i] for i in order]
fitness_scores = sorted(fitness_scores, reverse=True)
# calculate mean and maximum fitness score of this generation
mean_fitness_scores.append(np.mean(fitness_scores))
best_fitness_scores.append(fitness_scores[0])
# we return the mean and best fitness scores, as well as the best chromosome found
return(mean_fitness_scores, best_fitness_scores, population[0].copy())
Testing
The parameters shown have already been altered to give us a consistent solution in a reasonable time. The algorithm runs in a couple of seconds, and produces results consistently better than the Nearest Neighbour method.
### PARAMETERS ###
iterations = 250
pop_size = 300
num_nodes = 10
### GENERATE GRAPH ###
m = generate_graph(num_nodes)
draw_graph(node_labels, m)
### GENERATE INITIAL POPULATION ###
population = generate_initial_population(num_nodes=num_nodes, pop_size=pop_size)
### RUN GENETIC ALGORITHM ###
mean_fitness, best_fitness, tour = genetic_algorithm(
population=population,
iterations=iterations,
selection=roulette_wheel_selection,
crossover=order_crossover,
crossover_threshold=0.7,
mutation=mutate1,
mutation_threshold=0.3,
elitism=1
)
Results:
This view allows us to see improvements over time, and how the algorithm converges
Best Tour: ['A', 'G', 'C', 'E', 'I', 'F', 'J', 'D', 'H', 'B', 'A']
Weight: 73
Nearest Neighbour: ['A', 'E', 'C', 'H', 'B', 'I', 'J', 'D', 'G', 'F', 'A']
Weight: 129
As you can see, the population sees a large improvement over time with the chromosomes getting consistently better on average. We see rises and falls in the average as different possibilities are explored, but an overall positive trend in the long run.
Our best fitness value stabilises quite quickly. This is, in part, due to the size of the solution space being too small such that it doesn’t take many repetitions to find the best solution. Testing this again with 12 nodes shows more continual improvement over time (see below)
It’s difficult to predict when the algorithm will find a best value due to the randomness of the generation. It’s also difficult to say whether more iterations would have found a solution that is better than the one above, or just have wasted more processing time. From my testing, it seemed that after around 200 repetitions, there were rarely any improvements found. But depending on the context in which this was being used, and the number of nodes, you may want to run it longer to produce a better solution.
For example, I ran another test with iterations=1500
and population size=1000
on a graph with 12 nodes. It took 2 mins to run, but the results were remarkably similar. The mean scores seemed to stabilise after a few hundred generations and saw no major improvement beyond that point. A new best score was found at around iteration 1200, but it was only a very slight improvement. This really highlights the accuracy-time trade off present with a system such as this one, where the size and frequency of increases diminishes over time.
Best Tour: ['A', 'C', 'B', 'D', 'H', 'J', 'K', 'E', 'F', 'G', 'I', 'L', 'A']
Weight: 97.0
Nearest Neighbour: ['A', 'K', 'E', 'F', 'I', 'L', 'G', 'B', 'D', 'H', 'J', 'C', 'A']
Weight: 108
We can also investigate how tweaking just one parameter, in this case mutation threshold, can deliver quite different results:
[LEFT] mutation threshold = 0 [RIGHT] mutation threshold = 0.25
The graph on the left shows the population quickly converge (by generation 70) and then little change is shown after that point. By contrast, the graph on the right, where mutation is present, shows much more variation in the population, allowing more possibilities to be explored.
If you want to explore how each of the factors influence the results of the algorithm, check out the notebook for this project, here.
A GA offers a fascinating and powerful approach to tackling the Travelling Salesman Problem and many other similar problems with a large searchable solution space. While this implementation is likely not the most efficient, it is still capable of producing a sufficient solution in a reasonable time frame. Its results are comparable to, if not superior to, other simple heuristic approaches such as the Nearest Neighbour Algorithm.
As we’ve seen, the parameters of the algorithm are all interdependent, and highlight a time-accuracy trade off that underpins many decision and optimisation problems. I believe studying the interaction of these parameters and the way in they affect the results, is a valuable learning experience, and can provide a greater appreciation of the interplay of factors in many different scenarios.
While not commonly used, the Genetic Algorithm exemplifies how we can use simple repeatable methods to achieve accurate results to large, complex problems that may not have an analytical solution.
#Python #Travelling Salesman #Genetic Algorithm #Machine Learning #AI