Traveling salesman problem: a naive solution
In the previous post, I used geopy to calculate the distance between cities. That remind me the traveling salesman problem: a mathematical challenge that seeks to find the shortest possible route for a salesman to visit a set of cities exactly once and return to the starting point. It is considered an NP-hard problem, meaning it is difficult to solve efficiently as the number of cities increases.
We start with a dictionary whose keys are the city names and its values are the geographic coordinates of each city.
city_coords = { "São Paulo": (-23.5505, -46.6333), "Rio de Janeiro": (-22.9068, -43.1729), "Brasília": (-15.8267, -47.9218), "Salvador": (-12.9777, -38.5016), "Fortaleza": (-3.7319, -38.5267), "Belo Horizonte": (-19.9167, -43.9345), "Manaus": (-3.1190, -60.0217), "Curitiba": (-25.4284, -49.2733), "Recife": (-8.0476, -34.8770), "Porto Alegre": (-30.0346, -51.2177), "Goiânia": (-16.6786, -49.2550), "Belém": (-1.4558, -48.5044), "Guarulhos": (-23.4548, -46.5333), "Campinas": (-22.9056, -47.0608), "São Luís": (-2.5307, -44.3068), "São Gonçalo": (-22.8222, -42.9625), "Maceió": (-9.6658, -35.7350), "Duque de Caxias": (-22.7856, -43.3115), "Nova Iguaçu": (-22.7595, -43.4528), "Natal": (-5.7950, -35.2094) }
Now, we use that information to calculate the distance between cities, and build a list whose elements are a triplet: (city_a, city_b, distance).
from geopy.distance import geodesic from itertools import combinations from random import sample, choice distances = list() cities = list(city_coords.keys()) for city_a, city_b in combinations(cities, 2): coord_a, coord_b = city_coords[city_a], city_coords[city_b] km = int(round(geodesic(coord_a, coord_b).kilometers)) distances.append((city_a, city_b, km))
We can transform the information in the list distances into a numpy ndarray:
import numpy as np dt = np.dtype([("city_start", "U15"), ("city_end", "U15"), ("distance", int)]) data_set = np.array(distances, dtype=dt)
If you're wondering "What does U15 stands for?", I leave you an extract from the numpy documentation
Array-protocol type strings
The first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where it is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised. The supported kinds are:
'?' boolean 'b' (signed) byte 'B' unsigned byte 'i' (signed) integer 'u' unsigned integer 'f' floating-point 'c' complex-floating point 'm' timedelta 'M' datetime 'O' (Python) objects 'S', 'a' zero-terminated bytes (not recommended) 'U' Unicode string 'V' raw data (void)
Let's select a random city of our list to start the travel
def random_start_city(cities): """Returns the random city to start the travel""" return choice(cities)
We can also define a function to obtain the closest city
def closest_city(routes, ind=0): """Sort the list by distance and return shortest distance route""" route = sorted(routes, key=lambda dist: dist[2], reverse=True).pop(-1-ind) return route
Let's try the closest_city function:
dist_salvador = data_set[['Salvador' in data for data in data_set]] print(closest_city(dist_salvador))
('Salvador', 'Maceió', 475)
Now, the defining_path function is the one in charged of finding the sequence of tracks, starting from a given city (I'm assuming the city is in the list) or from a random city in the list.
def defining_path(city=None): itinerary = list() starting_city = random_start_city(cities) if city==None else city print("Starting City:", starting_city) visited_cities = [starting_city] for i in range(len(cities)-1): ind = 0 search = True while search: path = closest_city(data_set[[starting_city in data for data in data_set]], ind) city_a = path[0] if starting_city == path[0] else path[1] city_b = path[0] if starting_city != path[0] else path[1] if city_b in visited_cities: ind += 1 continue else: itinerary.append(path) visited_cities.append(city_b) starting_city = city_b search = not search return itinerary
Checking without a starting city:
paths = defining_path() for num, path in enumerate(paths): print(num+1, path)
Starting City: Fortaleza
1 ('Fortaleza', 'Natal', 433)
2 ('Recife', 'Natal', 252)
3 ('Recife', 'Maceió', 202)
4 ('Salvador', 'Maceió', 475)
5 ('Salvador', 'Belo Horizonte', 962)
6 ('Belo Horizonte', 'Nova Iguaçu', 319)
7 ('Duque de Caxias', 'Nova Iguaçu', 15)
8 ('Rio de Janeiro', 'Duque de Caxias', 20)
9 ('Rio de Janeiro', 'São Gonçalo', 24)
10 ('Guarulhos', 'São Gonçalo', 372)
11 ('São Paulo', 'Guarulhos', 15)
12 ('São Paulo', 'Campinas', 84)
13 ('Curitiba', 'Campinas', 359)
14 ('Curitiba', 'Porto Alegre', 545)
15 ('Porto Alegre', 'Goiânia', 1493)
16 ('Brasília', 'Goiânia', 171)
17 ('Brasília', 'São Luís', 1523)
18 ('Belém', 'São Luís', 482)
19 ('Manaus', 'Belém', 1294)
Checking with a static starting city:
paths = defining_path("Manaus") for num, path in enumerate(paths): print(num+1, path)
Starting City: Manaus
1 ('Manaus', 'Belém', 1294)
2 ('Belém', 'São Luís', 482)
3 ('Fortaleza', 'São Luís', 656)
4 ('Fortaleza', 'Natal', 433)
5 ('Recife', 'Natal', 252)
6 ('Recife', 'Maceió', 202)
7 ('Salvador', 'Maceió', 475)
8 ('Salvador', 'Belo Horizonte', 962)
9 ('Belo Horizonte', 'Nova Iguaçu', 319)
10 ('Duque de Caxias', 'Nova Iguaçu', 15)
11 ('Rio de Janeiro', 'Duque de Caxias', 20)
12 ('Rio de Janeiro', 'São Gonçalo', 24)
13 ('Guarulhos', 'São Gonçalo', 372)
14 ('São Paulo', 'Guarulhos', 15)
15 ('São Paulo', 'Campinas', 84)
16 ('Curitiba', 'Campinas', 359)
17 ('Curitiba', 'Porto Alegre', 545)
18 ('Porto Alegre', 'Goiânia', 1493)
19 ('Brasília', 'Goiânia', 171)
We can use the information in the list returned by the defining_path function to calculate the total traveled distance by the salesman:
def total_distance(itinerary): return sum(z for x, y, z in itinerary)
The variable paths contains the itinerary when we start from Manaus. We can compute the total distance (in Kilometers):
print(total_distance(paths))
8173
Concluding remarks
When we have a list of cities, we could build a set of ordered pairs determining the departing and arriving cities. My algorithm uses unordered pairs, so it considers half of the possible tracks.
However, in the search of a non-visited city the algorithm uses a loop through the whole list of cities, spending valuable time when the list of visiting places is large. I though in removing the tracks containing visited cities so that after each turn the set gets smaller. But still have to try it… and compare the memory usage.