Lena is hiring three new reps. She has to carve out three territories with roughly equal pipeline value.
accts = pd.read_csv("accounts.csv") # account_id, state, pipeline_value
state_value = accts.groupby("state")["pipeline_value"].sum().sort_values(ascending=False)
print(state_value)
n_reps = 3
territories = {f"Rep{i+1}": {"states": [], "value": 0} for i in range(n_reps)}
for state, value in state_value.items():
# assign to whichever rep currently has the lowest total
lowest = min(territories, key=lambda r: territories[r]["value"])
territories[lowest]["states"].append(state)
territories[lowest]["value"] += value
for rep, info in territories.items():
print(f"{rep}: ${info['value']:,.0f} ({len(info['states'])} states)")
for rep, info in territories.items():
sub = accts[accts["state"].isin(info["states"])].sort_values("pipeline_value", ascending=False)
sub.to_excel(f"territory_{rep}.xlsx", index=False)
values = [t["value"] for t in territories.values()]
print(f"min:{min(values):,.0f} max:{max(values):,.0f} spread:{(max(values)-min(values))/sum(values)*100:.1f}%")
scipy.optimize.linprog or the pulp library. Greedy is a great first cut.
Modify the script so each rep gets at most 12 states (forces re-balancing in dense areas).