Sets — unique values, fast

Module 06 · Data Structures · 6 min read · Beginner

What you'll learn

  • Create and modify sets
  • Use set operations (union, intersection, difference)
  • Use a set to deduplicate fast

A set is an unordered collection of unique values

colors = {"red", "green", "blue"}
colors.add("red")        # already there — ignored
colors.add("yellow")
print(colors)            # {'red', 'green', 'blue', 'yellow'}  (order varies)
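
Two details about creating and modifying sets are easy to miss, so here is a minimal sketch (the extra values are illustrative only). Empty braces create a dict, so an empty set is written set(). To delete a value, remove raises KeyError when the value is missing, while discard does nothing.

```python
empty = set()              # {} would create an empty dict, not a set
print(type({}).__name__)   # dict

colors = {"red", "green", "blue"}
colors.discard("purple")   # not present: nothing happens
colors.remove("green")     # present: removed

try:
    colors.remove("purple")  # not present: raises KeyError
except KeyError:
    print("purple was not in the set")

print(len(colors))         # 2
```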

Deduplicating a list

raw = ["alice@x.com", "bob@y.com", "alice@x.com", "carol@z.com"]
unique = list(set(raw))   # duplicates removed; original order is not guaranteed
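
Converting through a set does not keep the list's original order. If order matters, one common alternative (a sketch, not the only option) is dict.fromkeys, which keeps the first occurrence of each value in order, since dicts preserve insertion order in Python 3.7+. The names unique_in_order and unique_sorted are illustrative.

```python
raw = ["alice@x.com", "bob@y.com", "alice@x.com", "carol@z.com"]

# dict keys are unique and keep insertion order (Python 3.7+)
unique_in_order = list(dict.fromkeys(raw))
print(unique_in_order)  # ['alice@x.com', 'bob@y.com', 'carol@z.com']

# sorted() over a set gives a deterministic, alphabetical result instead
unique_sorted = sorted(set(raw))
```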

Set operations

last_month = {"Alice", "Bob", "Carol"}
this_month = {"Bob", "Carol", "Dave"}

last_month | this_month   # union           {Alice, Bob, Carol, Dave}
last_month & this_month   # intersection    {Bob, Carol}
last_month - this_month   # difference      {Alice}      — churned
this_month - last_month   # difference      {Dave}       — new

That last pair — "who churned" and "who's new" — is a classic real-world use.
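
The operators above need a set on both sides. As a small sketch, the equivalent methods (union, intersection, difference) also accept any iterable, such as a list, which saves a conversion when one side has not been turned into a set yet. The data repeats the example above; this_month_list is an illustrative name.

```python
last_month = {"Alice", "Bob", "Carol"}
this_month_list = ["Bob", "Carol", "Dave"]   # a plain list, not a set

everyone = last_month.union(this_month_list)
stayed = last_month.intersection(this_month_list)
churned = last_month.difference(this_month_list)

# sorted() is used only to print in a predictable order
print(sorted(everyone))   # ['Alice', 'Bob', 'Carol', 'Dave']
print(sorted(stayed))     # ['Bob', 'Carol']
print(sorted(churned))    # ['Alice']
```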

Fast membership

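Checking x in big_list compares x against the items one by one until it finds a match, so a miss scans the whole list. Checking x in big_set uses a hash lookup, which takes roughly the same time on average whatever the size of the set. If you check membership often, use a set.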
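
To see the difference yourself, here is a rough timing sketch using the standard timeit module. The size n and the repeat count are arbitrary choices, and the exact numbers depend on your machine, so treat the output as an illustration rather than a benchmark.

```python
import timeit

n = 100_000
big_list = list(range(n))
big_set = set(big_list)
missing = -1   # not present, so the list check has to look at every item

# Run each membership check 100 times and measure the total time
list_time = timeit.timeit(lambda: missing in big_list, number=100)
set_time = timeit.timeit(lambda: missing in big_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```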

Key takeaways

  • Sets hold unique values; adding a duplicate is silently ignored.
  • list(set(x)) is a quick way to dedupe, but it does not preserve the original order.
  • | union, & intersection, - difference.
  • Membership checks on sets are O(1) on average, compared with O(n) for lists.

Churn analysis

Given last month's customers (a list) and this month's customers (a list), print the count of churned and new customers.
