🐍 Python Once a Week
Comprehensions
2019-11-12

Comprehensions are a great shortcut syntax for operating over the elements in a sequence (a list is a sequence, but a dict also has sequence-like properties). They're one of Python's syntactic tools that get the "programming" out of the way and the ideas down faster. Python has comprehensions over:

  • Lists
  • Sets
  • Dictionaries
  • Generators (will be covered in a future weekly tip!)

Let's start with the first of these and learn by example! Buckle your seat belts 😼.

List Comprehensions

Say we had a list of emails and we just wanted to extract the username part of each one (the user in user@domain.com).

emails = [
    'fake.mary@business.net',
    'worldy@asdf.com', 
    'atotalyrealuser@example.com',
    'okay@example.com',
]

users = []
for email in emails:
    user = email.split('@')[0]
    users.append(user)

Note the .split method which splits a string based on the character sequence we provide.

'user@domain.com'.split('@') will return ['user', 'domain.com'].

We could instead use a list comprehension:

users = [email.split('@')[0] for email in emails]

The general syntax here is [expression for element in sequence]. The for element in sequence part pulls each element out of sequence, and the leading expression is where we can modify element.
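To make that pattern concrete, here is a minimal sketch showing that the leading part can be any expression built from the element. The words list is made up purely for illustration.

```python
# Hypothetical data, not from the email example above.
words = ['spam', 'eggs', 'stroopwafel']

# General shape: [expression for element in sequence]
lengths = [len(word) for word in words]
shouted = [word.upper() for word in words]

print(lengths)  # [4, 4, 11]
print(shouted)  # ['SPAM', 'EGGS', 'STROOPWAFEL']
```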

That's pretty cool, it removes a few lines. If our logic to modify element is simple and easily written (or captured in a function), then a list comprehension is a nice shorthand to replace a for loop.

But it can do a lot more. Let's say I only want the users whose emails end in .com. Then I can do

users = [email.split('@')[0] for email in emails if email.endswith('.com')]

Comprehensions have an optional if clause that lets us filter on the element we're unwrapping from sequence. The condition can combine as many checks as we like.
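As a rough sketch of combining conditions, the example below keeps only .com addresses with a short username. The 10-character cutoff is an arbitrary choice for illustration.

```python
emails = [
    'fake.mary@business.net',
    'worldy@asdf.com',
    'atotalyrealuser@example.com',
    'okay@example.com',
]

# Two checks joined with `and`.
short_com_users = [
    email.split('@')[0]
    for email in emails
    if email.endswith('.com') and len(email.split('@')[0]) < 10
]
print(short_com_users)  # ['worldy', 'okay']

# Several `if` clauses in a row behave the same as `and`.
same_result = [
    email.split('@')[0]
    for email in emails
    if email.endswith('.com')
    if len(email.split('@')[0]) < 10
]
print(same_result == short_com_users)  # True
```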

Bonus Double For Loop List Comprehension

Say we instead had a list of emails and we wanted to give each user a couple of new domain options. For example, if we offered superexample.com and superlegitsite.io to user@domain.com, they would get user@superexample.com and user@superlegitsite.io. With a double for loop list comprehension we could do it like so:

emails = [
    'fake.mary@business.net',
    'worldy@asdf.com', 
    'atotalyrealuser@example.com',
    'okay@example.com',
]

new_domains = ['superexample.com', 'superlegitsite.io', 'stroopwafel.food']

new_emails = [
    f"{email.split('@')[0]}@{domain}"
    for domain in new_domains for email in emails
]
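For comparison, here is a sketch of the same idea written as explicit nested loops next to the comprehension. It takes the username with email.split('@')[0] and loops over new_domains; the name new_emails_loop is introduced only for this comparison.

```python
emails = [
    'fake.mary@business.net',
    'worldy@asdf.com',
    'atotalyrealuser@example.com',
    'okay@example.com',
]
new_domains = ['superexample.com', 'superlegitsite.io', 'stroopwafel.food']

# Explicit version: outer loop over domains, inner loop over emails.
new_emails_loop = []
for domain in new_domains:
    for email in emails:
        user = email.split('@')[0]
        new_emails_loop.append(f'{user}@{domain}')

# Comprehension version: the for clauses appear in the same order.
new_emails = [
    f"{email.split('@')[0]}@{domain}"
    for domain in new_domains
    for email in emails
]

print(new_emails == new_emails_loop)  # True
print(len(new_emails))                # 12
print(new_emails[:2])  # ['fake.mary@superexample.com', 'worldy@superexample.com']
```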

This is also a great way to turn a list of lists into a single list of elements.

double_values = [[1, 2], [3, 4, 5, 6, 7, 8], [9, 10, 11], [12]]
values = [inner for outer in double_values for inner in outer]

The important thing to know here is the sequence unwrapping happens from left-to-right.

[inner for inner in outer for outer in double_values] does not work, because outer is used before it has been defined. Writing for outer in double_values first makes the variable outer available to the rest of the comprehension. We could even use outer itself if we wanted to do something weird like:

values = [len(outer) * inner for outer in double_values for inner in outer]
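The left-to-right rule can be checked with a small sketch that compares the comprehension to plain nested loops and shows what happens when the clauses are swapped. The names values_loop, weighted, and swapped_ok are made up for this illustration, and the NameError assumes no variable called outer already exists in the surrounding scope.

```python
double_values = [[1, 2], [3, 4, 5, 6, 7, 8], [9, 10, 11], [12]]

# The for clauses read in the same order as the nested loops below.
values = [inner for outer in double_values for inner in outer]

values_loop = []
for row in double_values:
    for item in row:
        values_loop.append(item)

print(values == values_loop)  # True

# `outer` stays in scope for the expression at the front.
weighted = [len(outer) * inner for outer in double_values for inner in outer]
print(weighted)  # [2, 4, 18, 24, 30, 36, 42, 48, 27, 30, 33, 12]

# Swapping the clauses tries to iterate over `outer` before it is bound.
try:
    [inner for inner in outer for outer in double_values]
    swapped_ok = True
except NameError:
    swapped_ok = False
print(swapped_ok)  # False
```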

A word of warning: I would use double for loop comprehensions sparingly. I use them here for your edification, but the syntax can quickly become very hard to read. Sometimes visual clarity is more important.

Set Comprehensions

Set comprehensions are very similar in syntax to list comprehensions. Let's say for this example we want to get each domain from our input list emails. If we did

emails = [
    'fake.mary@business.net',
    'worldy@asdf.com', 
    'atotalyrealuser@example.com',
    'okay@example.com',
]

domains = [email.split('@')[-1] for email in emails]

I could have used 1 instead of -1 above. Instead of saying the second element in the split, I preferred to say the last element in the sequence.

If I did the above, domains would contain example.com twice. This isn't exactly what I want, and it could mess up logic downstream that expects the elements in domains to be unique.

A set comprehension is a perfect usage for performing de-duplication!

domains = {email.split('@')[-1] for email in emails}

All I changed was replacing the square brackets [ ] with curly brackets { }. Simple but powerful when it comes to just getting code down. Note that we could also add an if clause to the set comprehension.
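Here is a minimal sketch of a set comprehension with an if clause, keeping only the .com domains. Sets are unordered, so the result is sorted before printing; the name com_domains is made up for this example.

```python
emails = [
    'fake.mary@business.net',
    'worldy@asdf.com',
    'atotalyrealuser@example.com',
    'okay@example.com',
]

# De-duplicate and filter in one step.
com_domains = {
    email.split('@')[-1]
    for email in emails
    if email.endswith('.com')
}

print(sorted(com_domains))  # ['asdf.com', 'example.com']
```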

Dict Comprehensions

Let's say I want a dictionary that maps each email to a new domain. Something that looks like

emails = [
    'fake.mary@business.net',
    'worldy@asdf.com', 
    'atotalyrealuser@example.com',
    'okay@example.com',
]
new_domain = 'stroopwafel.io'
# parse emails into data here ...
data == {
    'fake.mary@business.net': 'fake.mary@stroopwafel.io',
    'worldy@asdf.com': 'worldy@stroopwafel.io',
    'atotalyrealuser@example.com': 'atotalyrealuser@stroopwafel.io',
    'okay@example.com': 'okay@stroopwafel.io',
}

I could write this with the dictionary comprehension:

data = {email: f'{email.split("@")[0]}@{new_domain}' for email in emails}

Dict comprehensions also give us a convenient way to reverse the mapping (this works cleanly as long as the values are unique):

reversed_data = {value: key for (key, value) in data.items()}
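One thing to watch for when reversing: if two keys share a value, only one of them survives. The sketch below uses a made-up dictionary in which two addresses map to the same new email (the 'okay@asdf.com' entry is hypothetical).

```python
# Hypothetical data: two different emails with the same username.
data = {
    'okay@example.com': 'okay@stroopwafel.io',
    'okay@asdf.com': 'okay@stroopwafel.io',
    'worldy@asdf.com': 'worldy@stroopwafel.io',
}

reversed_data = {value: key for key, value in data.items()}

# The duplicate value collapses into a single key.
print(len(data), len(reversed_data))  # 3 2

# The later entry overwrites the earlier one.
print(reversed_data['okay@stroopwafel.io'])  # okay@asdf.com
```

If the values might repeat, it is worth checking that len(reversed_data) == len(data) before relying on the reversed mapping.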

Conclusion

Once you get the hang of using comprehensions in your day-to-day workflow you'll start to appreciate their simplicity. Let me leave you with some resources I found along the way in writing this article: