Python Generator Expressions

In Python, you can use a list comprehension, a compact and usually fast way to build a list from any iterable:

[ val for val in biglist ]

You can do the same thing with dictionaries:

{ x.val: x.val2 for x in biglist }
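To make the two snippets above concrete, here is a minimal runnable sketch. The Item record and its sample data are made up for illustration; only the field names val and val2 come from the snippets.

```python
from collections import namedtuple

# Hypothetical record type; the field names mirror the snippets above.
Item = namedtuple("Item", ["val", "val2"])

biglist = [Item("a", 1), Item("b", 2), Item("c", 3)]

# List comprehension: builds the entire list in memory right away.
vals = [x.val for x in biglist]

# Dictionary comprehension: same idea, but produces key/value pairs.
lookup = {x.val: x.val2 for x in biglist}

print(vals)    # ['a', 'b', 'c']
print(lookup)  # {'a': 1, 'b': 2, 'c': 3}
```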

The Feature that Changed My Life

You can take that inner statement “val for val in biglist”, wrap it in parentheses, and package it up as a generator expression!

mygenerator = ( val for val in someiterable )

Then, you can pass that generator around to other functions. Each call to next() runs the generator just far enough to produce one value, and then it pauses until it is asked for the next one. Keep in mind that a generator can only be consumed once; after it is exhausted, you have to create a new one. The crazy part is, I already knew that this is how the yield statement works in Python to create generator functions (calling a function that uses yield returns a generator). I had just never thought of this flip side: a generator function packaged up all nice and neat in a single expression.
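Here is a small sketch of that relationship. The names squares_func and consume, and the sample numbers, are hypothetical and only serve the illustration. It shows a generator expression next to an equivalent yield-based function, and what happens once a generator has been used up.

```python
def squares_func(iterable):
    """Generator function: yields one squared value at a time."""
    for val in iterable:
        yield val * val


def consume(gen):
    """Stand-in for 'some other function' that receives a generator."""
    return list(gen)


numbers = [1, 2, 3]

squares_expr = (val * val for val in numbers)  # generator expression
squares_gen = squares_func(numbers)            # generator function call

first = next(squares_expr)   # runs just far enough to produce 1
second = next(squares_expr)  # resumes and produces 4

rest = consume(squares_expr)   # [9]: only what has not been produced yet
again = consume(squares_expr)  # []: the generator is exhausted

from_func = consume(squares_gen)  # [1, 4, 9]: same values from the yield version
```

Nothing is computed when the generator is created; the work happens only as values are requested.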

Implications

This became very useful when I was working on a script that generates dummy data for multiple-choice submissions. I had a function to determine whether a dummy user should get a question right or wrong, and then needed to pull the appropriate answer into the submission. Here I am getting the first incorrect option from the options list:

[ opt.guid for opt in question.options if not opt.correct ][0]

I looked at that line for a little while and thought, “Self, is it efficient to populate a list of incorrect options when I only need one?” After some googling, I happened upon a helpful Stack Overflow answer about generator expressions. I then changed my code to this:

next(opt.guid for opt in question.options if not opt.correct)

The built-in next() function is what Python uses under the hood to drive for loops. When you call next() on an iterator or generator, it returns the next value, or raises a StopIteration exception if nothing is left. In the code above, we call next() on our generator expression one time, which causes the generator to run until it yields an option that is not correct. This is important! The generator keeps running until it yields something or exhausts the underlying iterable. If no option is incorrect, next() raises StopIteration, unless you pass a default as the second argument, e.g. next(gen, None). If we were to call next() on this generator a second time, it would pick up where it left off and run until it yields the next incorrect option or reaches the end of the list.
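The behavior described above can be sketched as follows. The Option record and its sample data are hypothetical stand-ins for the real question options; only the guid and correct attributes come from the original code.

```python
from collections import namedtuple

# Hypothetical stand-in for the real option objects.
Option = namedtuple("Option", ["guid", "correct"])

options = [
    Option("opt-1", True),
    Option("opt-2", False),
    Option("opt-3", False),
]

incorrect = (opt.guid for opt in options if not opt.correct)

first = next(incorrect)   # skips opt-1, stops at 'opt-2'
second = next(incorrect)  # resumes after opt-2, yields 'opt-3'

# A third call finds nothing left and raises StopIteration.
try:
    next(incorrect)
    exhausted = False
except StopIteration:
    exhausted = True

# Passing a default as the second argument avoids the exception.
fallback = next((opt.guid for opt in options if opt.guid == "missing"), None)

print(first, second, exhausted, fallback)  # opt-2 opt-3 True None
```

Note that the generator expression needs its own parentheses when next() receives a second argument.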

This feature also helped me go back and refactor some legacy code like the following:

sum([ opt.points for opt in options if opt.guid in user_guids ])  # Old way: builds an intermediate list just to throw it away
sum(opt.points for opt in options if opt.guid in user_guids)      # New way: no intermediate list
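As a rough illustration of the difference, the sketch below compares the two approaches on made-up data. The Option record, the sample size, and the choice of user_guids are all assumptions for the demo. sys.getsizeof only measures the container object itself, so treat the numbers as an indication rather than a benchmark.

```python
import sys
from collections import namedtuple

# Hypothetical stand-in for the real option objects.
Option = namedtuple("Option", ["guid", "points"])

options = [Option(f"opt-{i}", i) for i in range(10_000)]
# Assume the user picked every even-numbered option.
user_guids = {opt.guid for opt in options if opt.points % 2 == 0}

# Old way: the full list of matching points exists in memory before sum() runs.
as_list = [opt.points for opt in options if opt.guid in user_guids]
total_list = sum(as_list)

# New way: sum() pulls one value at a time from the generator.
as_gen = (opt.points for opt in options if opt.guid in user_guids)
gen_size = sys.getsizeof(as_gen)  # small and independent of the input size
total_gen = sum(as_gen)

list_size = sys.getsizeof(as_list)  # grows with the number of matches

print(total_list == total_gen)  # True
print(list_size > gen_size)     # True
```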
