Python Objects: Mutable vs. Immutable

Note: this is a Russian language translation of the following English post: Python Objects: Mutable vs. Immutable. Thank you to my friends Andrei Rybin and Dimitri Kozlenko for their help in translating!

Not all objects in Python handle change the same way. Some objects are mutable, meaning they can be altered. Others are immutable; they cannot be changed and instead return new objects when you attempt to update them. What does this mean when writing Python code?

This post will cover (a) the mutability of common data types and (b) the cases where mutability matters.

Mutability by Type

The following are some immutable objects:

  • int
  • float
  • decimal
  • complex
  • bool
  • string
  • tuple
  • range
  • frozenset
  • bytes

The following are some mutable objects:

  • list
  • dict
  • set
  • bytearray
  • user-defined classes (unless specifically made immutable)

What helps me remember which types are mutable and which are not is that containers and user-defined types tend to be mutable, while scalar types are almost always immutable. Then I recall the notable exceptions: tuple is an immutable container, and frozenset is an immutable version of set. Strings are immutable; what if you want to be able to change the character at a given index? Use a bytearray.
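
To make this concrete, here is a small demonstration (outputs shown in comments):

s = "hello"
t = s.upper()        # strings are immutable: upper() returns a NEW string
print(s, t)          # hello HELLO -- the original string is unchanged

nums = [1, 2, 3]
alias = nums         # a second reference to the SAME list object
nums.append(4)       # lists are mutable: modified in place
print(alias)         # [1, 2, 3, 4] -- every reference sees the change

b = bytearray(b"hello")
b[0] = ord("H")      # bytearray lets you change a byte at a given index
print(b)             # bytearray(b'Hello')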

When Mutability Matters

Mutability may seem like a harmless topic, but you need to understand it to write efficient programs. For example, the following code is a naive way to build up a string:

string_build = ""
for data in container:
    string_build += str(data)

This is actually very inefficient. Because strings are immutable, concatenating two strings actually creates a third string that is the combination of the previous two. If you are iterating a lot and building a large string, you will waste a lot of memory creating and throwing away objects. Also, near the end of the iteration you will be allocating and discarding very large string objects, which is even more costly.

The following is more efficient, Pythonic code:

builder_list = []
for data in container:
    builder_list.append(str(data))
"".join(builder_list)

### Another way is to use a list comprehension
"".join([str(data) for data in container])

### or use the map function
"".join(map(str, container))

This code takes advantage of the mutability of a single list object to gather the data together and then allocate the result as a single string. This cuts the total number of allocated objects almost in half.

Another pitfall related to mutability is the following scenario:

def my_function(param=[]):
    param.append("thing")
    return param

my_function() # ["thing"]
my_function() # ["thing", "thing"]

What you might think happens when you give an empty list as the default parameter value is that a new empty list is allocated each time the function is called without an argument. What actually happens is that every call that uses the default will be using the same list. This is because Python (a) evaluates a function's signature only once, (b) evaluates default arguments as part of the function definition, and (c) therefore allocates one mutable list that is shared by every such call of that function.

Do not use a mutable object as the default value of a function parameter. Immutable types are perfectly safe. If you want the intended effect, do this instead:

def my_function2(param=None):
    if param is None:
        param = []
    param.append("thing")
    return param

Conclusion

Mutability matters. Know it. Learn it. Primitive types are probably immutable. Container types are probably mutable.

References

Python Objects: Mutable vs. Immutable (the original English post)

Implementing a Mini-React-Redux Framework on a Django Page

Introduction

I have built several production web applications using React and Redux and generally have had an excellent experience with those technologies.  One of React’s greatest assets IMO is its ability to integrate into all kinds of stacks and setups but still play nice with the other kids.  That was something that impressed me back in Spring 2014 when I first used React.  We got React running in the jQuery spaghetti code of a massive, legacy Ruby on Rails application with incredibly little effort and huge productivity benefits to the team.  Redux is also incredible for the amount of good it does you with so little code.

There are lots of blogs and tutorials on how to build a full single-page application (SPA) complete with client-side routing, persistent state, and even server-side rendering to boost that time-to-interactivity metric.  What if I don’t need that?  What if I already have a site built using an “old-school” server-side framework like Ruby on Rails or Django, but I have one specific page that should be highly interactive and needs something more robust than simple jQuery?  React and Redux could still be hugely beneficial, but how do I use them without (a) getting bogged down in boilerplate or (b) over-engineering the solution?

Mini React-Redux Framework to the rescue!

Ready, Set, Go!

Let’s make the skeleton of a super tiny JavaScript framework that can fit our use case for a Django website.

Here are the steps we’ll follow:

  1. Set up Webpack with Django
  2. Install our client dependencies
  3. Implement the Mini React-Redux Framework

Set Up Webpack with Django

For this step, we are going to use the django-webpack-loader tool to give us the power to load Webpack bundles onto a templated page.  The setup is very simple if you have a vanilla Django application; just follow the loader tutorial.  If you are using the Django-Mako-Plus add-on, supplement the regular loader tutorial with my own little tutorial.

Install our client dependencies

The following are the NPM dependencies I am relying on:

{
  "dependencies": {
    "babel-core": "~6.3.26",
    "babel-loader": "~6.2.0",
    "babel-preset-es2015": "~6.3.13",
    "babel-preset-react": "~6.16.0",
    "react": "~15.4.2",
    "react-dom": "~15.4.2",
    "redux": "~3.6.0",
    "redux-logger": "~2.7.4",
    "redux-thunk": "~2.2.0",
    "webpack": "~1.13.2",
    "webpack-bundle-tracker": "0.0.93"
  }
}

Include these dependencies in your package.json and run npm install.

Implement the Mini React-Redux Framework

Here is the source I came up with for the mini framework:

import React from 'react'
import ReactDOM from 'react-dom'
import { createStore, applyMiddleware, compose } from 'redux'
import thunk from 'redux-thunk'
import createLogger from 'redux-logger'
import MyComponent from './components/MyComponent'

/**
 * Redux Reducer.
 * @params:
 *  - state: the previous state of the store
 *  - action: an object describing how the state should change
 * @returns:
 *  - state: a new state after applying appropriate changes
 */
const rootReducer = (state = { clicks: 0 }, action) => {
  // ... change state based on action
  return state
}

/**
 * Redux Store object with three functions you should care about:
 *  - getState(): returns the current state of the store
 *  - dispatch(action): calls the reducer with a given action
 *  - subscribe(listener): registers a listener that is called after a reducer runs
 *
 * The store has two optional middlewares to showcase how you would add them:
 *  - redux-thunk: allows `store.dispatch()` to receive a thunk (function) or an object
 *                 See http://stackoverflow.com/questions/35411423/how-to-dispatch-a-redux-action-with-a-timeout/35415559#35415559
 *  - redux-logger: logs out redux store changes to the console. Only in dev.
 */
const middlewares = process.env.NODE_ENV === 'production'
    ? applyMiddleware(thunk)
    : applyMiddleware(thunk, createLogger())
let store = compose(middlewares)(createStore)(rootReducer)

/**
 * Helper function to render a given component to the DOM.
 * Makes the following props available to the component:
 *  - storeState: an object of the latest state of the redux store.
 *  - dispatch: a function that dispatches actions to the store/reducer.
 */
const render = (nodeId, Component) => {
  let node = document.getElementById(nodeId)
  ReactDOM.render(<Component storeState={store.getState()} dispatch={store.dispatch} />, node)
}

/**
 * Function that bootstraps the app.
 *  - render the component with initial store state.
 *  - re-render the component when the store changes.
 */
const start = () => {
  render('app', MyComponent)
  store.subscribe(() => render('app', MyComponent))
}

To start the application, just call the start function when the page loads. Here’s an example using jQuery:

$(() => start())

Explanation

This little proof-of-concept is interesting to me because of how much usefulness it provides with so little code.  With this code, we create a Redux store with some basic middlewares and a reducer that does nothing interesting (yet).  Then we render a component to the DOM, giving it the current store state and a function to dispatch actions if necessary, and we set up a store subscription so that the component is re-rendered whenever the store changes.

Another cool part about this approach is that a lot of the setup code can be pulled out and made reusable.  The render(), start(), and store setup would probably be the same for every Mini App we would create.  Then we could simplify this whole file down to just the reducer, passing the node and component into the start function (not implemented here, but sketched below).
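
As a rough illustration of that idea, here is a sketch of what such a reusable factory might look like.  The createMiniApp name and signature are my invention for this sketch, not code from the framework above:

import React from 'react'
import ReactDOM from 'react-dom'
import { createStore, applyMiddleware, compose } from 'redux'
import thunk from 'redux-thunk'
import createLogger from 'redux-logger'

// Hypothetical factory: the reducer, DOM node id, and component are
// passed in; the store setup, render, and subscription are shared.
const createMiniApp = (rootReducer, nodeId, Component) => {
  const middlewares = process.env.NODE_ENV === 'production'
    ? applyMiddleware(thunk)
    : applyMiddleware(thunk, createLogger())
  const store = compose(middlewares)(createStore)(rootReducer)

  const render = () => {
    const node = document.getElementById(nodeId)
    ReactDOM.render(
      <Component storeState={store.getState()} dispatch={store.dispatch} />,
      node
    )
  }

  return {
    start () {
      render()                // initial render with starting state
      store.subscribe(render) // re-render on every store change
    }
  }
}

// Usage: createMiniApp(rootReducer, 'app', MyComponent).start()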

Conclusion

With very little effort and boilerplate, we have a React application using Redux as its storage system.  With this in place, you can build quite sophisticated widgets and still have the flexibility to get more complex if you need to do something more involved.


Adding Webpack Bundles to your Django-Mako-Plus (DMP) Site

This post describes how to hook up Webpack to a Django site using the django-webpack-loader tool in the special case where your Django site is running the Django-Mako-Plus (DMP) library.

Why Webpack?

In the last few years, the ecosystem of JavaScript build tools has grown in both size and quality.  One of my favorite build tools is Webpack.  If you have not heard of it, I highly recommend it to you for bundling your JavaScript, CSS, and other static assets.  To get the most out of this post, please go do a little cursory research on the use case of the webpack bundler before continuing on here.

I also appreciate the Django framework for building dynamic web applications in Python. If you would like to use Django with Webpack, it takes a little extra work to get things hooked together in a clean, scalable way.  Webpack outputs “bundles” that can be formatted in many ways (CommonJS, UMD, Require.js, etc.) depending on how they should be consumed, and it can even put an MD5 hash in each bundle’s name to improve the caching of your bundles on the internet.

What is “django-webpack-loader”?

Django, for all its great features, handles static files poorly by modern standards, which is where the django-webpack-loader (hereafter referred to as “the loader”) tool comes in.  It provides a way to load a webpack bundle by name into a Django template by mapping the bundle’s “logical name” (e.g. main) to its filename (e.g. main-be0da5014701b07168fd.js), which changes whenever the contents of the bundle change.  To learn how the loader works, read the documentation and tutorial.
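
For context, the webpack side of that setup looks roughly like the following.  This is a minimal sketch assuming webpack 1.x and webpack-bundle-tracker as in the loader tutorial; the entry and output paths are placeholders:

var path = require('path')
var BundleTracker = require('webpack-bundle-tracker')

module.exports = {
  context: __dirname,
  entry: './assets/js/index',      // placeholder entry point
  output: {
    path: path.resolve('./assets/bundles/'),
    filename: '[name]-[hash].js'   // hashed filenames for cache-busting
  },
  plugins: [
    // Writes webpack-stats.json, which the loader reads to map logical
    // bundle names (e.g. "main") to their current hashed filenames.
    new BundleTracker({filename: './webpack-stats.json'})
  ]
}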

DMP with The Loader

The loader integrates with the templating system of Django.  If you are using Django-Mako-Plus (DMP), you replaced the default templating engine with Mako, so the loader’s prepared render_bundle helper is not available anymore.  Lucky for us, Mako is so powerful that we can import Python functions with ease.  All we need to do in a template is import the right function and call it using Mako syntax:

<%! from webpack_loader.templatetags.webpack_loader import render_bundle %>
<html>
  <head> 
    ${ render_bundle('main') }
  ...

Simple! We can even simplify this a bit by adding the import statement to DEFAULT_TEMPLATE_IMPORTS for our Mako templates like so:

TEMPLATES = [
  {
    'BACKEND': 'django_mako_plus.MakoTemplates',
    'OPTIONS': {
      # Import these names into every template by default
      # so you don't have to import them explicitly
      'DEFAULT_TEMPLATE_IMPORTS': [
        'from webpack_loader.templatetags.webpack_loader import render_bundle',
      ]
    }
  }
]
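
With that in place, any template can call the helper directly, with no import line required:

<html>
  <head>
    ${ render_bundle('main') }
  ...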

BAM!

Conclusion

All done!  You are now ready to start using the django-webpack-loader to include Webpack bundles in your Django-Mako-Plus website!


Defending JavaScript

In a forum thread, a member brought up the following critiques of JavaScript.  I quickly recognized many of these arguments as ones I have heard before and really wanted to address them. In my attempt not to hijack the thread (which was not about “To JavaScript, or Not To JavaScript”), I collected my thoughts here.  This is not meant to be a passive-aggressive post, but rather an aboveboard rebuttal in a logical discussion.

The Argument Against JS:

…(skipped for brevity)…

I think Javascript has gone a long ways since [I] first started using it, but here are some issues that I have had with it over the time I have used it:

  1. Javascript has a lot of “cute” and “neat” tricks in it, and I feel like people abuse those tricks constantly. In Python there are some cute tricks that you can use, but the community tends to frown upon it.
  2. NPM is such a good and bad experience. Compared to some other package systems its kind of messy. The other issue is that they have this “microservice” where instead of writing a one line piece of code they instead pull from NPM to get the same thing done. There was an issue a few months ago where one developer removed his package that thousands of people depended on, and it caused a “dependency hell” per se.
  3. Documentation is lousy on almost all javascript projects. The documentation tools for javascript projects are pretty lousy. When you have worked with docutils/sphinx for python you start to wonder what is wrong with the javascript documentation process.
  4. Lack of stability – this is getting better, but still pretty lousy at times. Almost all javascript projects including node has this issue. Everybody is so into “progressing” the platform that they push
  5. Too many kludges to make it imitate OO
  6. Poor unittesting tools.

My Defense of JavaScript

The arguments above use Python as a language reference point. That’s great! I have used Python for years and love it as well; so I will focus on comparing JS with Python.

1. (Terrible) Language Tricks

Every language has “cute and neat tricks” (read terribleness). JavaScript has more than some languages, but also less than other languages. Many of the “terrible things” in JS are a result of how it runs in the browser (DOM, globals, namespacing) rather than language problems (although JS itself has some bad ones).

The BIG difference with JS you are forgetting is that almost every language can break compatibility relatively freely (e.g. Python 3, Ruby 2, Lua [every version]). JavaScript can’t, because it would break the internet. Bad, wrong-headed decisions cannot be removed once people start using them. Websites need to stop using features before they can be removed. Very few other languages have these strict deprecation requirements.

Python has its share of weirdness that people use, and it had a rather large set of breaking changes with Python 3.  Let’s read from the official sources about Python 3:

There are more changes than in a typical release, and more that are important for all Python users. Nevertheless, after digesting the changes, you’ll find that Python really hasn’t changed all that much – by and large, we’re mostly fixing well-known annoyances and warts, and removing a lot of old cruft.

Python had (still has) a big problem with Python 2.7 -> Python 3 upgrades.  Even now, years later, many projects still rely on 2.7 and haven’t upgraded. That situation would never work on the world wide web!

2. NPM

Your argument is against how people have used NPM, not against NPM itself. And the issues you cited are definitely problems, but they are problems for every package manager that reaches the nexus of popularity and ease-of-use. RubyGems had the exact same problem with excessive “micro-gems” 10 years ago.

And that NPM “left-pad” issue that broke everything earlier this year … yeah, that could still happen on NPM, PyPI, RubyGems, INSERT_PACKAGE_MANAGER.  Nothing special/bad about NPM made it happen.  Just that a guy decided to be a jerk and remove a package that everyone depended on.

Compare PyPI to RubyGems and NPM and it makes sense why it was such a big deal for NPM: PyPI is notoriously fractured and weird to publish on (hence smaller); RubyGems and NPM are notoriously easy (hence bigger). The problem of package management is a classic hard problem that nobody has figured out completely, from JavaScript, Python, and Ruby to widely used Linux distributions.

3. Docs

JavaScript sucks because a lot of projects don’t write good docs? I am not sure I follow the argument. But if you wanted to argue it, you could easily attribute that to the massive number of JavaScript projects vs. Python projects. JavaScript actually has many excellent automated tools for documentation.

4. Instability

Patently false. Microsoft, Google, Facebook, Walmart, Mozilla, etc. have poured so much time, effort, and money into the JS ecosystem (specifically Node.js, NPM, and JS Engine implementations) that it has become one of the most stable platforms you can be on. And don’t forget the JS language guarantee that language features can’t be removed until most websites stop using them. Even among browsers, the consistency of good implementation of JS features is at an all time high.

Any instability in Node.js specifically has been largely mitigated with the new release process (Stable and Current distributions). The only “instability” to speak of is the massive volume of updates that V8 goes through to keep up with ECMAScript features, and those only matter to the native library maintainers not using node-gyp (which most use afaik). And even then, Google and Microsoft now work closely with Node.js maintainers to help with API changes in their JS engines.

5. Not OOP

Everything in JS is an object. It just doesn’t use Classical OO. Your argument sounds more like you mean Classical OO vs. Prototypal OO. JS is the latter, and it is just as object-oriented as Classical, but fundamentally different because prototypes are objects as well and can be changed at runtime.  To help people wrap their heads around prototypes, ES6 even introduced the keyword class (although I’m not a big fan of it).  In some ways, prototypes are a truer and more powerful kind of OOP.  Classes were introduced after OOP landed, primarily to help with static type checking, not to enable better OOP design.
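
A tiny, contrived sketch of what that runtime flexibility means in practice:

function Dog (name) { this.name = name }
var rex = new Dog('Rex')

// Add a method to the prototype AFTER the instance was created;
// every existing and future Dog picks it up immediately.
Dog.prototype.speak = function () {
  return this.name + ' says woof'
}

console.log(rex.speak()) // "Rex says woof"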

In the end, JavaScript is multi-paradigm just like Python with a mix of Object-oriented, Functional, and Imperative paradigms.

6. No Unit Testing

Patently false. You will undoubtedly find many testing libraries of very high quality in JS.  And if you want to talk culture of testing in a community; in my experience, in a room with a Rubyist, Pythonista, and JavaScripter, the Python guy is the least likely to be writing tests.

Conclusion

JavaScript as a language, with all the warts and weirdness, is easily one of the fastest evolving languages in the world.

10 years ago, who would have thought this about JS:

  1. Most widely used programming language on earth
  2. One of the fastest scripting languages ever
  3. Largest package ecosystem ever (npm)
  4. Popular as a server backend language

I once shared your disdain of JavaScript, but with the recent incredible work being done on the language itself, it has become one of my favorites.

I’ll close with this slide by Brendan Eich, the creator of JavaScript:

[Slide by Brendan Eich]


Forms to Emails using AWS Lambda + API Gateway + SES

When deploying static websites, I am not a fan of provisioning servers to distribute them.  There are so many alternatives that are cheaper, simpler, and faster than managing a full backend server: S3 buckets, Content Delivery Networks (CDNs), etc.  But the catch with getting rid of a server is that now you don’t have a server anymore!  Without a server, where are you going to submit forms to?  Lucky for us, in a post-cloud world, we can solve this!

In this post, I will describe how AWS Lambda and API Gateway can be used as a “serverless” backend to a fully static website that can submit forms that get sent as emails to the site owner.

Important Note

This is merely a demonstration.  For simplicity, I do not explain important things like setting up HTTPS in API Gateway, but I certainly recommend it.  Also, be careful applying this solution to other contexts.  Not all data can or should be treated like publicly submittable contact-form data. Most applications will require more robust solutions with authentication and data stores. Be wise; what can I say more?

Prerequisites

  • AWS Account

The project is a simple static marketing website.  Like most business websites, it has a “Contact Us” page with a form that potential customers can fill out with their details and questions.  In this situation, we want this data to be emailed to the business so they can follow-up.  This means we need an endpoint to (1) receive data from this form and (2) send an email with the form contents.

Let’s start with the form:

<form id="contact-form">
  <label for="name-input">Name:</label>
  <input type="text" id="name-input" placeholder="name here..." />

  <label for="email-input">Email:</label>
  <input type="email" id="email-input" placeholder="email here..."/>

  <label for="description-input">How can we help you?</label>
  <textarea id="description-input" rows="3" placeholder="tell us..."></textarea>

  <button type="submit">Submit</button>
</form>

And because API Gateway is annoying to use with application/x-www-form-urlencoded data, we’re just going to use jQuery to grab all the form data and submit it as JSON because it will Just Work™:

var URL = '<api-gateway-stage-url>/contact'

$('#contact-form').submit(function (event) {
  event.preventDefault()

  var data = {
    name: $('#name-input').val(),
    email: $('#email-input').val(),
    description: $('#description-input').val()
  }

  $.ajax({
    type: 'POST',
    url: URL,
    dataType: 'json',
    contentType: 'application/json',
    data: JSON.stringify(data),
    success: function () {
      // clear form and show a success message
    },
    error: function () {
      // show an error message
    }
  })
})

Handling the success and error cases is left as an exercise for the reader 🙂

Lambda Function

Now let’s get to the Lambda Function! Open up the AWS Console, navigate to the Lambda page, and choose “Get Started Now” or “Create Function”:

[Screenshot: the AWS Lambda console]

On the “Select Blueprint” page, search for the “hello-world” blueprint for Node.js (not Python):

[Screenshot: the Select Blueprint page]

Now, create your function.  Choose the “Edit Code Inline” setting, which gives you a big editor box with some JavaScript code in it, and replace that code with the following:

var AWS = require('aws-sdk')
var ses = new AWS.SES()

var RECEIVER = '$target-email$'
var SENDER = '$sender-email$'

exports.handler = function (event, context) {
    console.log('Received event:', event)
    sendEmail(event, function (err, data) {
        context.done(err, null)
    })
}

function sendEmail (event, done) {
    var params = {
        Destination: {
            ToAddresses: [
                RECEIVER
            ]
        },
        Message: {
            Body: {
                Text: {
                    Data: 'Name: ' + event.name + '\nEmail: ' + event.email + '\nDesc: ' + event.description,
                    Charset: 'UTF-8'
                }
            },
            Subject: {
                Data: 'Website Referral Form: ' + event.name,
                Charset: 'UTF-8'
            }
        },
        Source: SENDER
    }
    ses.sendEmail(params, done)
}

Replace the placeholders for RECEIVER and SENDER with real email addresses.
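
For reference, the function expects an event shaped like the JSON our form posts.  Here is a made-up sample you could paste into the Lambda “Test” dialog:

{
  "name": "Jane Doe",
  "email": "jane@example.com",
  "description": "I have a question about pricing."
}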

Give it a name and take the defaults for all the other settings except for Role*, which is where we specify an IAM Role with the permissions the function will need to operate (logging and email sending). Select that, then “Basic execution role”, which should pop up an IAM role dialog. Take the defaults, but open “View Policy Document” and choose “Edit”. Change the value to the following:

{
    "Version":"2012-10-17",
    "Statement":[
      {
          "Effect":"Allow",
          "Action":[
              "logs:CreateLogGroup",
              "logs:CreateLogStream",
              "logs:PutLogEvents"
          ],
          "Resource":"arn:aws:logs:*:*:*"
      },
      {
          "Effect":"Allow",
          "Action":[
              "ses:SendEmail"
          ],
          "Resource":[
              "*"
          ]
      }
    ]
}

The first statement allows you to write logs to CloudWatch. The second statement lets you use the SES SendEmail API. With the IAM Role added, we will move on to setting up API Gateway so our Lambda function will be invoked by POSTs to an endpoint.

API Gateway Setup

The process for configuring API Gateway is as follows:

  1. Create an API
  2. Create a “Contact” resource
  3. Create a “POST” method that invokes our Lambda Function
  4. Enable CORS on our resource

Open up the API Gateway in the Console:

[Screenshot: the API Gateway console]

Select the “Get Started” or “Create API” button.  Give the API a useful name and continue.

Now we will create a “Resource” and some “Methods” for our API.  I will not walk you through each step of the process because the GUI is a little tricky to explain, but the process is fairly straightforward.

Using the “Actions” dropdown, choose “Create Resource” and name it something like “Contact” or “Message”.  Then, with the resource selected, use “Actions” > “Create Method” and choose POST.  Now we will connect it to our Lambda Function:

[Screenshot: connecting the POST method to the Lambda function]

Once you save this, you will need to enable CORS so that your code in the browser can POST to this other domain.  Choose your resource > Actions > Enable CORS.

[Screenshot: the Enable CORS settings]

Just to be safe, I added a header to Access-Control-Allow-Headers that I believe jQuery sends on AJAX calls; just put x-requested-with at the end of the comma-separated list. I am also using ‘*’ as the allowed origin so that local testing is easy. For production, you should use the actual domain name your website will run under.

Now your resources and methods should look something like this:

[Screenshot: the finished resources and methods]

The last step is to “Deploy API”.  It’s not too bad.  Just click through the screens and fill them out with stuff that makes sense to you.  The high-level overview is that you need to create a “Stage” and then whenever you make updates to your API, you “deploy” to a “stage”.  This means that you can deploy the same API to multiple stages and test out any changes on a “Testing” stage and if things are good, deploy to the “Production” stage.

At the end of “deploying”, they will give you an “Invoke URL”.  This URL is the root of your API.  To make requests to a resource, just add the resource name to the end of the URL: “https://invoke-url/stage-name/resource”.   To POST to our “Contact” (or “Message”) resource, given an Invoke URL of https://1111111.magic.amazonaws.com/testing, you would make POST requests to https://1111111.magic.amazonaws.com/testing/contact.  Put this URL into the jQuery code as the value of var URL.

SES + Email Validation

We are using SES to send emails.  For testing, it restricts the email addresses that can “send” and “receive” messages to ones that have been “verified”.  It is very simple.  Just go to the SES page of the Console, choose Email Addresses > Verify New Email Address.  Do this for each email address you would like to “send as” and “send to”.

Try it Out

This should get you most of the way.  If everything worked out, you should be able to submit your contact form and then receive an email with its contents.

Post questions in the comments if you hit any problems.  This is only a summary, a pared-down version of the process I went through.

Update

Jeff Richards (http://www.jrichards.ca/) recommended an all-in-one HTML + JavaScript snippet.  Here is a Github Gist of that snippet: https://gist.github.com/tgroshon/04b94aee6331bb65f05f4e0d7ff2e8bd


Optimizing the Performance of a Node.js Function

Introduction

After letting it stagnate for a while, I decided to rework Street.js to use the things I have been working with in Node.js this last year.  My main goals are as follows:

  • ES6ify the code base
  • Replace nasty callback code with Promises
  • Pass ESLint using JavaScript Standard Style config
  • Annotate types with Flow
  • Simplify the implementation

Node.js v4 has a lot of new ES6 features that are extremely helpful, fun, and performant.  I will be refactoring Street to use these new features.  Of course, like a good semver citizen, I will update my major version (to v1.0) when I publish the rewrite.

Setup

I am using Babel as a transpiler to paper over ES6 features not yet in Node.js, blacklisting the transforms for features that are already present.  Many ES6 features (e.g. generators, symbols, maps, sets, arrow functions) are more performant natively than via transpilation, and I do not care about supporting Node.js before version 4.  The following is my .babelrc configuration file showing the blacklist I am using:

{
  "blacklist": [
    "es3.memberExpressionLiterals",
    "es3.propertyLiterals",
    "es5.properties.mutators",
    "es6.blockScoping",
    "es6.classes",
    "es6.constants",
    "es6.arrowFunctions",
    "es6.spec.symbols",
    "es6.templateLiterals",
    "es6.literals",
    "regenerator"
  ],
  "optional": [
    "asyncToGenerator"
  ]
}

Case Study: Walking a Directory

In Street, I need to walk a directory of files so I can generate a manifest of file paths and their hashes for comparison to a previous manifest.  The directory walking code was hairy; most of it was from Stack Overflow.  Here’s the current state (cleaned up):

var fs = require('fs')
var path = require('path')

function oldFindFilePaths (dir, done) {
  var filePaths = []
  fs.readdir(dir, function(err, filelist) {
    if (err) return done(err)
    var i = 0

    ;(function next() {
      var file = filelist[i++]
      if (!file) return done(null, filePaths)

      file = path.join(dir, file)

      fs.stat(file, function(err, stat) {
        if (err) return done(err)
        if (stat && stat.isDirectory()) {
          oldFindFilePaths(file, function(err, res) {
            if (err) return done(err)
            filePaths = filePaths.concat(res)
            next()
          })
        } else {
          filePaths.push(file)
          next()
        }
      })
    })()
  })
}

I never really liked this because it is not intuitive to me. The function is an unwieldy set of multiple recursive calls that make me feel gross.  Once I got it working, I was wary of touching it again.

There must be a better way to do this! I can either spend some time refactoring this to make it nicer, or see if a rewrite is more elegant and perhaps performant. I am willing to take a small performance hit.

The following is my first iteration:

import fs from 'fs'
import path from 'path'

async function findFilePaths (dir: string): Promise<Array<string>> {
  var foundPaths = []
  var files = fs.readdirSync(dir)

  while (files.length > 0) {
    let file = files.pop()
    if (!file) break

    let filePath = path.join(dir, file)
    let stat = fs.statSync(filePath)

    if (stat.isDirectory()) {
      foundPaths = foundPaths.concat(await findFilePaths(filePath))
    } else {
      foundPaths.push(filePath)
    }
  }

  return foundPaths
}

Do not be thrown off by the Type Annotations.  I really enjoy FlowType and find it useful for finding many kinds of bugs.  All those annotations get stripped during babel transpilation.

This function was much clearer. I love ES7 async functions. They wrap a function’s logic in a Promise and then resolve with the returned value or reject if an error is thrown. Inside an async function, you can await a Promise. If it resolves, the resolved value is returned. If it rejects, the rejected value (ideally an error instance) is thrown.

Notice that I replaced my asynchronous fs calls with synchronous ones. The callbacks were just too nasty, and since this is a CLI application they were not that helpful for performance as I was using them.

This was much clearer, but still not ideal to me. I am not a fan of while loops and instead prefer a more functional approach using map, filter, and reduce when possible. Also, calling fs.readdirSync was ok in this usage, but those fs.statSync calls seemed inefficient as they would block on each call to a file descriptor. Perhaps I could make them async again but parallelize them.

This led me to my next iteration:

async function newFindFilePaths2 (dir: string): Promise<Array<string>> {
  var files = await new Promise((resolve, reject) => {
    fs.readdir(dir, (err, files) => err ? reject(err) : resolve(files))
  })

  var statResultPromises = files.map(file => new Promise((resolve, reject) => {
    var filepath = path.join(dir, file)
    fs.stat(filepath, (err, stat) => err ? reject(err) : resolve({filepath, stat}))
  }))

  var results = await Promise.all(statResultPromises)
  var {subDirs, foundPaths} = results.reduce((memo, result) => {
    if (result.stat.isDirectory()) {
      memo.subDirs.push(result.filepath)
    } else {
      memo.foundPaths.push(result.filepath)
    }
    return memo
  }, {subDirs: [], foundPaths: []})

  var subDirPaths = await Promise.all(subDirs.map(newFindFilePaths2))
  return foundPaths.concat(...subDirPaths)
}

Notice the while loop is gone, replaced with map and reduce. The fs.stat calls happen in parallel for the list of files. The fs.readdir call is also async because I will do recursive calls to this function in parallel for all the subdirectories I find.

I am also a fan of destructuring and spreading. It makes for more concise and elegant code. My favorite example here is taking the results of the recursive calls to newFindFilePaths2, which are arrays of strings, and spreading them into arguments to the foundPaths.concat function call to join all the paths into a single array.

This is excellent, but can be cleaned up and broken into a few different functions. This brings me to my last iteration:

function listFiles (dir) {
  return new Promise((resolve, reject) => {
    fs.readdir(dir,
               (err, files) => err ? reject(err) : resolve(files))
  })
}

function getStatMapFn (dir) {
  return file => new Promise((resolve, reject) => {
    var filepath = path.join(dir, file)
    fs.stat(filepath,
            (err, stat) => err ? reject(err) : resolve({filepath, stat}))
  })
}

function partitionByType (memo, result) {
  if (result.stat.isDirectory()) {
    memo.subDirs.push(result.filepath)
  } else {
    memo.foundPaths.push(result.filepath)
  }
  return memo
}

async function newFindFilePaths3 (dir: string): Promise<Array<string>> {
  var files = await listFiles(dir)
  var results = await Promise.all(files.map(getStatMapFn(dir)))
  var {subDirs, foundPaths} = results.reduce(partitionByType,
                                             {subDirs: [], foundPaths: []})

  var subDirPaths = await Promise.all(subDirs.map(newFindFilePaths3))
  return foundPaths.concat(...subDirPaths)
}

Even though it is more lines of code, I prefer this to the previous. A few pure, helper functions and one function that puts them all together concisely and elegantly. So beautiful!

Running Times Compared

Let’s check the execution time to see if we did any better.  This is just meant as a dirty comparison, not super scientific.
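
For the curious, here is a rough sketch of the kind of timing harness I mean; it is a reconstruction for illustration, not the exact benchmark script:

// Time a promise-returning directory walker over a test directory.
async function time (label, fn, dir) {
  var start = process.hrtime()
  await fn(dir)
  var diff = process.hrtime(start) // [seconds, nanoseconds]
  console.log(label + ':', (diff[0] * 1000 + diff[1] / 1e6).toFixed(1), 'ms')
}

// Usage (inside an async function):
// await time('newFindFilePaths3', newFindFilePaths3, 'test/fixtures')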

Function                              Execution Time (11 files, 2 dirs)
oldFindFilePaths (callback hell)      1.8 ms
newFindFilePaths (while loop)         12.1 ms
newFindFilePaths2 (map/reduce)        13.3 ms
newFindFilePaths3 (final map/reduce)  13.4 ms

Darn! The old function appears to be the most performant with a small number of files.  The difference between my last two iterations is negligible, which makes sense because they are really the same thing, just refactored slightly.

But what happens when there are more files and subdirectories?

Function                              11 files, 2 dirs    1300 files, 200 dirs    10800 files, 2400 dirs
oldFindFilePaths (callback hell)      1.8 ms              41.8 ms                 269.6 ms
newFindFilePaths (while loop)         12.1 ms             36.9 ms                 182.6 ms
newFindFilePaths2 (map/reduce)        13.3 ms             60.8 ms                 413.8 ms
newFindFilePaths3 (final map/reduce)  13.4 ms             61.5 ms                 416.1 ms

Interesting!  The synchronous while loop started beating all the others once the files numbered in the thousands, spread over hundreds of subdirectories.

Conclusion

I think I will probably end up going with the while loop function because it is the simplest and has better performance at scale.  And in the end, I mainly just wanted something with a simple API that I could hide behind a promise.

My theory for its superior performance over large directories is that the synchronous file system queries act as a kind of naive throttling system: they stop the VM from making thousands of concurrent function calls and file system queries, which would bog it down.  That’s just my intuition though.


Header Files, Compilers, and Static Type Checks

Have you ever thought to yourself, “why does C++ have header files”?  I had never thought about it much until recently, when I decided to do some research into why some languages (C, C++, Objective-C, etc.) use header files while other languages do not (e.g. C# and Java).

Header files, in case you do not have much experience with them, are where you put declarations and definitions.  You declare constants, function signatures, type definitions (like structs) etc.  In C, all these declarations go into a .h file and then you put the implementation of your functions in .c files.

Here’s an example of a header file called mainproj.h:

#ifndef MAINPROJ_H__
#define MAINPROJ_H__

extern const char *one_hit_wonder;

void MyFN( int left, int back, int right );

#endif /* MAINPROJ_H__ */

Here is a corresponding source file mainproj.c:

#include <stdio.h>
#include "mainproj.h"

const char *one_hit_wonder = "Yazz";

void MyFN( int left, int back, int right )
{
    printf( "The only way is up, baby\n" );
}

Notice that the header only has the function declaration for MyFN, and it also does not specify what one_hit_wonder is set to. But why do we do this in C but not in Java?  Both are compiled and statically typed.  Ask GOOGLE!

A great MSDN blog post by Eric Lippert called “How Many Passes” was very helpful.  The main idea I got out of the article is that header files are necessary because of Static Typing.  To enforce type checks, the compiler needs to know things like function signatures to guarantee functions never get called with the wrong argument types.

Eric lists two reasons for header files:

  1. Compilers can be designed to do a single pass over the source code instead of multiple passes.
  2. Programmers can compile a single source file instead of all the files.

Single Pass Compilation

In a language like C#, which is statically typed but has no header files, the compiler needs to run over all the source code once to collect declarations and function signatures and then a second time to actually compile the function bodies (where all the real work of a program happens) using the declarations it knows about to do type checks.
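
Here is a tiny, contrived C example of that constraint (helper is a made-up function): by the time a one-pass compiler reaches the call in main, the earlier declaration has already supplied the signature it needs for type checking.

#include <stdio.h>

void helper( int n );  /* declaration: the signature is known from here on */

int main( void )
{
    helper( 42 );      /* type-checked against the declaration above,
                          even though the definition comes later */
    return 0;
}

void helper( int n )
{
    printf( "got %d\n", n );
}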

It makes sense to me that C and C++ would have header files because they are quite old languages, and the CPU and memory resources required to do multiple passes would have been very expensive on computers of that era.  Nowadays, computers have more resources and multiple passes are less of a problem.

Single File Compilation

One interesting other benefit of header files is that a programmer can compile a single file.  Java and C# cannot do that: compilation occurs at the project level, not the file level.  So if a single file is changed, all files must be recompiled.  That makes sense because the compiler needs to check every file in order to gather the declarations.  In languages with header files, you can compile only the file that changed, because the header files guarantee type checks between files.

Relevance Today

Interesting as this may be, is it relevant today if you only do Java, C#, or a dynamic language?  Actually, it is!

For instance, consider TypeScript and Flow, which both bring gradual typing to JavaScript. Both systems have a concept of declaration files.  What do they do?  You guessed it!  Type declarations, function signatures, etc.

TypeScript Declaration file:

module Zoo {
  function fooFn(bar: string): void;
}

Flow Declaration file:

declare module Zoo {
  declare function fooFn(bar: string): void;
}

To me, these look an awful lot like header files!

As we see, header files are not dead!  They are alive and well in many strategies for Type Checking.
