Python Objects: Mutable vs. Immutable

Note: this is a Russian language translation of the following English post: Python Objects: Mutable vs. Immutable. Thank you to my friends Andrei Rybin and Dimitri Kozlenko for their help in translating!

Not all objects in Python handle changes the same way. Some objects are mutable, meaning they can be altered. Others are immutable; they cannot be changed but instead return new objects when you attempt to update them. What does this mean when writing Python code?

This post covers (a) the mutability of common data types and (b) the cases where mutability matters.

Mutability by Type

The following are some immutable objects:

  • int
  • float
  • decimal
  • complex
  • bool
  • string
  • tuple
  • range
  • frozenset
  • bytes

The following are some mutable objects:

  • list
  • dict
  • set
  • bytearray
  • user-defined classes (unless specifically made immutable)

What helps me remember which types are mutable and which are not is that containers and user-defined types tend to be mutable, while scalar types are almost always immutable. Then I recall the notable exceptions: tuple is an immutable container, and frozenset is an immutable version of set. Strings are immutable; what if you want to be able to change the characters at a particular index? Use a bytearray.
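
A quick illustration of the difference (a minimal sketch; the exact TypeError message may vary by Python version):

text = "hello"
try:
    text[0] = "H"  # strings are immutable: item assignment fails
except TypeError as err:
    print(err)     # e.g. 'str' object does not support item assignment

mutable_bytes = bytearray(b"hello")
mutable_bytes[0] = ord("H")  # bytearray is mutable: this changes it in place
print(mutable_bytes)         # bytearray(b'Hello')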

When Mutability Matters

Mutability might seem like a harmless topic, but you need to understand it to write efficient programs. For example, the following code is a straightforward solution for concatenating strings:

string_build = ""
for data in container:
    string_build += str(data)

In fact, this is very inefficient. Because strings are immutable, concatenating two strings actually creates a third string that is the combination of the two previous ones. If you are iterating many times and building a large string, you will waste a lot of memory creating and throwing away objects. Moreover, towards the end of the iteration you will be allocating and discarding very large string objects, which is even more costly.

The following is more efficient, Pythonic code:

builder_list = []
for data in container:
    builder_list.append(str(data))
"".join(builder_list)

# Another way is to use a list comprehension
"".join([str(data) for data in container])

# Or use the map function
"".join(map(str, container))

This code takes advantage of the mutability of a single list object to gather its data together and then allocate the result as a single string. That cuts the total number of objects allocated almost in half.
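
If you want to measure the difference yourself, the standard timeit module works well; here is a minimal sketch (exact numbers depend on your machine and interpreter):

import timeit

setup = "container = range(10000)"

concat_version = """
string_build = ""
for data in container:
    string_build += str(data)
"""

join_version = '"".join(map(str, container))'

# Time each approach over 100 runs; the join version allocates far fewer objects
print(timeit.timeit(concat_version, setup=setup, number=100))
print(timeit.timeit(join_version, setup=setup, number=100))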

Another pitfall related to mutability is the following scenario:

def my_function(param=[]):
    param.append("thing")
    return param

my_function() # ["thing"]
my_function() # ["thing", "thing"]

What you might expect to happen is that, by giving an empty list as the default value of the parameter, a new empty list will be allocated each time the function is called without a list being passed in. What actually happens is that every call that uses the default will share the same list. This is because Python (a) evaluates a function's signature only once, (b) evaluates default arguments as part of the function definition, and (c) therefore allocates one mutable list that is reused across every call of that function.
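
You can see the shared default directly on the function object (a quick check, reusing the my_function defined above):

first = my_function()
second = my_function()
print(first is second)           # True: both calls returned the very same list
print(my_function.__defaults__)  # (['thing', 'thing'],) -- the one shared default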

Do not use a mutable object as the default value of a function parameter. Immutable types are perfectly safe. If you want the intended effect, do this instead:

def my_function2(param=None):
    if param is None:
        param = []
    param.append("thing")
    return param

Conclusion

Mutability matters. Know it. Learn it. Primitive types are probably immutable. Container types are probably mutable.


Implementing a Mini-React-Redux Framework on a Django Page

Introduction

I have built several production web applications using React and Redux and generally have had an excellent experience with those technologies. One of React's greatest assets, IMO, is its ability to integrate into all kinds of stacks and setups but still play nice with the other kids. That was something that impressed me back in Spring 2014 when I first used React. We got React running in the jQuery spaghetti code of a massive, legacy Ruby on Rails application with incredibly little effort and huge productivity benefits to the team. Redux is also incredible for the amount of good it does you with so little code.

There are lots of blogs and tutorials on how to build a full single-page application (SPA) complete with client-side routing, persistent state, and even server-side rendering to boost that time-to-interactivity metric. What if I don't need that? What if I already have a site built using an "old-school" server-side framework like Ruby on Rails or Django, but I have one specific page that should be highly interactive and need something more robust than simple jQuery? React and Redux could still be hugely beneficial, but how do I do it without (a) getting bogged down in boilerplate or (b) over-engineering the solution?

Mini React-Redux Framework to the rescue!

Ready, Set, Go!

Let’s make the skeleton of a super, tiny JavaScript framework that can fit our use case for a Django website.

Here are the steps we’ll follow:

  1. Install our client dependencies
  2. Setup Webpack with Django
  3. Implement the Mini React-Redux Framework

Install our client dependencies

The following are the NPM dependencies I am relying on:

{
  "dependencies": {
    "babel-core": "~6.3.26",
    "babel-loader": "~6.2.0",
    "babel-preset-es2015": "~6.3.13",
    "babel-preset-react": "~6.16.0",
    "react": "~15.4.2",
    "react-dom": "~15.4.2",
    "redux": "~3.6.0",
    "redux-logger": "~2.7.4",
    "redux-thunk": "~2.2.0",
    "webpack": "~1.13.2",
    "webpack-bundle-tracker": "0.0.93"
  }
}

Include these dependencies in your package.json and run npm install.

Setup Webpack with Django

For this step, we are going to use the django-webpack-loader tool to give us the power to load Webpack bundles onto a templated page. The setup is very simple if you have a vanilla Django application; just follow the loader tutorial. If you are using the Django-Mako-Plus add-on, supplement the regular loader tutorial with my own little tutorial. I will give a high-level overview.

We need a webpack.config.js. Here is a pre-v2 webpack config file that we can use:

var path = require('path');
var webpack = require('webpack');
var BundleTracker = require('webpack-bundle-tracker');

module.exports = {
  context: __dirname,
  entry: {
    myapp: './client/myapp/index',
  },
  output: {
      // IMPORTANT: Need to match this up with settings STATICFILES_DIRS and WEBPACK_LOADER
      path: path.resolve('./static/bundles/'),
      filename: "[name]-[hash].js",
      // OPTIONAL: In this setup, it can be helpful to namespace the exported files
      library: 'MyCompanyApp',
      libraryTarget: 'var'
  },

  plugins: [
    // IMPORTANT: django-webpack-loader needs to know where this file is
    new BundleTracker({filename: './webpack-stats.json'})
  ],

  module: {
    loaders: [
      {
        test: /\.js$/,
        exclude: /node_modules/,
        loader: 'babel-loader',
        query: {
          // NOTE: only the presets that appear in the dependency list above
          presets: ['es2015', 'react'],
          cacheDirectory: true
        }
      }
    ]
  }
}

You will need to throw in some settings for the Django webpack loader plugin so it knows where to find certain key files. Here are some simple defaults:

INSTALLED_APPS = (
    # ...
    'webpack_loader'
)

STATICFILES_DIRS = (
    os.path.join(BASE_DIR, 'static'),
)

WEBPACK_LOADER = {
    'DEFAULT': {
        # Lets not cache for dev builds; you can enable for prod builds
        'CACHE': False,
        # NOTE: where, inside the staticfiles, are the output? Must end with slash
        'BUNDLE_DIR_NAME': 'bundles/',
        # NOTE: STATS_FILE is the path to the file that the BundleTracker webpack plugin is writing.
        'STATS_FILE': os.path.join(BASE_DIR, 'webpack-stats.json'),
        'POLL_INTERVAL': 0.1,
        'TIMEOUT': None,
        'IGNORE': ['.+\.hot-update.js', '.+\.map']
    }
}

That should do for now.

Implement the Mini React-Redux Framework

Dan Abramov is a smart guy. He wrote Redux. He encourages devs not to use Redux until you know that you need it; just use props and state. I strongly support that! However, in this post I want to demonstrate the more complicated case of using Redux, including some middlewares, just to show how simple it is. I encourage you to pare this example down to only what you need.

Here is the source I came up with for our mini framework:

import React from 'react'
import ReactDOM from 'react-dom'
import { createStore, applyMiddleware, compose } from 'redux'
import thunk from 'redux-thunk'
import createLogger from 'redux-logger'
import MyComponent from './components/MyComponent'

/**
 * Redux Reducer.
 * @params:
 *  - state: the previous state of the store
 *  - action: an object describing how the state should change
 * @returns:
 *  - state: a new state after applying the appropriate changes
 */
const rootReducer = (state = { clicks: 0 }, action) => {
  // ... change state based on action
  return state
}

/**
 * Redux Store object with three functions you should care about:
 *  - getState(): returns the current state of the store
 *  - dispatch(action): calls the reducer with a given action
 *  - subscribe(): called after a reducer runs
 *
 * The store has two optional middlewares to showcase how you would add them:
 *  - redux-thunk: allows `store.dispatch()` to receive a thunk (function) or an object
 *                 See http://stackoverflow.com/questions/35411423/how-to-dispatch-a-redux-action-with-a-timeout/35415559#35415559
 *  - redux-logger: logs out redux store changes to the console. Only in dev.
 */
const middlewares = process.env.NODE_ENV === 'production'
    ? applyMiddleware(thunk)
    : applyMiddleware(thunk, createLogger())
let store = compose(middlewares)(createStore)(rootReducer)

/**
 * Helper function to render a component to the DOM.
 * Makes the following props available to the component:
 *  - storeState: an object of the latest state of the redux store.
 *  - dispatch: a function that dispatches actions to the store/reducer.
 */
const render = (nodeId, Component) => {
  let node = document.getElementById(nodeId)
  // NOTE: the capitalized name matters; lowercase JSX tags compile to DOM elements
  ReactDOM.render(<Component storeState={store.getState()} dispatch={store.dispatch} />, node)
}

/**
 * Function that bootstraps the app.
 *  - render the component with initial store state.
 *  - re-render the component when the store changes.
 */
export const start = () => {
  render('app', MyComponent)
  store.subscribe(() => render('app', MyComponent))
}

To start the application, (a) load the bundle on the page and (b) call the exported start function when the page loads. Here’s an example using jQuery:

{% load render_bundle from webpack_loader %}
{% render_bundle 'myapp' %}

<script>$(() => MyCompanyApp.start())</script>

Explanation

This little proof-of-concept is interesting to me because of how much usefulness it provides with so little code. With this code, we create a Redux store with some basic middlewares and a reducer that does nothing interesting (yet). Then we render a component to the DOM, giving it the current store state and a function for dispatching actions if necessary, and we set up a store subscription so that the component re-renders whenever the store changes.

Another cool part about this approach is that a lot of the setup code can be pulled out and made reusable. The render(), start(), and store setup would probably be the same for every Mini App we create. Then we could simplify this whole file down to just the reducer, passing the node and component into the start function (not implemented here).

Conclusion

With very little effort and boilerplate, we have a React application using Redux as its storage system. With this in place, you can build quite sophisticated widgets and still have the flexibility to get more complex if you need to do something more involved.


Adding Webpack Bundles to your Django-Mako-Plus (DMP) Site

This post describes how to hook up Webpack to a Django site using the django-webpack-loader tool in the special case where your Django site is running the Django-Mako-Plus (DMP) library.

Why Webpack?

In the last few years, the ecosystem of JavaScript build tools has grown in both size and quality.  One of my favorite build tools is Webpack.  If you have not heard of it, I highly recommend it to you for bundling your JavaScript, CSS, and other static assets.  To get the most out of this post, please go do a little cursory research on the use case of the webpack bundler before continuing on here.

I also appreciate the Django framework for building dynamic web applications in Python. If you would like to use Django with Webpack, it takes a little extra work to get things hooked together in a clean, scalable way. Webpack outputs "bundles" that can be formatted in many ways (CommonJS, UMD, RequireJS, etc.) depending on how they should be consumed and can even output the bundles with the md5 hash in the name to improve the caching of your bundles on the internet.

What is “django-webpack-loader”?

Django, for all its great features, handles static files poorly by modern standards, which is where the django-webpack-loader (hereafter referred to as "the loader") tool comes in. It provides a way to load a webpack bundle by name into a Django template by mapping the webpack bundle's "logical name" (e.g. main) to its filename (e.g. main-be0da5014701b07168fd.js), a filename that changes whenever the contents of the bundle change. To learn how the loader works, read the documentation and tutorial.
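
To make the mapping concrete, here is a toy Python sketch of what the loader does conceptually (this is not the library's actual implementation): it reads the webpack-stats.json file that BundleTracker writes and resolves a logical bundle name to the current hashed filename.

import json

# Toy illustration only: resolve a bundle's logical name to its hashed filename
with open('webpack-stats.json') as stats_file:
    stats = json.load(stats_file)

def bundle_filename(name):
    # Each chunk entry holds the emitted file's name,
    # e.g. 'main-be0da5014701b07168fd.js'
    return stats['chunks'][name][0]['name']

print(bundle_filename('main'))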

DMP with The Loader

The loader integrates with the templating system of Django. If you are using Django-Mako-Plus (DMP), you replaced the default templating engine with Mako, so the prepared render_bundle template tag is not available anymore. Lucky for us, Mako is so powerful that we can import Python functions with ease. All we need to do in a template is import the right function and call it using Mako syntax:

<%! from webpack_loader.templatetags.webpack_loader import render_bundle %>
<html>
  <head> 
    ${ render_bundle('main') }
  ...

Simple! We can even simplify this a bit by adding the import statement to the DEFAULT_TEMPLATE_IMPORTS for our Mako templates, like so:

TEMPLATES = [
  {
    'BACKEND': 'django_mako_plus.MakoTemplates',
    'OPTIONS': {
      # Import these names into every template by default
      # so you don't have to import them explicitly
      'DEFAULT_TEMPLATE_IMPORTS': [
        'from webpack_loader.templatetags.webpack_loader import render_bundle',
      ]
    }
  }
]

BAM!

Conclusion

All done!  You are now ready to start using the django-webpack-loader to include Webpack bundles in your Django-Mako-Plus website!


Deploying to Production with Git

On your last day at any job, it is fun to go change a bunch of things and then leave it all with your colleagues and say "Peace Out!" One thing you could do is rewrite/rework the project's build and deploy process. Here is a way you could do it using Git with a Django project served by Nginx and uWSGI (and yes, I did this all on my last day :).

The process is: (1) automate the build with a Makefile, (2) set up a Git repo on the live server to push to, and (3) use a Git hook to automatically call the Makefile targets.

Build with a Makefile

First things first, let's write a Makefile because they are very helpful. You can replace a Makefile with some other script or set of scripts, but I find Makefiles to be a very good idea on Unix-based systems. You just need something to automate your build process.

Here is an excerpt that is pretty similar to the Makefile I used:

# Makefile for building and deploying
#

UWSGI=/etc/init.d/uwsgi
NGINX=/etc/init.d/nginx

deploy: dependencies clean minified_static_files

restart: $(UWSGI) $(NGINX)
	$(UWSGI) restart
	$(NGINX) restart

stop: $(UWSGI) $(NGINX)
	$(NGINX) stop
	$(UWSGI) stop

dependencies: dependencies.pip
	pip install -r dependencies.pip # Or requirements.txt

resources:
	python resources_build.py # Minifies static files.

minified_static_files: resources
	python manage.py collectstatic # Collect into static_files/

clean:
	@-rm -rf static_files/
	@-find . -name '__pycache__' -exec /bin/rm -rf {} \;
	@echo 'Successfully Cleaned!'

.PHONY: clean resources dependencies restart stop deploy

You can put any useful commands that you run often in your project during development or when deploying to production. Put this in the root of your project directory or anywhere else in your Git project so you will not lose it.

Setup Production Git Repo

Let's use a bare Git repository. Log in to your production server, create a new directory, and initialize it as a bare Git repository.

mkdir prod-repo
cd prod-repo
git init --bare

You will need to add this repository to your Git remotes on your local machine. The command looks something like this:

git remote add production ssh://username@www.yourserver.com:PORT/path/to/prod-repo

FYI: the PORT is whatever port your SSH server runs on.

Post-Receive Git Hook

Now let's set up a post-receive Git hook on the production bare repository that will call your Makefile (or other automation script) once a push has been received.

vim prod-repo/hooks/post-receive

A Git hook file is any executable script, so you can write it in bash, sh, Python, Ruby, etc. Let's keep it simple and use sh.

#!/bin/sh
#
# SOURCE_DEST is whatever directory you have configured
# uWSGI to look for your app in.  This is where Git will put
# the new source files that you push to this repo.

# Variables
SOURCE_DEST=/path/to/source
GIT_DIR=/path/to/prod-repo

# Update the HEAD to latest commit
git --work-tree=$SOURCE_DEST --git-dir=$GIT_DIR checkout -f

cd $SOURCE_DEST

# Run make targets
make deploy
make restart

# Fix permissions for Code
chown -R www-user $SOURCE_DEST
chgrp -R www-user $SOURCE_DEST

Putting it all Together

To update production, just issue a git push command to your production remote:

git push production master

This pushes your changes over Git and runs your post-receive hook script, which calls your Makefile targets. Customize this to fit your needs. You can easily add in targets to run database migrations, compile CoffeeScript, preprocess Sass or Less into CSS, run unit tests, etc. The sky is the limit. It would also be a good idea to add a git tag each time before you push to production. Consider using a client-side git hook to accomplish that 🙂


Prepping Yelp Data for Mining

In April 2014, my Data Mining project team at BYU began work on our semester project, which we chose from kaggle.com. The project was based on data from an old Yelp Business Ratings Prediction Contest, which finished in August 2013. Over several posts, we will take a look at some of the interesting things our team did in this project.

Problem Description

As explained on the Kaggle web page for the Yelp Contest, the goal of the project was to predict the number of stars a user would give a new business, using user data, business data, checkin data, and past review data. The prediction model/algorithm would then become the heart of a recommender system.

Data Description

The dataset is four files holding JSON objects. In this case, each line of a file is a distinct JSON object, which means we will parse each line into JSON as we go. The following is a description of the file yelp_training_set_business.json:

{
  'type': 'business',
  'business_id': (encrypted business id),
  'name': (business name),
  'neighborhoods': [(hood names)],
  'full_address': (localized address),
  'city': (city),
  'state': (state),
  'latitude': latitude,
  'longitude': longitude,
  'stars': (star rating, rounded to half-stars),
  'review_count': review count,
  'categories': [(localized category names)],
  'open': True / False (corresponds to permanently closed, not business hours),
}

Extracting, Transforming, and Loading the Data

I have heard a lot on the web about this concept called ETL (Extract, Transform, Load), but in my data mining and machine learning training I have heard it mentioned a grand total of ONE time. Well, I guess this is our ETL section, where we "Extract" the data from the files, "Transform" it into meaningful representations, and "Load" it into our database.

So here's how we implemented it. I wrote a Python script to parse through the JSON files and load them into a MySQL database. By using a relational database, we could combine, aggregate, and output the data in countless ways to feed into our learning algorithm.

For your convenience, I put my code and a subset of the data on github so you can explore them yourself: https://github.com/tgroshon/yelp-data-mining-loader. I used Peewee, a lightweight ORM compatible with MySQL (amongst others), to create the database and load the data. Here is an excerpt of the Business Model from models.py:

class Business(peewee.Model):
    """Business Object Table."""

    bid = peewee.PrimaryKeyField()
    business_id = peewee.CharField()  # encrypted ID with letters, numbers, and symbols
    name = peewee.CharField()
    full_address = peewee.CharField()
    city = peewee.CharField()
    state = peewee.CharField(max_length=2)  # AZ
    latitude = peewee.CharField()
    longitude = peewee.CharField()
    stars = peewee.DecimalField()  # star rating rounded to half-stars
    review_count = peewee.BigIntegerField()
    is_open = peewee.BooleanField()

    class Meta:
        database = db

Not too strange. This was the most complex Model and covers most of the bases for creating a Model class (which maps to a database table) with Peewee. Now let's take a look at the load script itself: json_to_mysql.py. First, we need to read each JSON file, parse the input into JSON objects, and map the JSON objects to Models for saving. Here is the function that reads the lines of a file, parses them to JSON, and yields the JSON:

def iterate_file(model_name, shortcircuit=True, status_frequency=500):
    i = 0
    jsonfilename = "json/yelp_training_set_%s.json" % model_name.lower()
    with open(jsonfilename) as jfile:
        for line in jfile:
            i += 1
            yield json.loads(line)
            if i % status_frequency == 0:
                print("Status >>> %s: %d" % (jsonfilename, i))
            if shortcircuit and i == 10:
                return  # end the generator; raising StopIteration is deprecated (PEP 479)

The function takes the name of a model, opens the corresponding JSON data file for that model, iterates over each line in that file, and yields the parsed JSON object up to the caller. This allows the function to be used as an iterator, as in this excerpt from the save_businesses() function:

def save_businesses():
    for bdata in iterate_file("business", shortcircuit=False):
        business = Business()
        business.business_id = bdata['business_id']
        business.name = bdata['name']
        business.full_address = bdata['full_address']
        business.city = bdata['city']
        business.state = bdata['state']
        business.latitude = bdata['latitude']
        business.longitude = bdata['longitude']
        business.stars = decimal.Decimal(bdata.get('stars', 0))
        business.review_count = int(bdata['review_count'])
        business.is_open = bool(bdata['open'])  # 'open' is a JSON boolean in the data
        business.save()

Straightforward, no? Each Model has a corresponding function to handle creating a Model, assigning the appropriate data, and saving it to the database. Then at the end we call each function when the script is run:

if __name__ == "__main__":
    reset_database()

    save_businesses()
    save_users()
    save_checkins()
    save_review()

The reset_database() function is important: it creates the tables for your Models. It looks like this:

def reset_database():
    tables = (Business, Review, User, Checkin, Neighborhood, Category,)
    for table in tables:
        # Nuke the Tables
        try:
            table.drop_table()
        except OperationalError:
            pass
        # Create the Tables
        try:
            table.create_table()
        except OperationalError:
            pass
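
With the tables created and the data loaded, the payoff of the relational approach shows up in how easily you can aggregate. Here is a hypothetical sketch (not part of the loader script) that averages star ratings per city, assuming the Business model from the models.py excerpt above:

import peewee
from models import Business  # hypothetical import of the models.py shown above

# Hypothetical aggregation: average star rating per city, best first
query = (Business
         .select(Business.city, peewee.fn.AVG(Business.stars).alias('avg_stars'))
         .group_by(Business.city)
         .order_by(peewee.fn.AVG(Business.stars).desc()))

for row in query:
    print(row.city, row.avg_stars)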

Conclusion

And that is the gist of ETL (I guess?) for this system. The funny part is that I think this data came from a SQL database and was converted to JSON for distribution in this contest. Now that we have undone all their work, we can start work of our own in Part 2!


Python Generator Expressions

In Python, you can write a list comprehension (a compact list builder), which is really efficient:

[ val for val in biglist ]

You can do the same thing with dictionaries:

{ x.val: x.val2 for x in biglist }

The Feature that Changed My Life

You can take that inner statement “val for val in biglist” and package it up into a reusable generator expression!

mygenerator = ( val for val in someiterable )

Then, you can pass that generator around to other functions. Each call will run the generator one step, return a value, and then wait until it is told to iterate again. The crazy part is, I knew that this is how the yield statement works in Python to create generator functions (functions using yield return a generator). I had never thought of this flip side of generator functions, packaged up all nice and neat.
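
For example, a generator expression can be handed to any function that expects an iterable, and it only produces values on demand (a small sketch):

numbers = [1, 2, 3, 4, 5]
squares = (n * n for n in numbers)  # nothing is computed yet

print(next(squares))  # 1 -- runs the generator one step
print(next(squares))  # 4 -- one more step

print(sum(squares))   # 50 -- consumes the remaining values: 9 + 16 + 25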


Optimizing and Refactoring a Python Function

Revisiting old problems and solutions can be very instructive. Because Python is so quick to write and easy to read, you can refactor with minimal effort. You should not be afraid to throw away or rewrite entire functions or scripts. The following is a little story of a time I refactored, and then refactored again, a function I knew could be better.
