PostgreSQL Character Data Types

PostgreSQL is a powerful, open-source, relational Database Management System (did that have enough adjectives for you? :). Until spring 2013, I had only been on projects using MySQL (they were all Java and PHP projects FYI). That spring, I started with a two projects that used PostgreSQL (one ruby and one python FYI). Ever since the switch, I can emphatically say that I love PostgreSQL! It does so many things right.

One interesting thing I just found out today at work is how PostgreSQL handles Character data types: char, varchar, and text. In PostgreSQL, they are basically the same data type! To be more clear, they actually use the exact same underlying C struct named varlena. Here she is:

 /* ----------------
 * Variable-length datatypes all share the 'struct varlena' header.
 * [omited notes]
 */
struct varlena
{
 char vl_len_[4]; /* Do not touch this field directly! */
 char vl_dat[1];
};

#define VARHDRSZ ((int32) sizeof(int32))

/*
 * These widely-used datatypes are just a varlena header and the data bytes.
 * There is no terminating null or anything like that --- the data length is
 * always VARSIZE(ptr) - VARHDRSZ.
 */
typedef struct varlena bytea;
typedef struct varlena text;
typedef struct varlena BpChar; /* blank-padded char, ie SQL char(n) */
typedef struct varlena VarChar; /* var-length char, ie SQL varchar(n) */

The main differences between the three is the validation: char gets padded to fill its limit, varchar is not padded but has a limit, text has no limit and no padding. They basically have the same performance as well. Interestingly, the text data type often ends up being the most efficient because it does not need to check sizes when putting or pulling data and if you switch all your char fields to text fields you will probably use storage.

Don’t believe me? Check the documentation!

There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

So basically, we switched every field in our database that was holding a string to text fields. If you want to see some performance comparisons between the three types, check out this article. Interesting stuff!

Advertisements
PostgreSQL Character Data Types