
Tuning PostgreSQL with pgbench

When it comes to performance tuning an environment, often the first place to start is
with the database. The reason for this is that most applications rely very heavily on a
database of some sort.

Unfortunately, databases can be one of the most complex areas to tune, because tuning a database service properly often involves more than the database service itself; it frequently requires hardware, OS, or even application modifications.

On top of requiring a diverse skill set, one of the biggest challenges in tuning a database is creating enough simulated traffic to stress the database service. That is why today's article explores pgbench, a benchmarking tool used to measure the performance of a PostgreSQL instance.

PostgreSQL is a highly popular open-source relational database. One of the nice things
about PostgreSQL is that there are quite a few tools that have been created to assist with
the management of PostgreSQL; pgbench is one such tool.

While exploring pgbench, we will also use it to measure the performance gains and losses from a common PostgreSQL tunable.

Setting Up a PostgreSQL Instance


Before we can use pgbench to tune a database service, we must first stand up that database service. The steps below outline how to set up a basic PostgreSQL instance on an Ubuntu 16.04 server.
Installing with apt-get
Installing PostgreSQL on an Ubuntu system is fairly easy. The bulk of the work is
accomplished by simply running the apt-get command.

# apt-get install postgresql postgresql-contrib

The above apt-get command installs both the postgresql and postgresql-contrib packages. The postgresql package installs the base PostgreSQL service. The postgresql-contrib package installs additional contributed modules that are not part of the core server but often provide quite a bit of useful functionality.

With the packages installed, we now have a running PostgreSQL instance. We can verify
this by using the systemctl command to check the status of PostgreSQL.

# systemctl status postgresql


● postgresql.service - PostgreSQL RDBMS
   Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
   Active: active (exited) since Mon 2017-01-02 21:14:36 UTC; 7h ago
  Process: 16075 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 16075 (code=exited, status=0/SUCCESS)

Jan 02 21:14:36 ubuntu-xenial systemd[1]: Starting PostgreSQL RDBMS...
Jan 02 21:14:36 ubuntu-xenial systemd[1]: Started PostgreSQL RDBMS.

The above indicates our instance started without any issues. We can now move on to our
next step, creating a database.

Creating a database
When we installed the postgresql package, this package included the creation of a user
named postgres. This user is used as the owner of the running instance. It also serves as
the admin user for the PostgreSQL service.

In order to create a database, we will need to log in as this user, which is accomplished by executing the su command.

# su - postgres

Once switched to the postgres user, we can log in to the running instance by using the
PostgreSQL client, psql.

$ psql
psql (9.5.5)
Type "help" for help.

postgres=#

After executing the psql command, we are dropped into PostgreSQL's command-line environment. From here, we can issue SQL statements or use special client commands to perform actions.

As an example, we can list the current databases by issuing the \list command.

postgres=# \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

After issuing the \list command, three databases were returned. These are default
databases that were set up during the initial installation process.

For our testing today, we will be creating a new database. Let’s go ahead and create that
database, naming it example. We can do so by issuing the following SQL statement:

CREATE DATABASE example;

Once executed, we can validate that the database has been created by issuing
the \list command again.

postgres=# \list
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 example   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)

At this point, we have an empty database named example. To execute pgbench commands, we will need to return to our bash shell, which we can do by issuing the \q (quit) command.

postgres=# \q

Once logged out of the PostgreSQL command line environment, we can get started
using pgbench to benchmark our database instance’s performance.

Using pgbench to Measure Performance


One of the most difficult things about measuring database performance is generating enough load. A popular option is simply to bombard a test instance of the target application with test transactions. While this usefully measures database performance in relation to the application, it can be problematic: application bottlenecks can limit how hard the database itself is actually tested.

For situations such as this, tools like pgbench come in handy. With pgbench, you can
either use a sample database provided with pgbench or have pgbench run custom
queries against an application database.

In this article, we will be using the example database that comes with pgbench.

Setting up the pgbench sample database

Setting up the sample database is quite easy and fairly quick. We can start this process by executing pgbench with the -i (initialize) option.

$ pgbench -i -s 50 example
creating tables...
5000000 of 5000000 tuples (100%) done (elapsed 5.33 s, remaining 0.00 s)
vacuum...
set primary keys...
done.

In the command above, we executed pgbench with the -i option and the -s option, followed by the database name (example).

The -i (initialize) option tells pgbench to initialize the specified database, which means pgbench will create the following tables within the example database.

table              # of rows
---------------------------------
pgbench_branches   1
pgbench_tellers    10
pgbench_accounts   100000
pgbench_history    0

By default, pgbench will create the tables above with the number of rows shown above.
This creates a simple 16MB database.

Since we will be using pgbench to measure changes in performance, a small 16MB database will not be enough to stress our instance. This is where the -s (scaling) option comes into play.

The -s option is used to multiply the number of rows entered into each table. In the command above, we entered a “scaling” option of 50. This told pgbench to create a database 50 times the default size.

What this means is our pgbench_accounts table now has 5,000,000 records, and our database size is now 800MB (50 x 16MB).
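The sizing math behind the -s option is simple enough to sketch in shell; the 100,000-row and 16MB-per-scale-unit figures are the pgbench defaults described above:

```shell
# Database size implied by pgbench's -s (scale) option: each scale unit
# adds 100,000 rows to pgbench_accounts and roughly 16MB of data.
SCALE=50
ACCOUNTS=$((SCALE * 100000))
SIZE_MB=$((SCALE * 16))
echo "pgbench_accounts rows: $ACCOUNTS"
echo "approximate size: ${SIZE_MB}MB"
```

For a larger server, you would bump SCALE until the dataset comfortably exceeds the memory you want to stress.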

To verify that our tables have been created successfully, let’s go ahead and run
the psql client again.

$ psql -d example
psql (9.5.5)
Type "help" for help.

example=#

In the command above, we used the -d (database) flag to tell psql to not only connect to
the PostgreSQL service but to also switch to the example database.

Since we are currently using the example database, we can issue the \dt command to list the tables available within that database.

example=# \dt
List of relations
Schema | Name | Type | Owner
--------+------------------+-------+----------
public | pgbench_accounts | table | postgres
public | pgbench_branches | table | postgres
public | pgbench_history | table | postgres
public | pgbench_tellers | table | postgres
(4 rows)

From the table above, we can see that pgbench created the four expected tables. This
means our database is now populated and ready to be used to measure our database
instance’s performance.

Establishing a baseline
When doing any sort of performance tuning, it is best to first establish a performance baseline. This baseline serves as the measuring stick for determining whether the changes you make increase or decrease performance.

Let’s go ahead and call pgbench to establish the baseline for our “out of the box”
PostgreSQL instance.

$ pgbench -c 10 -j 2 -t 10000 example

When calling pgbench, we added quite a few options to the command. The first is -c (clients), which is used to define the number of clients to connect with. For this testing, I used 10 to tell pgbench to execute with 10 clients.

What this means is that when pgbench is executing tests, it opens 10 different sessions.

The next option is the -j (threads) flag. This flag is used to define the number of worker
processes for pgbench. In the above command, I specified the value of 2. This will
tell pgbench to start two worker processes during the benchmarking.

The third option used is -t (transactions), which specifies the number of transactions to execute. In the command above, I provided the value of 10,000. However, this doesn't mean that only 10,000 transactions will be executed against our database service; it means that each client session will execute 10,000 transactions.

To summarize, the baseline test run was two pgbench worker processes simulating 10,000 transactions from each of 10 clients, for a total of 100,000 transactions.
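The arithmetic behind that summary can be sketched in a few lines of shell, using the flag values from the command above:

```shell
# Workload size implied by the pgbench flags: each of the -c clients
# runs -t transactions, so the total is their product.
CLIENTS=10              # -c
THREADS=2               # -j (worker processes; does not change workload size)
TXNS_PER_CLIENT=10000   # -t
TOTAL=$((CLIENTS * TXNS_PER_CLIENT))
echo "total transactions: $TOTAL"
```

This matches the "100000/100000" figure pgbench reports as transactions actually processed.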

With that understanding, let’s take a look at the results of this first test.

$ pgbench -c 10 -j 2 -t 10000 example


starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 4.176 ms
tps = 2394.718707 (including connections establishing)
tps = 2394.874350 (excluding connections establishing)

The output of pgbench has quite a bit of information. Most of it describes the test
scenarios being executed. The part that we are most interested in is the following:

tps = 2394.718707 (including connections establishing)
tps = 2394.874350 (excluding connections establishing)

From these results, it seems our baseline is 2,394 database transactions per second. Let’s
go ahead and see if we can increase this number by modifying a simple configuration
parameter within PostgreSQL.

Adding More Cache


One of the go-to parameters for anyone tuning PostgreSQL is
the shared_buffers parameter. This parameter is used to specify the amount of
memory the PostgreSQL service can utilize for caching. This caching mechanism is used
to store the contents of tables and indexes in memory.

To show how we can use pgbench for performance tuning, we will be adjusting this value to test performance gains and losses.

By default, the shared_buffers value is set to 128MB, a fairly low value considering the amount of memory available on most servers today. We can see this setting for ourselves by looking at the contents of the /etc/postgresql/9.5/main/postgresql.conf file. Within this file, we should see the following.

# - Memory -

shared_buffers = 128MB			# min 128kB
					# (change requires restart)

Let’s go ahead and switch this value to 256MB, effectively doubling our available cache
space.

# - Memory -

shared_buffers = 256MB			# min 128kB
					# (change requires restart)

Once completed, we will need to restart the PostgreSQL service. We can do so by executing the systemctl command with the restart option.

# systemctl restart postgresql

Once the service is fully up and running, we can once again use pgbench to measure our
performance.

$ pgbench -c 10 -j 2 -t 10000 example


starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 3.921 ms
tps = 2550.313477 (including connections establishing)
tps = 2550.480149 (excluding connections establishing)

In our earlier baseline test, we were able to hit a rate of 2,394 transactions per second. In
this last run, after updating the shared_buffers parameter, we were able to
achieve 2,550 transactions per second, an increase of 156. While this is not a bad start,
we can still go further.

While the shared_buffers parameter might start off at 128MB, the recommended value for this parameter is one-fourth of the system memory. Our test system has 2GB of system memory, a value we can verify with the free command.

$ free -m
              total        used        free      shared  buff/cache   available
Mem:           2000          54         109         548        1836        1223
Swap:             0           0           0

In the output above, the total column shows a value of 2000MB on the Mem row, which is the total physical memory available to the system. The available column shows 1223MB, meaning we have up to 1.2GB of free memory to work with for our tuning purposes.
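The one-fourth guideline can be computed directly from the kernel's view of memory. A quick sketch, Linux-specific since it reads /proc/meminfo:

```shell
# Suggest a shared_buffers value of one-quarter of physical RAM,
# reading the MemTotal figure (reported in kB) from /proc/meminfo.
TOTAL_KB=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "suggested shared_buffers: $((TOTAL_KB / 4 / 1024))MB"
```

On the 2GB test system used in this article, this works out to roughly the 512MB value we are about to configure.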

If we change our shared_buffers parameter to the recommended value of one-fourth of system memory, we would need to change it to 512MB. Let's go ahead and make this change and rerun our pgbench test.

# - Memory -

shared_buffers = 512MB			# min 128kB
					# (change requires restart)

With the shared_buffers value updated in /etc/postgresql/9.5/main/postgresql.conf, we can go ahead and restart the PostgreSQL service.

# systemctl restart postgresql

After restarting, let’s rerun our test.

$ pgbench -c 10 -j 2 -t 10000 example


starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 3.756 ms
tps = 2662.750932 (including connections establishing)
tps = 2663.066421 (excluding connections establishing)

This time, our system was able to reach 2,662 transactions per second, an additional
increase of 112 transactions per second. Since our transactions per second increased by
at least 100 both times, let’s go a step further and see what happens when changing this
value to 1GB.

# - Memory -

shared_buffers = 1024MB			# min 128kB
					# (change requires restart)

After updating the value, we will need to once again restart the PostgreSQL service.

# systemctl restart postgresql

With the service restarted, we can now rerun our test.

$ pgbench -c 10 -j 2 -t 10000 example


starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 2
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
latency average: 3.744 ms
tps = 2670.791865 (including connections establishing)
tps = 2671.079076 (excluding connections establishing)

This time, our transactions per second went from 2,662 to 2,671, an increase of only 9 transactions per second. At this point, we are hitting diminishing returns.

While it is feasible for many environments to increase the shared_buffers value beyond the one-fourth guideline, doing so does not return the same results for this test database.

Summary
Based on the results of our testing, we can see that changing the value of shared_buffers from 128MB to 512MB on our test system resulted in a 268 transactions-per-second increase in performance. Against our baseline, that is roughly an 11 percent increase.
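That improvement figure is straightforward to verify from the two tps numbers; integer shell arithmetic is close enough here:

```shell
# Percent improvement of the tuned run (512MB shared_buffers) over the
# baseline, using the rounded tps figures reported earlier.
BASE=2394
TUNED=2662
GAIN=$((TUNED - BASE))
echo "gain: ${GAIN} tps (~$((GAIN * 100 / BASE))%)"
```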

We did all of this on a base PostgreSQL instance using pgbench's sample database, meaning we did not have to load our application to get a baseline metric on how well PostgreSQL performs.

While we were able to increase our throughput by modifying the shared_buffers parameter, PostgreSQL has many more tuning parameters available. For anyone looking to tune a PostgreSQL instance, I would highly recommend checking out PostgreSQL's wiki.
