PostgreSQL is a well-known database engine that allows you to write performant SQL queries.
In this tutorial, you’ll learn about the internal building blocks of PostgreSQL – also known as system catalogs – to see how the PostgreSQL backend is structured.
This tutorial aims to give you insights into how PostgreSQL system catalogs can help you manage your schemas, databases, tables, and even table columns.
What is a PostgreSQL system catalog?
When you set up PostgreSQL in your machine, you spin up a PostgreSQL cluster , a server that hosts databases.
This cluster has a set of catalogs. Each catalog is a collection of schemas, and each schema has a set of tables.
System catalogs are where a database management system stores schema metadata, such as information about databases, tables, and columns.
These catalogs are regular tables where you can manipulate data as you do with other tables. You can insert new rows, update values, add columns – or potentially mess up your system. So you shouldn’t change the system catalogs unless you really know what you’re doing.
A few system catalogs are shared across all databases in the PostgreSQL cluster. However, most system catalogs are database-specific.
Each system catalog name starts with “pg_”. PostgreSQL has some funny naming conventions for the column names of system catalogs. A high percentage of columns – if not all – have a prefix of a combination of three letters of the name after “pg_”. For example, the
pg_database has columns starting with “dat”, like *datname,* which is the database name column.
A couple more examples are:
pg_proc, which has columns that start with “pro”.
pg_namespace, which has columns that begin with “nsp”.
Another interesting example is
pg_class , which has columns that start with “rel”. This is because the
pg_class catalog stores all information about tables and other objects that have columns. These kinds of objects are referred to as “relations”.
A typical example is a
pg_constraint catalog, which stores primary, unique, and foreign keys and table constraints. Each column starts with “con” except for the
oid column, which is the row identifier.
Getting started: Prerequisities
Following this tutorial, you’ll need to write SQL commands in your desired SQL client. You can also use CoderPad’s sandbox, which uses PostgreSQL 12.
One pro tip on CoderPad sandbox is to head over to the right terminal and run this to prettify the output of the SQL command:
This is useful, especially if the record length is long. For example, this is before running
And now, to prettify that mess, run
\x on the terminal for expanded formatting for each record:
Why getting metadata is useful
Getting PostgreSQL metadata can help you have information about databases, tables, schema, indexes, columns, functions, and even size information.
To name a few use cases, you can do the following:
- Know who the database owner is
- Extract the number of transactions committed to the database
- List all tables in the PostgreSQL cluster
- Investigate the number of full scans, index scans, etc.
- Retrieve all schemas in your cluster
- Be informed of the tables that have no indexes so you can fix performance issues
- Investigate the data types of columns that have problems in your tables
- List functions based on their inputs
- Know the size of tables, databases, indexes, etc.
How to get database metadata
In this section, I’ll talk about how to get database metadata, get the name of the database owner, and extract statistical information about the database, like the number of transactions.
pg_database catalog stores information about Postgres
databases available in your cluster. Whenever you run the SQL command
CREATE DATABASE , a new row is inserted into the
catalog. This row has metadata related to that particular database.
pg_database catalog is system-wide, which means it’s shared
across all databases in the cluster.
A query on
pg_database would look something like this:
SELECT datdba FROM pg_database WHERE datname = 'coderpad';
This query would get you information about the coderpad database, like
the database owner. In this case, it returns the string
10 , which
OID in the
To get the coderpad owner who initially created the database, you’d use this query:
SELECT a.rolname AS "Owner" FROM pg_database d JOIN pg_authid a ON a.oid = d.datdba WHERE datname = 'coderpad';
pg_stat_database catalog stores statistical information about
the database. This catalog is database-specific.
Here’s an example query using
SELECT xact_commit, xact_rollback FROM pg_stat_database WHERE datname = 'coderpad';
This query helps us analyze database transaction metadata. The
xact_commit attribute shows how many transactions are committed in
this database, while the
xact_rollback attribute indicates how
many transactions have been rolled back (errored) in this database.
How to retrieve table metadata
In this section, I’ll talk about how to get table metadata, list all tables in the cluster, and retrieve statistical information and activity about tables, like the number of different scans.
As mentioned earlier, the
pg_class catalog stores relations, such
as tables, indexes, and views.
To list all ordinary tables in the cluster, run the following query:
SELECT * FROM pg_class WHERE relkind = 'r';
r string value in the
relkind attribute points to the
pg_stat_user_tables catalog stores stats and activity about
tables. For example, take this query:
SELECT seq_scan, idx_scan FROM pg_stat_user_tables WHERE relname = 'projects';
It gets the number of sequential scans and the number of index scans initiated on the projects table. Sequential scans are full scans where each row in the table is scanned. In most cases, sequential scans are slow, especially when you retrieve a minimal number from a big table (aka, the selectivity ratio is low). In contrast, index scans do not scan the whole tables and instead scan the tree data structure created when there is an index to that table.
This catalog is a database-specific query, so the project’s relation resides on the coderpad database.
How to get schema metadata
In this section, I’ll talk about how to get schema metadata and get all schema names in your PostgreSQL cluster.
pg_namespace catalog stores namespaces. Per the PostgreSQL
, a namespace is the structure underlying SQL schemas. That means
whenever you run the
CREATE SCHEMA command, a new record is
pg_namespace . Each namespace can have a separate
collection of relations, types, etc., without name conflicts.
To list all schema names, run the following query:
SELECT nspname FROM pg_namespace;
How to get indexes metadata
In this section, I’ll talk about how to get indexes metadata and tables that don’t use indexes across the PostgreSQL cluster or on a specific database.
Part of the information about indexes is stored in the
catalog. The rest is mostly in
To get tables across the system that have no indexes, I have used the following query (which appeared in a Postgres conference workshop ):
SELECT c.oid::regclass c FROM pg_class c WHERE relkind = 'r' AND NOT EXISTS (SELECT 1 FROM pg_index i WHERE i.indrelid = c.oid);
Note : the
reg* data type is used to format names. In the
previous query, I used
regclass to format the relation’s name
according to the class.
If you want to list all tables that don’t have indexes except the ones that exist on information schema (This query is based on one introduced in this Rober Haas workshop ):
SELECT c.oid::regclass c FROM pg_class c WHERE relkind = 'r' AND NOT EXISTS (SELECT 1 FROM pg_index i WHERE i.indrelid = c.oid) AND c.relnamespace NOT IN (13127);
The database-specific query equivalent to the previous system-wide one is:
SELECT relname FROM pg_stat_user_tables WHERE idx_scan IS NULL;
How to get columns metadata
In this section, I’ll talk about how to get column metadata and know their data types across the PostgreSQL cluster.
pg_attribute catalog stores information about table columns.
Every column in any table has a
pg_attribute row in the database.
To list the first 50 attribute names with their corresponding data type, run the following query:
SELECT attname, atttypid::regtype FROM pg_attribute LIMIT 50;
regtype data type here is used to format the
attribute type id into a data type like
smallint , and more.
How to get functions metadata
In this section, I’ll talk about how to get function metadata and get specific kinds of functions based on the input across the cluster.
To get all PostgreSQL functions that take text, run the following query (introduced in a PGCon workshop ):
SELECT oid::regprocedure FROM pg_proc WHERE 'text'::regtype = ANY(proargtypes);
regprocedure data type gets you each function’s name
and argument type, such as
starts_with(text, text) .
How to get size information
In this section, I’ll discuss how to get size information about tables, databases, indexes, etc.
Table size using pg_table_size()
pg_table_size is a database object management function that
determines the disk space used by the specified table.
Take a look at the following example (mentioned in the PGCon workshop ):
SELECT name, pg_size_pretty(size) FROM (SELECT oid::regclass AS name, pg_table_size(oid) AS size FROM pg_class WHERE relkind = 'r') x ORDER BY size DESC;
This query orders the cluster relations sizes in descending order. The
pg_size_pretty() function makes the size numbers into a
human-readable format like in KB, MB, etc., as the default output size
is in bytes.
Database size using pg_database_size()
pg_database_size is a database function that
computes disk space used by the specified database. So the following
query (mentioned in the Robert Haas
) orders each database in descending order according to the
SELECT datname, pg_size_pretty(size) FROM (SELECT datname, pg_database_size(oid) AS size FROM pg_database) x ORDER BY size DESC;
Indexes size using pg_indexes_size()
pg_indexes_size() is another function that computes disk space
used by the specified index.
The following query gets you records of index names ordered by size:
SELECT i.indexrelname, x.indexrelid, pg_size_pretty(size) FROM (SELECT indexrelid, pg_indexes_size(indrelid) AS size FROM pg_index) x JOIN pg_stat_user_indexes i ON i.indexrelid = x.indexrelid ORDER BY size DESC;
Why knowing about system catalogs is useful
Getting information about your PostgreSQL database is not that hard if you know how to use system catalogs.
In addition to what we talked about in this article, here are some other things you can do with your new knowledge of system catalogs:
- You can see how PostgreSQL is configured, so you check that
pg_ tablethat contains almost all you need.
- Have a slow query? Tracking how many sequential scans or indexes are used might solve this performance problem.
- If you’re worried about the disk space of your database, you can now more easily check how big your databases are getting to see which ones may cause issues.
The system catalog will know about every SQL query you write. Practicing more with this wonderful feature will help you and your team to investigate performance, disk space, and memory usage in your cluster.
This post was originally published on CoderPad, check it out.