How to Get Metadata from PostgreSQL System Catalogs | Ezz

PostgreSQL is a well-known database engine that allows you to write performant SQL queries.

In this tutorial, you’ll learn about the internal building blocks of PostgreSQL – also known as system catalogs – to see how the PostgreSQL backend is structured.

This tutorial aims to give you insights into how PostgreSQL system catalogs can help you manage your schemas, databases, tables, and even table columns.

What is a PostgreSQL system catalog?

When you set up PostgreSQL in your machine, you spin up a PostgreSQL cluster , a server that hosts databases.

This cluster has a set of catalogs. Each catalog is a collection of schemas, and each schema has a set of tables.

System catalogs are where a database management system stores schema metadata, such as information about databases, tables, and columns.

These catalogs are regular tables where you can manipulate data as you do with other tables. You can insert new rows, update values, add columns – or potentially mess up your system. So you shouldn’t change the system catalogs unless you really know what you’re doing.

A few system catalogs are shared across all databases in the PostgreSQL cluster. However, most system catalogs are database-specific.

Each system catalog name starts with “pg_”. PostgreSQL has some funny naming conventions for the column names of system catalogs. A high percentage of columns – if not all – have a prefix of a combination of three letters of the name after “pg_”. For example, the pg_database has columns starting with “dat”, like datname, which is the database name column.

A couple more examples are:

pg_proc , which has columns that start with “pro”.
pg_namespace , which has columns that begin with “nsp”.

Another interesting example is pg_class , which has columns that start with “rel”. This is because the pg_class catalog stores all information about tables and other objects that have columns. These kinds of objects are referred to as “relations”.

A typical example is a pg_constraint catalog, which stores primary, unique, and foreign keys and table constraints. Each column starts with “con” except for the oid column, which is the row identifier.

Getting started: Prerequisities

Following this tutorial, you’ll need to write SQL commands in your desired SQL client. You can also use CoderPad’s sandbox, which uses PostgreSQL 12.

One pro tip on CoderPad sandbox is to head over to the right terminal and run this to prettify the output of the SQL command:

coderpad=# \x

This is useful, especially if the record length is long. For example, this is before running \x :

And now, to prettify that mess, run \x on the terminal for expanded formatting for each record:

Why getting metadata is useful

Getting PostgreSQL metadata can help you have information about databases, tables, schema, indexes, columns, functions, and even size information.

To name a few use cases, you can do the following:

Know who the database owner is
Extract the number of transactions committed to the database
List all tables in the PostgreSQL cluster
Investigate the number of full scans, index scans, etc.
Retrieve all schemas in your cluster
Be informed of the tables that have no indexes so you can fix performance issues
Investigate the data types of columns that have problems in your tables
List functions based on their inputs
Know the size of tables, databases, indexes, etc.

How to get database metadata

In this section, I’ll talk about how to get database metadata, get the name of the database owner, and extract statistical information about the database, like the number of transactions.

Using pg_database

The pg_database catalog stores information about Postgres databases available in your cluster. Whenever you run the SQL command CREATE DATABASE , a new row is inserted into the pg_database catalog. This row has metadata related to that particular database.

This pg_database catalog is system-wide, which means it’s shared across all databases in the cluster.

A query on pg_database would look something like this:

SELECT datdba
FROM pg_database
WHERE datname = 'coderpad';

This query would get you information about the coderpad database, like the database owner. In this case, it returns the string 10 , which references the OID in the pg_authid catalog.

To get the coderpad owner who initially created the database, you’d use this query:

SELECT a.rolname AS "Owner"
FROM pg_database d
JOIN pg_authid a ON a.oid = d.datdba
WHERE datname = 'coderpad';

Using pg_stat_database

The pg_stat_database catalog stores statistical information about the database. This catalog is database-specific.

Here’s an example query using pg_stat_database :

SELECT xact_commit, xact_rollback
FROM pg_stat_database
WHERE datname = 'coderpad';

This query helps us analyze database transaction metadata. The xact_commit attribute shows how many transactions are committed in this database, while the xact_rollback attribute indicates how many transactions have been rolled back (errored) in this database.

How to retrieve table metadata

In this section, I’ll talk about how to get table metadata, list all tables in the cluster, and retrieve statistical information and activity about tables, like the number of different scans.

Using pg_class

As mentioned earlier, the pg_class catalog stores relations, such as tables, indexes, and views.

To list all ordinary tables in the cluster, run the following query:

SELECT *
FROM pg_class
WHERE relkind = 'r';

The r string value in the relkind attribute points to the ordinary table.

Using pg_stat_user_tables

The pg_stat_user_tables catalog stores stats and activity about tables. For example, take this query:

SELECT seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE relname = 'projects';

It gets the number of sequential scans and the number of index scans initiated on the projects table. Sequential scans are full scans where each row in the table is scanned. In most cases, sequential scans are slow, especially when you retrieve a minimal number from a big table (aka, the selectivity ratio is low). In contrast, index scans do not scan the whole tables and instead scan the tree data structure created when there is an index to that table.

This catalog is a database-specific query, so the project’s relation resides on the coderpad database.

How to get schema metadata

In this section, I’ll talk about how to get schema metadata and get all schema names in your PostgreSQL cluster.

Using pg_namespace

The pg_namespace catalog stores namespaces. Per the PostgreSQL documentation , a namespace is the structure underlying SQL schemas. That means whenever you run the CREATE SCHEMA command, a new record is created under pg_namespace . Each namespace can have a separate collection of relations, types, etc., without name conflicts.

To list all schema names, run the following query:

SELECT nspname
FROM pg_namespace;

How to get indexes metadata

In this section, I’ll talk about how to get indexes metadata and tables that don’t use indexes across the PostgreSQL cluster or on a specific database.

Using pg_index

Part of the information about indexes is stored in the pg_index catalog. The rest is mostly in pg_class .

To get tables across the system that have no indexes, I have used the following query (which appeared in a Postgres conference workshop ):

SELECT c.oid::regclass c
FROM pg_class c
WHERE relkind = 'r'
AND NOT EXISTS (SELECT 1 FROM pg_index i WHERE i.indrelid = c.oid);

Note : the reg* data type is used to format names. In the previous query, I used regclass to format the relation’s name according to the class.

If you want to list all tables that don’t have indexes except the ones that exist on information schema (This query is based on one introduced in this Rober Haas workshop ):

SELECT c.oid::regclass c
FROM pg_class c
WHERE relkind = 'r'
AND NOT EXISTS (SELECT 1 FROM pg_index i WHERE i.indrelid = c.oid)
AND c.relnamespace NOT IN (13127);

Using pg_stat_user_tables.idx_scan

The database-specific query equivalent to the previous system-wide one is:

SELECT relname
FROM pg_stat_user_tables
WHERE idx_scan IS NULL;

How to get columns metadata

In this section, I’ll talk about how to get column metadata and know their data types across the PostgreSQL cluster.

Using pg_attribute

The pg_attribute catalog stores information about table columns. Every column in any table has a pg_attribute row in the database.

To list the first 50 attribute names with their corresponding data type, run the following query:

SELECT attname, atttypid::regtype
FROM pg_attribute
LIMIT 50;

Note: The regtype data type here is used to format the attribute type id into a data type like name , boolean , smallint , and more.

How to get functions metadata

In this section, I’ll talk about how to get function metadata and get specific kinds of functions based on the input across the cluster.

Using pg_proc

To get all PostgreSQL functions that take text, run the following query (introduced in a PGCon workshop ):

SELECT oid::regprocedure
FROM pg_proc
WHERE 'text'::regtype = ANY(proargtypes);

Note: The regprocedure data type gets you each function’s name and argument type, such as starts_with(text, text) .

How to get size information

In this section, I’ll discuss how to get size information about tables, databases, indexes, etc.

Table size using pg_table_size()

pg_table_size is a database object management function that determines the disk space used by the specified table.

Take a look at the following example (mentioned in the PGCon workshop ):

SELECT name, pg_size_pretty(size)
FROM (SELECT oid::regclass AS name, pg_table_size(oid) AS size
      FROM pg_class WHERE relkind = 'r') x
ORDER BY size DESC;

This query orders the cluster relations sizes in descending order. The pg_size_pretty() function makes the size numbers into a human-readable format like in KB, MB, etc., as the default output size is in bytes.

Database size using pg_database_size()

Similarly, the pg_database_size is a database function that computes disk space used by the specified database. So the following query (mentioned in the Robert Haas workshop ) orders each database in descending order according to the corresponding size:

SELECT datname, pg_size_pretty(size)
FROM (SELECT datname, pg_database_size(oid) AS size
      FROM pg_database) x
ORDER BY size DESC;

Indexes size using pg_indexes_size()

The pg_indexes_size() is another function that computes disk space used by the specified index.

The following query gets you records of index names ordered by size:

SELECT i.indexrelname, x.indexrelid, pg_size_pretty(size)
FROM (SELECT indexrelid, pg_indexes_size(indrelid) AS size
      FROM pg_index) x
JOIN pg_stat_user_indexes i ON i.indexrelid = x.indexrelid
ORDER BY size DESC;

Why knowing about system catalogs is useful

Getting information about your PostgreSQL database is not that hard if you know how to use system catalogs.

In addition to what we talked about in this article, here are some other things you can do with your new knowledge of system catalogs:

You can see how PostgreSQL is configured, so you check that pg_ table that contains almost all you need.
Have a slow query? Tracking how many sequential scans or indexes are used might solve this performance problem.
If you’re worried about the disk space of your database, you can now more easily check how big your databases are getting to see which ones may cause issues.

The system catalog will know about every SQL query you write. Practicing more with this wonderful feature will help you and your team to investigate performance, disk space, and memory usage in your cluster.

This post was originally published on CoderPad, check it out.