PostgreSQL Experiments

Last modified 13 May 2021 11:38 +02:00

Large object storage (TOAST)

When storing BLOBs to m_user.fullobject these are stored into separate TOAST table. Following function was used to insert random_bytea(100, 20000) (byte array between 100B and 20kB):

-- based on https://dba.stackexchange.com/a/22571
CREATE OR REPLACE FUNCTION random_bytea(min_len integer, max_len integer)
    RETURNS bytea
    LANGUAGE sql
AS $$
    SELECT decode(string_agg(lpad(to_hex(width_bucket(random(), 0, 1, 256) - 1), 2, '0'), ''), 'hex')
    -- width_bucket starts with 1, we counter it with series from 2; +1 is there to includes upper bound too
    -- should be marginally more efficient than: generate_series(1, $1 + trunc(random() * ($2 - $1 + 1))::integer)
    FROM generate_series(2, $1 + width_bucket(random(), 0, 1, $2 - $1 + 1));
$$;

Insert the rows:

DO $$ BEGIN
    FOR r IN 1..10000 LOOP
        INSERT INTO m_user (name_norm, name_orig, fullobject, version)
        VALUES ('user-' || LPAD(r::text, 10, '0'), 'user-' || LPAD(r::text, 10, '0'),
            random_bytea(100, 20000), 1);
    END LOOP;
END; $$;

With 10k rows the sizes of tables were:

-- top 20 biggest tables or their TOAST (large object storage) from public schema
SELECT
    t.oid,
    CASE
        WHEN tft.relname IS NOT NULL
            THEN tft.relname || ' (TOAST)'
        ELSE t.relname
    END AS object,
    pg_size_pretty(pg_relation_size(t.oid)) AS size
FROM pg_class t
    INNER JOIN pg_namespace ns ON ns.oid = t.relnamespace
    -- table for toast
    LEFT JOIN pg_class tft ON tft.reltoastrelid = t.oid
    LEFT JOIN pg_namespace tftns ON tftns.oid = tft.relnamespace
WHERE 'public' IN (ns.nspname, tftns.nspname)
ORDER BY pg_relation_size(t.oid) DESC
LIMIT 20;

The results are (rows after the second one are small and the rest is omitted):

oid object size

16503

m_user (TOAST)

102 MB

16498

m_user

2008 kB

16452

m_object_oid

440 kB

Size of the toast for m_user is about right (10k rows times 10kB). With random bytes the TOAST compression could not do much. To demonstrate the compression, let’s generate byte arrays full of zeroes:

CREATE OR REPLACE FUNCTION zero_bytea(min_len integer, max_len integer)
    RETURNS bytea
    LANGUAGE sql
AS $$
    SELECT decode(string_agg('00', ''), 'hex')
    FROM generate_series(2, $1 + width_bucket(random(), 0, 1, $2 - $1 + 1));
$$;

DELETE FROM m_object;

DO $$ BEGIN
    FOR r IN 1..1000 LOOP
            INSERT INTO m_user (name_norm, name_orig, fullobject, version)
            VALUES ('user-' || LPAD(r::text, 10, '0'), 'user-' || LPAD(r::text, 10, '0'),
--                     random_bytea(100, 20000), 1);
                    zero_bytea(100, 20000), 1);
        END LOOP;
END; $$;

Now, let’s see the sizes using the query from above - the results are:

oid object size

16498

m_user

880 kB

16516

m_user_name_orig_idx

480 kB

16517

m_user_name_norm_key

480 kB

16456

m_object_oid_pk

448 kB

16506

m_user_pkey

448 kB

16452

m_object_oid

128 kB

The rest is 8kB or less - and the toast table for user is not even mentioned here. It actually shows 0 bytes of size at this moment.

Now switch the commented/uncommented lines in the loop above to use random_bytea (and loop range from 1001 to 2000 to keep usernames unique) - and let’s see what happens:

oid object size

16503

m_user (TOAST)

10 MB

16498

m_user

880 kB

16517

m_user_name_norm_key

480 kB

16516

m_user_name_orig_idx

480 kB

16506

m_user_pkey

448 kB

16456

m_object_oid_pk

448 kB

16452

m_object_oid

128 kB

Toast table for m_user is immediately leader of our chart. This is good - this keeps m_user small and also shows that the data is compressed by default.

Creating many sub-tables dynamically

When measuring the impact of the number of inherited tables the following block was used to generate arbitrary number of tables inherited from both m_object and m_focus:

-- creating more tables inherited from m_object or m_focus
DO
$$
    BEGIN
        FOR r IN 1..25
            LOOP
                EXECUTE 'CREATE TABLE m_omore' || r || '(
                    objectTypeClass INT4 GENERATED ALWAYS AS (101) STORED,
                    PRIMARY KEY (oid)
                )
                    INHERITS (m_object)';
                EXECUTE 'CREATE TABLE m_fmore' || r || '(
                    objectTypeClass INT4 GENERATED ALWAYS AS (101) STORED,
                    PRIMARY KEY (oid)
                )
                    INHERITS (m_focus)';
            END LOOP;
    END;
$$;

-- and to drop them
DO
$$
    BEGIN
        FOR r IN 1..25
            LOOP
                EXECUTE 'DROP TABLE m_fmore' || r ;
                EXECUTE 'DROP TABLE m_omore' || r ;
            END LOOP;
    END
$$;