create table tjson (
oid UUID NOT NULL DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
ext JSONB,
CONSTRAINT tjson_oid_pk PRIMARY KEY (oid)
);
ALTER TABLE tjson ADD CONSTRAINT tjson_name_key UNIQUE (name);
CREATE INDEX tjson_ext_idx ON tjson USING gin (ext);
Comparing JSONB and EAV model for extensions
Performance was first measured on default PG installation on 2GB RAM VirtualBox with 1 vCPU. The results can be better (or worse depending on the host) even by an order-of-magnitude where memory/CPU (mostly counts). Comparison with the same box resized to 8GB/4CPU is provided. All tests were run on the host with Ryzen 5 1600AF and 32GB RAM. |
Structure for JSON is simple, single table tjson
with (oid UUID PK, name VARCHAR, ext JSONB)
.
Column name
has a unique index and ext
has a GIN
index only by default.
Other indexes are added as needed and mentioned during tests.
Structure for EAV model has a master table teav
with (oid UUID PK, name VARCHAR)
with unique index on the name
.
Detail table teav_ext_string
contains (owner_oid FK, key VARCHAR, value VARCHAR)
.
Extension table has PK(owner_oid, key, value)
, that is all columns, value could be omitted if uniqueness is not necessary.
Because the PK covers all columns, it should be index-organized table in Oracle (PG doesn’t have this feature).
In PG it can be clustered by an index,
but this is potentially costly (12min for 15M rows) during which the table is unusable
because of the used ACCESS EXCLUSIVE
lock.
create table teav (
oid UUID NOT NULL DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
CONSTRAINT teav_oid_pk PRIMARY KEY (oid)
);
ALTER TABLE teav ADD CONSTRAINT teav_name_key UNIQUE (name);
create table teav_ext_string (
owner_oid UUID NOT NULL references teav(oid),
key VARCHAR(32) NOT NULL,
value VARCHAR(255) NOT NULL,
-- this also covers the index on owner_oid FK
CONSTRAINT teav_ext_string_pk PRIMARY KEY (owner_oid, key, value)
);
CREATE INDEX teav_ext_string_key_value_idx ON teav_ext_string (key, value);
Utility pgbench
can be used for measurements like so:
pgbench -r -P 5 -f - -t 30 << "--EOF" select * from teav limit 500; ... more selects --EOF
If pgbench
is prefixed with --
it can be part of SQL file as a comment, hence --EOF
as a delimiter.
It is easy to copy/paste from such an SQL file directly to shell to run the pgbench
.
Delimiter is quoted with "
to avoid any accidental shell expansions within the block.
Results can be taken from the final report of average latency per statement (enabled by -r
option).
Results may vary widely between runs, especially for shorter queries.
Also, running the whole script and running smaller bulks provides different results as well,
typically better for smaller amount of statement (this may be caused by rather limited VM performance).
Measurements for shorter queries were repeated with longer queries commented out and option -t 30
to get more repetitions per statement (30 in this case) which typically leads to lower, but also more representative, averages.
Full SQL file is attached here.
Insert
Insert loop is created in such a way that both version have exactly the same data, even where randomized.
-- start with smaller batches 1..1000, 1001..100000, etc.
DO
$$
DECLARE
hobbies VARCHAR[];
v VARCHAR;
id UUID;
BEGIN
FOR r IN 11000001..15000000 LOOP
IF r % 10 <= 1 THEN
-- some entries have no extension
INSERT INTO tjson (name) VALUES ('user-' || LPAD(r::text, 10, '0'));
INSERT INTO teav (name) VALUES ('user-' || LPAD(r::text, 10, '0'));
ELSEIF r % 10 <= 3 THEN
-- email+eid (constant keys) + other-key-{r} (variable key)
INSERT INTO tjson (name, ext) VALUES (
'user-' || LPAD(r::text, 10, '0'),
('{"eid": ' || r || ', "email": "user' || r || '@mycompany.com", "other-key-' || r || '": "other-value-' || r || '"}')::jsonb
);
INSERT INTO teav (name) VALUES ('user-' || LPAD(r::text, 10, '0')) RETURNING oid INTO id;
INSERT INTO teav_ext_string (owner_oid, key, value) VALUES (id, 'email', 'user' || r || '@mycompany.com');
INSERT INTO teav_ext_string (owner_oid, key, value) VALUES (id, 'eid', r);
INSERT INTO teav_ext_string (owner_oid, key, value) VALUES (id, 'other-key-' || r, 'other-value-' || r);
ELSEIF r % 10 <= 4 THEN
-- rarely used values of hobbies key
hobbies := random_pick(ARRAY['recording', 'guitar', 'beer', 'rum', 'writing', 'coding', 'debugging', 'gaming', 'shopping', 'watching videos', 'sleeping', 'dreaming'], 0.1);
-- JSONB
INSERT INTO tjson (name, ext) VALUES (
'user-' || LPAD(r::text, 10, '0'),
('{"eid": ' || r || ', "hobbies": '|| array_to_json(hobbies)::text || '}')::jsonb
);
-- EAV
INSERT INTO teav (name) VALUES ('user-' || LPAD(r::text, 10, '0')) RETURNING oid INTO id;
INSERT INTO teav_ext_string (owner_oid, key, value) VALUES (id, 'eid', r);
FOREACH v IN ARRAY hobbies LOOP
INSERT INTO teav_ext_string (owner_oid, key, value)
VALUES (id, 'hobbies', v);
END LOOP;
ELSE
-- these values are used by many entries
hobbies := random_pick(ARRAY['eating', 'books', 'music', 'dancing', 'walking', 'jokes', 'video', 'photo'], 0.4);
-- JSONB
INSERT INTO tjson (name, ext) VALUES (
'user-' || LPAD(r::text, 10, '0'),
('{"eid": ' || r || ', "hobbies": '|| array_to_json(hobbies)::text || '}')::jsonb
);
-- EAV
INSERT INTO teav (name) VALUES ('user-' || LPAD(r::text, 10, '0')) RETURNING oid INTO id;
INSERT INTO teav_ext_string (owner_oid, key, value) VALUES (id, 'eid', r);
FOREACH v IN ARRAY hobbies LOOP
INSERT INTO teav_ext_string (owner_oid, key, value)
VALUES (id, 'hobbies', v);
END LOOP;
END IF;
IF r % 1000 = 0 THEN
COMMIT;
END IF;
END LOOP;
END $$;
Results for 15 million rows
Both JSON and EAV tables contain 15M rows, EAV extension detail table contains 45.8M rows.
Object | Size | Note |
---|---|---|
teav_ext_string |
2841 MB |
attribute value table (detail for EAV) |
teav_ext_string_pk |
2373 MB |
PK of AV table, nearly the same size |
tjson |
1927 MB |
JSON table (smaller than just AV table alone) |
tjson_ext_idx |
1616 MB |
GIN index for |
teav_ext_string_key_value_idx |
880 MB |
index for searching by key+value |
teav |
862 MB |
master table for EAV model |
teav_oid_pk |
451 MB |
primary key index of that table |
tjson_oid_pk |
451 MB |
PK of JSON table, virtually the same as for EAV |
teav_name_key, tjson_name_key |
451 MB |
name indexes are the same |
Counts JSON:
Operation | Time avg (ms) | Note |
---|---|---|
select count(*) from tjson |
6512 |
seq-scan by default;
doesn’t use PK index but if forced to, the time goes to ~2s;
some additional index on |
… where ext @> '{"hobbies":["video"]}' |
8200 |
seq-scan, low selectivity, matches 3.4M of 15M rows |
… where ext @> '{"hobbies":["sleeping"]}' |
181 |
high selectivity, matches ~50k rows, uses GIn index, Gather / Partial Aggregate / Parallel Bitmap Heap Scan on tjson / Bitmap Index Scan on tjson_ext_idx |
select count(*) from tjson where ext→>'email' LIKE 'user2%' |
7174 |
matches ~222k rows, seq-scan;
index |
where UPPER(ext→>'email') LIKE 'USER2%' |
7855 |
like above with little overhead because of the function; even ignores the index on (UPPER(ext→>'email')) |
Counts EAV:
Operation | Time avg (ms) | Note |
---|---|---|
select count(*) from teav_ext_string |
6382 |
informational, otherwise useless, for 15M |
select count(*) from teav |
2284 |
seq-scan |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'video') |
17,465 |
seq-scan, low selectivity |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'sleeping') |
1965 |
Gather / Partial Aggregate / Nested Loop ( Parallel Bitmap Heap Scan on teav_ext_string / Bitmap Index Scan on teav_ext_string_key_value_idx, Index Only Scan using teav_oid_pk) |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'email' and es.value LIKE 'user2%') |
14,932 |
very slow, seq-scan on ext table, |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'email' and UPPER(es.value) LIKE 'USER2%') |
11,869 |
Aggregate / Nested Loop ( HashAggregate / Gather / seq-scan on the ext table, Index Only Scan using teav_oid_pk) |
select count(owner_oid) from teav_ext_string where key = 'email' and value LIKE 'user2%' |
~7s |
the same result for single-valued extension, hadly to be expected from query interpreter, still seq-scan |
All selects have LIMIT 500 for practical reasons (unless stated differently),
pgbench would try to read all the lines otherwise.
|
Selects - JSON:
Operation | Time avg (ms) | Note |
---|---|---|
select * from tjson |
0.949 |
doesn’t need index |
… where ext @> '{"hobbies":["video"]}' |
2.034 |
doesn’t need index |
… order by oid |
6.32 |
index scan |
… and oid>'fffe0000-0000-0000-0000-000000000000' |
0.97 |
PK index scan + index cond + filter, this demonstrates the power of keyset pagination (this returns last "page", less than 500) |
select * from tjson where ext @> '{"hobbies":["sleeping"]}' |
44.94 |
rare compared to "video",
may use |
… order by oid |
92.34 |
index scan |
where ext→>'email' LIKE 'user2%' |
2.25 |
seq-scan; ignores the index on |
… order by oid |
90.7 |
index scan |
… and oid>'fffe0000-0000-0000-0000-000000000000' |
5.34 |
bitmap index scan |
where UPPER(ext→>'email') LIKE 'USER2%' |
2.40 |
seq-scan; ignores the functional index on |
… order by oid |
95.0 |
index scan |
… and oid>'fffe0000-0000-0000-0000-000000000000' |
5.00 |
index scan |
where cast(ext→'eid' as int) = 5000 |
8133 (50 with index) |
seq-scan, inefficient version of the query below, don’t use |
where ext @> '{"eid":5000}' |
0.32 |
Bitmap Index Scan on |
Selects - EAV:
Operation | Time avg (ms) | Note |
---|---|---|
select * from teav |
0.494 |
seq-scan |
select * from teav t where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'video') |
6.184 |
index scan |
… order by t.oid |
4.734 |
index scan |
… and t.oid>'fffe0000-0000-0000-0000-000000000000' |
11.34 |
index scan |
select * from teav t where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'sleeping') |
24.28 |
index scan |
… order by t.oid |
206 |
index scan |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'email' and es.value LIKE 'user2%') |
109 |
index scan |
… order by t.oid |
108 |
index scan |
… and t.oid>'fffe0000-0000-0000-0000-000000000000' |
78.1 |
index scan |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'email' and UPPER(es.value) LIKE 'USER2%') |
117 |
index scan |
… order by t.oid |
116 |
index scan |
… and t.oid>'fffe0000-0000-0000-0000-000000000000' |
90.6 |
index scan |
where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'eid' and es.value = '5000') |
0.500 |
index scan |
There was little to no difference between semi-join (EXISTS
) and LEFT JOIN
wherever tried.
Not all selects were tried with both, but EXISTS
returns the right number of rows so it was preferred.
Performance with more RAM/CPU
After resizing the VirtualBox, the following parameters were used (uninformed setup using this site):
# DB Version: 13 # OS Type: linux # DB Type: web # Total Memory (RAM): 8 GB # CPUs num: 4 # Connections num: 100 # Data Storage: ssd max_connections = 100 shared_buffers = 2GB effective_cache_size = 6GB maintenance_work_mem = 512MB checkpoint_completion_target = 0.7 # 0.9 for DB Type: oltp wal_buffers = 16MB default_statistics_target = 100 random_page_cost = 1.1 effective_io_concurrency = 200 work_mem = 10485kB min_wal_size = 1GB # 2GB for oltp max_wal_size = 4GB # 8GB for oltp max_worker_processes = 4 max_parallel_workers_per_gather = 2 max_parallel_workers = 4 max_parallel_maintenance_workers = 2
Most of these are already commented out by default in postresql.conf
, the rest was commented too,
not to collide with the added section above.
Benchmark was run a couple of times first to warm up the DB before results were used.
It seems that selects are longer if checkpoint_completion_target
is set to 0.9.
Last column are results with -c 10
option added to pgbench
, all previous tests were single-client.
10 clients should overload the configuration.
Operation | 2GB/1CPU (ms) | 8GB/4CPU (ms) (speedup) | with 10 clients (slowdown) |
---|---|---|---|
select count(*) from tjson |
6512 |
440 (14.8x) |
3324 (7.6x) |
… where ext @> '{"hobbies":["video"]}' |
8200 |
1388 (5.9x) |
8691 (6.3x) |
… where ext @> '{"hobbies":["sleeping"]}' |
181 |
63.4 (2.9x) |
161 (2.5x) |
select count(*) from tjson where ext→>'email' LIKE 'user2%' |
7174 |
758 (9.5x) |
5201 (6.9x) |
where UPPER(ext→>'email') LIKE 'USER2%' |
7855 |
1196 (6.6x) |
7080 (5.9x) |
select count(*) from teav_ext_string |
6382 |
2213 (2.9x) |
7935 (3.6x) |
select count(*) from teav |
2284 |
337 (6.8x) |
2997 (8.9x) |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'video') |
17465 |
3324 (5.3x) |
58565 (17.6x) |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'sleeping') |
1965 |
87 (22.6x) |
2966 (34.1x) |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'email' and es.value LIKE 'user2%') |
14932 |
2499 (6x) |
22974 (9.2x) |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'email' and UPPER(es.value) LIKE 'USER2%') |
11869 |
2188 (5.4x) |
9253 (4.2x) |
select * from tjson limit 500 |
0.949 |
0.734 (1.3x) |
5.781 (7.9x) |
… where ext @> '{"hobbies":["video"]}' |
2.034 |
1.873 (1.1x) |
4.815 (2.6x) |
… order by oid |
6.32 |
2.425 (2.6x) |
6.144 (2.5x) |
… and oid>'fffe0000-0000-0000-0000-000000000000' |
0.97 |
0.761 (1.3x) |
3.548 (4.7x) |
select * from tjson where ext @> '{"hobbies":["sleeping"]}' |
44.94 |
41.82 (1.1x) |
100.546 (2.4x) |
… order by oid |
92.34 |
58.79 (1.6x) |
155.131 (2.6x) |
where ext→>'email' LIKE 'user2%' |
2.25 |
1.93 (1.2x) |
5.068 (2.6x) |
… order by oid |
90.7 |
23.19 (3.9x) |
78.79 (3.4x) |
… and oid>'fffe0000-0000-0000-0000-000000000000' |
5.34 |
0.59 (9.1x) |
3.441 (5.8x) |
where UPPER(ext→>'email') LIKE 'USER2%' |
2.4 |
2.19 (1.1x) |
5.152 (2.4x) |
… order by oid |
95 |
25.2 (3.8x) |
78.07 (3.1x) |
… and oid>'fffe0000-0000-0000-0000-000000000000' |
5 |
0.65 (7.7x) |
3.23 (5x) |
where ext @> '{"eid":5000}' |
0.32 |
0.25 (1.3x) |
1.5 (6x) |
select * from teav limit 500 |
0.494 |
0.419 (1.2x) |
1.5 (3.6x) |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'video') |
6.184 |
2.384 (2.6x) |
6.725 (2.8x) |
… order by t.oid |
4.734 |
2.225 (2.1x) |
7.919 (3.6x) |
… and t.oid>'fffe0000-0000-0000-0000-000000000000' |
11.34 |
4.815 (2.4x) |
267.21 (55.5x) |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'hobbies' and es.value = 'sleeping') |
24.28 |
2.736 (8.9x) |
11.992 (4.4x) |
… order by t.oid |
206 |
42.8 (4.8x) |
267.21 (6.2x) |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'email' and es.value LIKE 'user2%') |
109 |
40.6 (2.7x) |
315.2 (7.8x) |
… order by t.oid |
108 |
35.7 (3x) |
106.5 (3x) |
… and t.oid>'fffe0000-0000-0000-0000-000000000000' |
78.1 |
21.9 (3.6x) |
43.26 (2x) |
… where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'email' and UPPER(es.value) LIKE 'USER2%') |
117 |
39.3 (3x) |
114.4 (2.9x) |
… order by t.oid |
116 |
39 (3x) |
113.1 (2.9x) |
… and t.oid>'fffe0000-0000-0000-0000-000000000000' |
90.6 |
21.4 (4.2x) |
41.61 (1.9x) |
where exists (select from teav_ext_string es where es.owner_oid = t.oid and es.key = 'eid' and es.value = '5000') |
0.5 |
0.468 (1.1x) |
2.229 (4.8x) |
JSONB vs EAV comparison of similar operations
Operations are shown as JSON version selects, avg times are taken from 8GB/4CPU configuration. Significant victories are in bold.
Operation (on JSON version) | JSON avg (ms) | EAV avg (ms) |
---|---|---|
select count(*) from tjson |
440 |
337 |
… where ext @> '{"hobbies":["video"]}' |
1388 |
3324 |
… where ext @> '{"hobbies":["sleeping"]}' |
63.4 |
87 |
select count(*) from tjson where ext→>'email' LIKE 'user2%' |
758 |
2499 |
where UPPER(ext→>'email') LIKE 'USER2%' |
1196 |
2188 |
non-weighted counts total |
3845 |
10648 |
select * from tjson |
0.734 |
0.419 |
… where ext @> '{"hobbies":["video"]}' |
1.873 |
2.384 |
… order by oid |
2.425 |
2.225 |
… and oid>'fffe0000-0000-0000-0000-000000000000' |
0.761 |
4.815 |
select * from tjson where ext @> '{"hobbies":["sleeping"]}' |
41.82 |
2.736 |
… order by oid |
58.79 |
42.8 |
where ext→>'email' LIKE 'user2%' |
1.93 |
40.6 |
… order by oid |
23.19 |
35.7 |
… and oid>'fffe0000-0000-0000-0000-000000000000' |
0.59 |
21.9 |
where UPPER(ext→>'email') LIKE 'USER2%' |
2.19 |
39.3 |
… order by oid |
25.2 |
39 |
… and oid>'fffe0000-0000-0000-0000-000000000000' |
0.65 |
21.4 |
where ext @> '{"eid":5000}' |
0.25 |
0.468 |
non-weighted select total |
160 ms |
254 ms |
Update
The problem with JSONB type is that it is not efficient to update, see this SO answer. Even though there are functions to update JSONB in a fine-grained manner, the whole row is written anyway, which also solves some MVCC (Multiversion Concurrency Control) problems. This can make little updates of large rows prohibitive.
There are cases when extension property holds thousands of items (e.g. groups in LDAP) and these must be written as a whole document when JSONB is used. Such properties are often index-only, that is, they are not stored in fullObject serialized form. With EAV model only small individual rows would be manipulated (deleted or inserted) which may be more efficient.
Using JSONB still may provide reasonable performance and there are various solutions:
-
The final extension value may be calculated in midPoint, JSONB is created and updated as a whole. This means the old value of the property must be obtained first (this can be controlled by configuration flags
enableNoFetchExtensionValuesInsertion/Deletion
for the old repository) it must travel from the DB to the application but makes the update process cleaner and more reliable from midPoint perspective. -
Only updates are sent to the database. This does not prevent writing of the whole row internally but saves network bandwidth. There are currently no tests whether the updates (especially multiple updates like add/delete combinations) are more efficient in the database or in midPoint (including additional traffic). Also, this can lead to inconsistent data, e.g. double adds.
Also, main reason NOT to use JSONB is when the data is used in a relational way, e.g. for JOINs.
This is not the case - except maybe for extension references if they are traversed with @
.
Various WHERE
conditions can be supported by
GIN indexes, optionally
with gin_trgm_ops for better
(I)LIKE
support.
Other types
Various data types are possible in extension attributes and these must be represented in JSONB.
-
Keys will actually be surrogate IDs from extension item catalog, probably numbers in
"
. Shortcuts are possible if readability is important, but unwrapping the JSONB in selects is possbile if necessary (see the next section). -
Numbers can be representad as JSON numbers which support
NUMERIC
range internally. This would allow for numeric comparison not possible with some number types in old repository. -
Date/time types can be represented as strings formated using ISO 8601. This allows for comparison as well as the string representation order is the same as temporal (but not between various date/time types, of course).
-
Multivalues are stored consistently as arrays, even when one value is stored. No value/empty array can be omitted. It may be wise to store any value in an array, which makes the system simpler, although the values must be always unwrapped (e.g. for comparision and corresponding indexes). But this allows for change of
maxOccurs
in the extension schema from 1 to more orunbounded
without any need of reindexing.
In general mixing scalar and arrays for operations can be tricky.
Conditions ext @> '{"hobbies":["eating"]}'
and ext @> '{"hobbies":"eating"}'
(without array)
are two different conditions that do not work the same way.
Using arrays very time means there is no need to reindex when maxOccurs
in shema changes.
Threre are still other cases when recomputing of extension column (aka "reindexing") is relevant.
Example of JSON unwrapping to rows
Given a row:
select * from tjson
where oid = '0cbe39c7-c7af-4cf3-a334-098400284a0a';
That returns something like:
oid | name | ext |
---|---|---|
0cbe39c7-c7af-4cf3-a334-098400284a0a |
user-0000038799 |
{"eid": 38799, "hobbies": ["eating", "books", "dancing", "walking", "video"]} |
We can explode all the top level values including the array with this select:
select oid, key, coalesce(aval, sval) val from (
select oid, key,
CASE WHEN jsonb_typeof(value) = 'array' THEN value END avals,
CASE WHEN jsonb_typeof(value) <> 'array' THEN value END sval
from tjson, jsonb_each(tjson.ext) fields
) x left join jsonb_array_elements(avals) aval on true
where oid = '0cbe39c7-c7af-4cf3-a334-098400284a0a';
Which returns:
oid | key | val |
---|---|---|
0cbe39c7-c7af-4cf3-a334-098400284a0a |
eid |
38799 |
0cbe39c7-c7af-4cf3-a334-098400284a0a |
hobbies |
"eating" |
0cbe39c7-c7af-4cf3-a334-098400284a0a |
hobbies |
"books" |
0cbe39c7-c7af-4cf3-a334-098400284a0a |
hobbies |
"dancing" |
0cbe39c7-c7af-4cf3-a334-098400284a0a |
hobbies |
"walking" |
0cbe39c7-c7af-4cf3-a334-098400284a0a |
hobbies |
"video" |
Obviously, this is not performance tuned, but for a single/few object(s) it is no problem at all.
Conclusion
-
JSON with index takes less space than EAV+ext table with their indexes (4GB vs 7GB, not counting unique indexes on names). This may change when keys are normalized using "ext item catalog", but both JSON and EAV would benefit from it.
-
Performance wise they are trading blows, but JSON is ~2 times faster overall, depending on the mix of selects.
-
EAV’s biggest win was finding any 500 rows with hobbies=sleeping (15x faster).
-
JSON was over 20x faster for 4 selects, all related to email
LIKE
searches (no index support on either solution).
-
-
There are ways how to improve searching with function based indexes, especially for specific attributes in JSON (see the slow select with
eid
above). -
Extension table index on key+value was used only for rare multi-value values (sleeping) and for exact matches of single values (eid). There may be other cases where it is used that were not tested here. Also, changing the VARCHAR key to INTEGER (FK to ext-item catalog) could entice PG to use it.
-
Counts are always much slower than limited select.
-
Ordering slows things down
-
Avoid unlimited selects.
Which model to use? I’d start with JSON but added possibility to store extension attribute in a flexible way. This complicates the implementation but can offer the better performing options. But this can be done later, including transparent extension item storage migration, etc.