Schema-less Graphs Using SQL
As previously introduced in this article, the current SQL specification (SQL:2023) includes syntax for defining graphs based on table data and for pattern matching against those graphs.
In this context, I am aware of a few projects considering switching from graph databases to relational databases. Their motivation is largely driven by concerns about the scalability and manageability of native graph databases, suggesting that these users are already using graph databases in production.
It remains to be seen whether the property graph features of relational databases (which I will refer to as SQL Graph) can resolve the above concerns, but we will likely see case studies on this matter moving forward.
Are Graph Databases Schema-less?
One of the questions I often receive is, “Does SQL Graph lack the schema-less nature of graph databases?” Some see this as a reason not to consider SQL Graph as a true graph database, while others are more practical and verify whether there are benefits in migration.
The following expectations are associated with the schema-less nature:
- No schema changes when introducing new types of nodes or edges.
- No schema changes when adding new properties to nodes or edges.
These features are generally not expected in SQL Graph, where schema changes, as outlined below, are necessary.
Schema Changes in SQL Graph
Since SQL Graph is a view based on tables, adding new types of nodes/edges or new properties requires schema changes. Let’s first take a look at how this works.
We will start with a simplified version of the graph example in the official documentation, which we will call graph0
. To represent relationships between people, we will first prepare person
table and friend_of
table and then create a graph based on these tables.
Create the base tables and a SQL graph:
CREATE TABLE person (
id NUMBER,
name VARCHAR2(10),
birth_date DATE
);
INSERT INTO person VALUES (1, 'John', to_date('13/06/1963', 'DD/MM/YYYY'));
INSERT INTO person VALUES (2, 'Mary', to_date('25/09/1982', 'DD/MM/YYYY'));
INSERT INTO person VALUES (3, 'Bob', to_date('11/03/1966', 'DD/MM/YYYY'));
INSERT INTO person VALUES (4, 'Alice', to_date('01/02/1987', 'DD/MM/YYYY'));
CREATE TABLE friend_of (
id NUMBER GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
person_a NUMBER,
person_b NUMBER,
meeting_date DATE
);
INSERT INTO friend_of (person_a, person_b, meeting_date) VALUES (1, 3, to_date('01/09/2000', 'DD/MM/YYYY'));
INSERT INTO friend_of (person_a, person_b, meeting_date) VALUES (2, 4, to_date('19/09/2000', 'DD/MM/YYYY'));
INSERT INTO friend_of (person_a, person_b, meeting_date) VALUES (2, 1, to_date('19/09/2000', 'DD/MM/YYYY'));
INSERT INTO friend_of (person_a, person_b, meeting_date) VALUES (3, 2, to_date('10/07/2001', 'DD/MM/YYYY'));
CREATE PROPERTY GRAPH graph0
VERTEX TABLES (
person
KEY (id)
LABEL person
PROPERTIES (id, name, birth_date)
)
EDGE TABLES (
friend_of
KEY (id)
SOURCE KEY (person_a) REFERENCES person (id)
DESTINATION KEY (person_b) REFERENCES person (id)
LABEL friend_of
PROPERTIES (meeting_date)
);
New Types of Nodes and Edges
Here, to include information about the university a person belongs to, we will add a university
node and student_of
edge. This will require schema changes, such as creating new tables and updating the graph:
CREATE TABLE university (
id NUMBER,
name VARCHAR2(10)
);
INSERT INTO university VALUES (1, 'ABC');
INSERT INTO university VALUES (2, 'XYZ');
CREATE TABLE student_of (
id NUMBER GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
university_id NUMBER,
person_id NUMBER,
subject VARCHAR2(10)
);
INSERT INTO student_of (university_id, person_id, subject) VALUES (1, 1, 'Arts');
INSERT INTO student_of (university_id, person_id, subject) VALUES (1, 3, 'Music');
INSERT INTO student_of (university_id, person_id, subject) VALUES (2, 2, 'Math');
INSERT INTO student_of (university_id, person_id, subject) VALUES (2, 4, 'Science');
CREATE OR REPLACE PROPERTY GRAPH graph0
VERTEX TABLES (
person
KEY (id)
LABEL person
PROPERTIES (name, birth_date),
university
KEY (id)
LABEL university
PROPERTIES (name)
)
EDGE TABLES (
friend_of
KEY (id)
SOURCE KEY (person_a) REFERENCES person (id)
DESTINATION KEY (person_b) REFERENCES person (id)
LABEL friend_of
PROPERTIES (meeting_date),
student_of
KEY (id)
SOURCE KEY (person_id) REFERENCES person (id)
DESTINATION KEY (university_id) REFERENCES university (id)
LABEL student_of
PROPERTIES (subject)
);
New Properties
Next, we will add height information to each person. This will also require schema changes, such as adding a new column to the table and updating the corresponding graph definition:
ALTER TABLE person ADD (height FLOAT);
UPDATE person SET height = 1.80 WHERE id = 1;
UPDATE person SET height = 1.65 WHERE id = 2;
UPDATE person SET height = 1.75 WHERE id = 3;
UPDATE person SET height = 1.70 WHERE id = 4;
CREATE OR REPLACE PROPERTY GRAPH graph0
VERTEX TABLES (
person
KEY (id)
LABEL person
PROPERTIES (name, birth_date, height),
university
KEY (id)
LABEL university
PROPERTIES (name)
)
EDGE TABLES (
friend_of
KEY (id)
SOURCE KEY (person_a) REFERENCES person (id)
DESTINATION KEY (person_b) REFERENCES person (id)
LABEL friend_of
PROPERTIES (meeting_date),
student_of
KEY (id)
SOURCE KEY (person_id) REFERENCES person (id)
DESTINATION KEY (university_id) REFERENCES university (id)
LABEL student_of
PROPERTIES (subject)
);
Schema-less Approach
As mentioned above, SQL Graph is defined on top of relational database schemas. However, there are many cases where a schema-less graph is required, such as when migrating from other graph databases or constructing a knowledge graph from document data. Here, we will introduce a method to manage schema-less graphs using SQL Graph.
First, we create a nodes table to store all nodes and an edges table to store all edges. To flexibly store properties, we include a props column in JSON format.
CREATE TABLE graph1node (
id VARCHAR2(255)
, label VARCHAR2(255)
, props JSON
, CONSTRAINT graph1node_pk PRIMARY KEY (id)
);
CREATE TABLE graph1edge (
id VARCHAR2(255)
, src VARCHAR2(255)
, dst VARCHAR2(255)
, label VARCHAR2(255)
, props JSON
, CONSTRAINT graph1edge_pk PRIMARY KEY (id)
, CONSTRAINT graph1edge_fk_src FOREIGN KEY (src) REFERENCES graph1node(id)
, CONSTRAINT graph1edge_fk_dst FOREIGN KEY (dst) REFERENCES graph1node(id)
);
An SQL Graph will be created on top of these tables. Here, all node and edge labels will be set as node
and edge
respectively. Therefore, any information that should ideally be stored as a label (person
or friend_of
) will be kept as a label
property. Similarly, the JSON column will be stored as the props
property.
CREATE PROPERTY GRAPH graph1
VERTEX TABLES (
graph1node
KEY (id)
LABEL node
PROPERTIES (id, label, props)
)
EDGE TABLES (
graph1edge
KEY (id)
SOURCE KEY(src) REFERENCES graph1node(id)
DESTINATION KEY(dst) REFERENCES graph1node(id)
LABEL edge
PROPERTIES (id, label, src, dst, props)
);
Nodes and edges can be added to the nodes and edges tables using regular INSERT statements. Here, we will add person
nodes and friend_of
edges, as in the example above.
INSERT INTO graph1node VALUES ('p1', 'person', '{"name":"John", "birth_date":"1963-06-13"}');
INSERT INTO graph1node VALUES ('p2', 'person', '{"name":"Mary", "birth_date":"1982-09-25"}');
INSERT INTO graph1node VALUES ('p3', 'person', '{"name":"Bob", "birth_date":"1966-03-11"}');
INSERT INTO graph1node VALUES ('p4', 'person', '{"name":"Alice", "birth_date":"1987-02-01"}');
INSERT INTO graph1edge VALUES (sys_guid(), 'p1', 'p3', 'friend_of', '{"meeting_date":"2000-09-01"}');
INSERT INTO graph1edge VALUES (sys_guid(), 'p2', 'p4', 'friend_of', '{"meeting_date":"2000-09-19"}');
INSERT INTO graph1edge VALUES (sys_guid(), 'p2', 'p1', 'friend_of', '{"meeting_date":"2000-09-19"}');
INSERT INTO graph1edge VALUES (sys_guid(), 'p3', 'p2', 'friend_of', '{"meeting_date":"2001-07-10"}');
New Types of Nodes and Edges
In this case, there is no need to modify the table or graph definitions when creating university
nodes or student_of
edges.
INSERT INTO graph1node VALUES ('u1', 'university', '{"name":"ABC University"}');
INSERT INTO graph1node VALUES ('u2', 'university', '{"name":"XYZ University"}');
INSERT INTO graph1edge VALUES (sys_guid(), 'p1', 'u1', 'student_of', '{"meeting_date":"2000-09-01"}');
INSERT INTO graph1edge VALUES (sys_guid(), 'p3', 'u1', 'student_of', '{"meeting_date":"2000-09-19"}');
INSERT INTO graph1edge VALUES (sys_guid(), 'p2', 'u2', 'student_of', '{"meeting_date":"2000-09-19"}');
INSERT INTO graph1edge VALUES (sys_guid(), 'p4', 'u2', 'student_of', '{"meeting_date":"2001-07-10"}');
To add new properties, you only need to update the JSON with new elements, without schema changes.
UPDATE graph1node SET props = JSON_TRANSFORM(props, INSERT '$.height' = 1.80) WHERE id = 'p1';
UPDATE graph1node SET props = JSON_TRANSFORM(props, INSERT '$.height' = 1.65) WHERE id = 'p2';
UPDATE graph1node SET props = JSON_TRANSFORM(props, INSERT '$.height' = 1.75) WHERE id = 'p3';
UPDATE graph1node SET props = JSON_TRANSFORM(props, INSERT '$.height' = 1.70) WHERE id = 'p4';
Query Examples
While we stored the graph without altering the schema, how do we write search queries? The following query explores Mary’s friends in the original SQL Graph (graph0
) using the GRAPH_TABLE
function.
SELECT *
FROM GRAPH_TABLE (graph0
MATCH (a IS person) -[e IS friend_of]-> (b IS person)
WHERE a.name = 'Mary'
COLUMNS (a.name AS person_a, b.name AS person_b)
);
PERSON_A PERSON_B
---------- ----------
Mary Alice
Mary John
To execute the same query on the schema-less graph (graph1
), the query will look like the following. Since the original labels are stored in the label
property and the original properties are stored as JSON elements in the props
property, the query needs to be rewritten accordingly.
SELECT *
FROM GRAPH_TABLE (graph1
MATCH (a) -[e]-> (b)
WHERE JSON_VALUE(a.props, '$.name') = 'Mary'
AND a.label = 'person'
AND e.label = 'friend_of'
AND b.label = 'person'
COLUMNS (
JSON_VALUE(a.props, '$.name') AS person_a
, JSON_VALUE(b.props, '$.name') AS person_b
)
);
PERSON_A PERSON_B
---------- ----------
Mary Alice
Mary John
While the original query is simpler, both can search the same graph pattern. You can rewrite the JSON_VALUE
function using the JSON dot notation introduced in SQL:2023, as shown below.
SELECT *
FROM GRAPH_TABLE (graph1
MATCH (a) -[e]-> (b)
WHERE a.props.name.string() = 'Mary'
AND a.label = 'person'
AND e.label = 'friend_of'
AND b.label = 'person'
COLUMNS (
a.props.name.string() AS person_a
, b.props.name.string() AS person_b
)
);
Indexing
To handle large datasets with SQL Graph, it is crucial to have appropriate indexes on the base tables. In the case of the schema-less SQL Graph created above, adding the following indexes can help maintain high query performance for large graphs.
Endpoints of Edges
These are B-tree indexes used for joining the node/edge tables. When tracing edges starting from a node, a full table scan of the edge table would occur without these indexes, so these indexes are essential.
CREATE INDEX IF NOT EXISTS graph1edge_src_idx ON graph1edge (src);
CREATE INDEX IF NOT EXISTS graph1edge_dst_idx ON graph1edge (dst);
Full-text Index on Properties
Creating a full-text index on properties stored in JSON allows you to perform full-text searches on specific JSON elements. This is useful when searching for nodes by a property value, such as partial matches on names.
CREATE SEARCH INDEX graph1node_search_idx ON graph1node (props);
SELECT *
FROM graph1node
WHERE JSON_TEXTCONTAINS(props, '$.name', 'Mary');
For more information on using full-text searches on JSON columns and their performance, please refer to the following article!
Function Index on Properties
If you want to search for specific property values stored in JSON, you can use a function index with JSON_VALUE
. In the following example, a function index is created to speed up queries that sort by birth_date
.
CREATE INDEX IF NOT EXISTS graph1node_updated_idx
ON graph1node (JSON_VALUE(props, '$.birth_date' RETURNING DATE));
SELECT *
FROM graph1node
ORDER BY JSON_VALUE(props, '$.birth_date' RETURNING DATE);
Summary
In this article, we demonstrated how to store data similar to that of a schema-less graph database and execute traversal queries using the latest property graph features of relational databases. Although not covered here, it is also possible to assign multiple labels to nodes or differentiate between directed and undirected edges.
Graph databases not only stand out for their data models and query languages but also for their optimized implementations for traversing graphs. As multi-model databases like Oracle Database begin to include graph-specific implementations, we can look forward to utilizing SQL’s powerful expressiveness while supporting flexible data models and achieving robust scalability.