Analyze Bank Transaction Data using Graph (Part 1/2)

Ryota Yamanaka
7 min readJul 13, 2023
Photo by Eduardo Soares on Unsplash (edited)

In this article, as a first step in using Oracle Graph, we will look at how to manage bank transaction data as a graph and what kind of analysis can be done. Fraud detection in financial institutions is one of the major use cases for graph databases. Graphs are also expected to be used for anti-money laundering, so such use cases should continue to expand.

This article is a revised version of an earlier article to reflect the current products (Oracle Database 23c and Graph Server 23.2).

Why Graph?

First, I will discuss why it is helpful to treat transactions such as bank transfers as graphs. For example, suppose we have a dataset of bank transactions consisting of a customer table, an account table, and a transaction table. We want to answer the following (probably very common) two questions:

  • Is there a flow of money between Bob and Charlie? Even if there is no direct transaction, you may be able to say that there is an indirect flow of money through other accounts.
  • Are there clusters of transactions? That is, is there a pattern of frequent transactions between some accounts and less frequent transactions between those accounts and others?

Assuming we are using an RDBMS, answering the first question naturally requires table-to-table joins and self-joins. The more hops to find a relationship, the more joins are needed. The second question is harder to answer and probably requires procedural code (such as PL/SQL for Oracle) to build the graph structure and determine the clusters.

Now let’s consider the case of treating it as a graph.

Graphs have two structures: nodes and edges. So if we want to represent the information in the tables as a graph, we can model accounts as nodes and transactions as edges. This representation is intuitive, as shown in the following figure, and we can simply follow the edges from account a1 to a3, a5, and a4, to answer the first question. Ideally, such a database system should provide an efficient way to find the path between a1 and a4.

To answer the second question, we need to implement or use an existing graph algorithm to find clusters based on the network structure of transactions. Many such algorithms have been proposed and used for community detection in social network analysis.

Create a Graph

In Oracle Database 23c, a new graph syntax for creating and querying graphs in SQL was introduced, so we will use this to create graphs from table data. In versions prior to 23c, the same operation can be performed using PGQL.

CREATE PROPERTY GRAPH bank_graph
VERTEX TABLES (
bank_customer
KEY (cst_id)
LABEL customer
PROPERTIES (cst_id, first_name, last_name)
, bank_account
KEY (acc_id)
LABEL account
PROPERTIES (acc_id)
)
EDGE TABLES (
bank_transaction
KEY (txn_id)
SOURCE KEY (acc_id_src) REFERENCES bank_account (acc_id)
DESTINATION KEY (acc_id_dst) REFERENCES bank_account (acc_id)
LABEL transferred_to
PROPERTIES (txn_id, datetime, amount)
, bank_account AS bank_account_owns
KEY (acc_id)
SOURCE KEY (cst_id) REFERENCES bank_customer (cst_id)
DESTINATION KEY (acc_id) REFERENCES bank_account (acc_id)
LABEL owns
);
Property graph created.

Let’s try pattern matching against this graph.

First, we will check the accounts held by the customer with cst_id = 10.

SELECT * FROM GRAPH_TABLE (bank_graph
MATCH (c IS customer)-[e IS owns]->(a IS account)
WHERE c.cst_id = 10
COLUMNS (c.cst_id, c.first_name, a.acc_id)
);
    CST_ID FIRST_NAME   ACC_ID
---------- ---------- ----------
10 Laura 10
10 Laura 90

Query and Visualization

Now, let’s try a few queries and their visualization. When using the built-in visualization tool that comes with Oracle Database, we will slightly modify the above queries to retrieve the IDs of the vertices and edges.

SELECT *
FROM GRAPH_TABLE (bank_graph
MATCH (c IS customer)-[e IS owns]->(a IS account)
WHERE c.cst_id = 10
COLUMNS (VERTEX_ID(c) AS id_c, EDGE_ID(e) AS id_e, VERTEX_ID(a) AS id_a)
)

We can see that this customer’s name is Laura, and she owns two accounts.

Next, let’s show all of Laura’s transfers. We will add one more hop to the matching pattern. Since there are two accounts in the pattern, we assign different variables for them, like a and a1.

SELECT DISTINCT *
FROM GRAPH_TABLE (bank_graph
MATCH (c IS customer)-[e IS owns]->(a IS account)-[t IS transferred_to]->(a1 IS account)
WHERE c.cst_id = 10
COLUMNS (
VERTEX_ID(c) AS id_c,
EDGE_id(e) AS id_e,
VERTEX_ID(a) AS id_a,
EDGE_id(t) AS id_t,
VERTEX_ID(a1) AS id_a1
)
)

You can see that Laura has transferred money to some accounts using both of the two accounts.

Now, focusing only on the transfers between accounts, let’s look for a pattern starting from the account with acc_id = 10 and returning to the same account in 3 hops. Furthermore, we can add a condition that these transfers happened in this order in time, and the amount of money transfer must always be more than 500.

SELECT DISTINCT *
FROM GRAPH_TABLE (bank_graph
MATCH (a1)-[t1 IS transferred_to]->(a2)
, (a2)-[t2 IS transferred_to]->(a3)
, (a3)-[t3 IS transferred_to]->(a1)
WHERE a1.acc_id = 10
AND t1.amount > 500 AND t2.amount > 500 AND t3.amount > 500
AND t1.datetime < t2.datetime AND t2.datetime < t3.datetime
COLUMNS (
VERTEX_ID(a1) AS id_a1, EDGE_ID(t1) AS id_e1
, VERTEX_ID(a2) AS id_a2, EDGE_ID(t2) AS id_e2
, VERTEX_ID(a3) AS id_a3, EDGE_ID(t3) AS id_e3
)
)

In the whole graph, this pattern is detected only in the paths that include the accounts with acc_id = 10, 75, and 77.

Similarly, we will try it for 4 hops. The query is getting longer, but it is still much more compact than it would be in SQL.

SELECT DISTINCT *
FROM GRAPH_TABLE (bank_graph
MATCH (a1)-[t1 IS transferred_to]->(a2)
, (a2)-[t2 IS transferred_to]->(a3)
, (a3)-[t3 IS transferred_to]->(a4)
, (a4)-[t4 IS transferred_to]->(a1)
WHERE a1.acc_id = 10
AND a1.acc_id != a3.acc_id AND a2.acc_id != a4.acc_id
AND t1.amount > 500 AND t2.amount > 500
AND t3.amount > 500 AND t4.amount > 500
AND t1.datetime < t2.datetime
AND t2.datetime < t3.datetime
AND t3.datetime < t4.datetime
COLUMNS (
VERTEX_ID(a1) AS id_a1, EDGE_ID(t1) AS id_e1
, VERTEX_ID(a2) AS id_a2, EDGE_ID(t2) AS id_e2
, VERTEX_ID(a3) AS id_a3, EDGE_ID(t3) AS id_e3
, VERTEX_ID(a4) AS id_a4, EDGE_ID(t4) AS id_e4
)
)

Again, the pattern is detected only in one combination in the case of 4 hops.

It is also possible to aggregate using GROUP BY. For example, let’s look for accounts that received a large number of small remittances (less than 500) during the period between Oct 1 and Dec 1.

SELECT t.acc_id, COUNT(acc_id) AS num_of_txn
FROM GRAPH_TABLE (bank_graph
MATCH (a1)-[t IS transferred_to]->(a2)
WHERE t.datetime >= TIMESTAMP '2020-10-01 00:00:00'
AND t.datetime < TIMESTAMP '2020-12-01 00:00:00'
AND t.amount < 500.00
COLUMNS (
a2.acc_id
)
) t
GROUP BY t.acc_id
ORDER BY num_of_txn DESC
FETCH FIRST 10 ROWS ONLY

However, the aggregation results cannot be represented as a graph, so you will have to show them in a table.

The account with acc_id = 23 has received 29 transfers, so let’s show them.

SELECT *
FROM MATCH (a1)-[t:transferred_to]->(a2)<-[o:owns]-(c) ON graph2
WHERE t.datetime >= TIMESTAMP '2020-10-01 00:00:00'
AND t.datetime < TIMESTAMP '2020-12-01 00:00:00'
AND t.amount < 500.00
AND a2.acc_id = 23

You can confirm the transfers.

What do you think? You can see that transactional data such as bank transfers stored in a table can be managed as a graph by simply creating a graph view using the CREATE PROPERTY GRAPH statement.

At the same time, the SQL queries are still executed against tables, so we cannot expect the use of the new syntax would dramatically improve query performance. Running the graph algorithm on top of this mechanism is not efficient either. In Part 2, we will introduce “Graph Server” to overcome these challenges.

Please learn more about Oracle Graph from:

--

--