Search This Blog

Tuesday, May 5, 2015

Building hierarchical data in Kettle (alternative of recursive CTE)

Today I got a requirement to build the hierarchy structure of the available data (like RECURSIVE CTE in SQL Server) in Amazon Redshift database but unfortunately Redshift doesn't support RECURSIVE CTE or PROCEDURAL langugage like variables, stored procedures or user-defined functions.

Requirement:
- Pick the data from table (item) in Redshift database.
- Build the hierarchical data.
- Populate the data into another table (tblParkingItem) in same Redshift database.

So I was able to achieve this by using an ETL tool - in this example using open-source tool named as Penthao Kettle:

Script to create sample table(s) along with some data:


CREATE TABLE item (assetid INT, assetname varchar(25), parentassetid INT)
CREATE TABLE tblparkingitem (assetid INT, assetname varchar(25), parentassetname varchar(25), Lvl INT)

INSERT INTO item (assetid,assetname,parentassetid) VALUES(1,'aa',null)
INSERT INTO item (assetid,assetname,parentassetid) VALUES(100,'a',1)
INSERT INTO item (assetid,assetname,parentassetid) VALUES(101,'b',100)
INSERT INTO item (assetid,assetname,parentassetid) VALUES(102,'c',100)
INSERT INTO item (assetid,assetname,parentassetid) VALUES(103,'d',101)
INSERT INTO item (assetid,assetname,parentassetid) VALUES(105,'e',102)
INSERT INTO item (assetid,assetname,parentassetid) VALUES(106,'f',108)
INSERT INTO item (assetid,assetname,parentassetid) VALUES(107,'e',102)
INSERT INTO item (assetid,assetname,parentassetid) VALUES(108,'g',106)

INSERT INTO item (assetid,assetname,parentassetid) VALUES(109,'bb',null)

Kettle Job/Transformation:

Job: Executes the transformation in specified order & do the looping stuff




Transfomation1: "Get the parent data":



Queries:

  • "Populate parent data" component: 

insert into tblparkingitem (assetid , assetname , parentassetname , Lvl)
select assetid, assetname, null, '${Level}' as lvl from item where parentassetid is null;


  • "Table input" component: 
select count(1) as RowCnt, '${Level}' + 1 as Lvl from tblparkingitem where lvl = '${Level}';

Here ${Level} is variable whose value is assigned at Job level.


Transfomation2: "Get the child data":




Queries:

  • "Populate child data" component: 

insert into tblparkingitem (assetid , assetname , parentassetname , Lvl)
select a.assetid, a.assetname, b.assetname as parentassetname, '${Level}' from item a inner join tblparkingitem b on a.parentassetid=b.assetid and exists (SELECT 1 FROM tblparkingitem c WHERE b.assetid = c.assetid and c.lvl = ('${Level}'-1));


  • "Table input" component: 
select count(1) as RowCnt, '${Level}' + 1 as Lvl from tblparkingitem where lvl = '${Level}';


Here ${Level} is variable whose value is assigned at Job level.


To validate the result, you can check the data in tblparkingitem table by executing:
Select * from tblparkingitem;


Attached is the sample code.
Kettle sample code

No comments:

Post a Comment