Tuesday, May 5, 2015

Building hierarchical data in Kettle (an alternative to a recursive CTE)

Today I got a requirement to build a hierarchy structure from the available data (like a RECURSIVE CTE in SQL Server; see the sketch after the list below) in an Amazon Redshift database, but unfortunately Redshift supports neither RECURSIVE CTEs nor procedural language features like variables, stored procedures, or user-defined functions. The requirement was to:

- Pick the data from the source table (item) in the Redshift database.
- Build the hierarchical data.
- Populate the data into another table (tblParkingItem) in the same Redshift database.
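
For reference, this is roughly the recursive CTE I would have used in SQL Server against the sample item table below; it is only a sketch, since Redshift cannot run it:

WITH hierarchy (assetid, assetname, parentassetname, lvl) AS
(
    -- Anchor: root rows that have no parent
    SELECT assetid, assetname, CAST(NULL AS varchar(25)), 0
    FROM item
    WHERE parentassetid IS NULL
    UNION ALL
    -- Recursive step: children of rows already placed in the hierarchy
    SELECT i.assetid, i.assetname, h.assetname, h.lvl + 1
    FROM item i
    INNER JOIN hierarchy h ON i.parentassetid = h.assetid
)
SELECT * FROM hierarchy;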

So I was able to achieve this by using an ETL tool - in this example, the open-source tool Pentaho Kettle:

Script to create sample table(s) along with some data:

CREATE TABLE item (assetid INT, assetname varchar(25), parentassetid INT);
CREATE TABLE tblparkingitem (assetid INT, assetname varchar(25), parentassetname varchar(25), Lvl INT);

INSERT INTO item (assetid,assetname,parentassetid) VALUES(1,'aa',null);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(100,'a',1);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(101,'b',100);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(102,'c',100);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(103,'d',101);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(105,'e',102);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(106,'f',108);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(107,'e',102);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(108,'g',106);

INSERT INTO item (assetid,assetname,parentassetid) VALUES(109,'bb',null);

Kettle Job/Transformation:

Job: Executes the transformations in the specified order and does the looping

Transformation 1: "Get the parent data":


  • "Populate parent data" component: 

insert into tblparkingitem (assetid, assetname, parentassetname, Lvl)
select assetid, assetname, null, '${Level}' as lvl from item where parentassetid is null;

  • "Table input" component: 
select count(1) as RowCnt, '${Level}' + 1 as Lvl from tblparkingitem where lvl = '${Level}';

Here ${Level} is a variable whose value is assigned at the Job level.

Transformation 2: "Get the child data":


  • "Populate child data" component: 

insert into tblparkingitem (assetid, assetname, parentassetname, Lvl)
select a.assetid, a.assetname, b.assetname as parentassetname, '${Level}'
from item a
inner join tblparkingitem b on a.parentassetid = b.assetid
and exists (SELECT 1 FROM tblparkingitem c WHERE b.assetid = c.assetid and c.lvl = ('${Level}'-1));

  • "Table input" component: 
select count(1) as RowCnt, '${Level}' + 1 as Lvl from tblparkingitem where lvl = '${Level}';

Here ${Level} is a variable whose value is assigned at the Job level.

To validate the result, you can check the data in the tblparkingitem table by executing:
Select * from tblparkingitem;
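
Assuming ${Level} starts at 0 at the Job level, the table should end up like this with the sample data above (assets 106 and 108 point at each other and are never reachable from a root, so they are excluded):

assetid  assetname  parentassetname  Lvl
1        aa         NULL             0
109      bb         NULL             0
100      a          aa               1
101      b          a                2
102      c          a                2
103      d          b                3
105      e          c                3
107      e          c                3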

Attached is the sample code.
Kettle sample code

Wednesday, June 25, 2014

Database lookup not fetching the latest database field in Pentaho Kettle

Today I was developing a transformation in Kettle using the "Database lookup" step. During development I realised that one more column needed to be added to the lookup table, so I added that column in the database using an ALTER TABLE statement.

But in the transformation, when I clicked the "Get Lookup Fields" button of the "Database lookup" step, it did not show the newly added column. I removed the "Database lookup" step from the transformation and added it back (hoping it would then show all the columns present in the database), but it still did not show the newly added column.

After googling, I found the following option to handle the issue:

From the main menu, select Tools -> Database -> Clear Cache

After clearing the cache, the newly added column started appearing in the "Database lookup" step.

Thursday, March 27, 2014

Shrink the data or log files using DBCC SHRINKFILE

DBCC SHRINKFILE shrinks a data or log file of the current database, either to a specific size or as small as it can.

The query below gives you the name of each data/log file, its total size (MB), used space (MB), % used, the physical location of the file, and a ready-to-run DBCC SHRINKFILE statement with a specific target size which you can execute on the database to shrink the respective file:

SELECT file_id
      ,name
      ,size_mb = size/128.0
      ,UsedSpace_MB = fileproperty(name, 'spaceused')/128.0
      ,UsedSpace_Pct = CAST(fileproperty(name, 'spaceused')*1./size as decimal(5,2)) * 100
      ,physical_name
      ,shrink_stmt = 'dbcc shrinkfile(' + CAST(file_id as varchar) + ',' + CAST((fileproperty(name, 'spaceused')/128 + 1) as varchar) + ')'
FROM sys.database_files
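
For example, if the query returns a shrink_stmt of dbcc shrinkfile(2,513) for a log file, executing it shrinks that file to roughly its used space (the file id and target size in MB here are illustrative):

DBCC SHRINKFILE (2, 513) -- shrink file_id 2 to a target of 513 MB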

Tuesday, September 11, 2012

SSIS - FastLoad options with Triggers

Let's say I have a table (DestTable) which has a trigger on the INSERT event.
Now if I populate data into this table (DestTable) using the SSIS FastLoad option, will the trigger execute?

Any guess?

Populating records into the table with the FastLoad option will not fire the trigger.
If you want the trigger to fire with the FastLoad option, right-click the destination component, click "Show Advanced Editor...", go to "Component Properties" and add FIRE_TRIGGERS to FastLoadOptions.
We use the same option when inserting data with BULK INSERT in T-SQL, as shown below.
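
A minimal T-SQL sketch (the table name and file path are hypothetical):

BULK INSERT dbo.DestTable
FROM 'C:\data\desttable.csv' -- hypothetical source file
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRE_TRIGGERS -- fire INSERT triggers during the bulk load
);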

Wednesday, November 16, 2011

Alternative to a CTE

A few days back I was facing a performance issue due to a CTE. I replaced the CTE with a set of SQL statements and got a performance boost of 96%.
Here is the sample code I used:

CREATE TABLE #t (assetid INT, assetname varchar(25), parentassetid INT)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(1,'aa',null)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(100,'a',1)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(101,'b',100)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(102,'c',100)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(103,'d',101)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(105,'e',102)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(106,'f',108)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(107,'e',102)

CREATE TABLE #item (id INT identity(1,1), assetid INT, assetname varchar(25), lvl INT)
DECLARE @counter INT, @rows INT
SET @counter = 0

-- Seed level 0 with the direct children of the root asset
INSERT INTO #item (assetid,assetname,lvl) SELECT assetid,assetname, @counter FROM #t WHERE parentassetid = 1
SET @rows = @@ROWCOUNT

-- Repeat per level (a WHILE loop in place of recursion) until an insert adds no new rows
WHILE @rows > 0
BEGIN
    SET @counter = @counter + 1
    INSERT INTO #item (assetid,assetname,lvl) SELECT a.assetid,a.assetname,@counter FROM #t a INNER JOIN #item b
    ON a.parentassetid = b.assetid AND EXISTS (SELECT 1 FROM #item c WHERE b.assetid = c.assetid AND c.lvl = @counter-1)
    SET @rows = @@ROWCOUNT
END
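
To verify the result:

SELECT * FROM #item ORDER BY lvl, assetid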


I hope this helps!

Friday, June 17, 2011

Alternate way to get RowCount using sys.partitions

Usually we use COUNT(1) to get the number of rows in a table, which is time consuming for large tables.

Here is the alternate method to get the row count efficiently using a system view (the table name below is a placeholder):

SELECT SUM(rows) AS RowCnt
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.YourTable') -- replace with your table name
AND INDEX_ID IN (0,1) -- 0 = Heap;1 = Clustered

To know more about sys.partitions, see the SQL Server documentation.

Wednesday, September 1, 2010

Find the open transaction details in Sql Server

The query below can be used to get the details of open transactions:

SELECT DB_NAME(dbid) AS DBNAME
      ,(SELECT text FROM sys.dm_exec_sql_text(sql_handle)) AS SQLSTATEMENT
FROM master..sysprocesses
WHERE open_tran > 0
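
As a complementary check, DBCC OPENTRAN reports the oldest active transaction in the current database:

DBCC OPENTRAN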