
Wednesday, August 24, 2016

How to know when an Amazon Redshift database restarted

Redshift executes queries under a query_group called "xen_is_up" to get back into an online state after a restart. The following query can be used to determine when the restart happened:

select endtime from stl_utilitytext where text ilike '%xen_is_up.sql%' order by endtime;


Note: ILIKE matches case-insensitively according to the active locale. This is not in the SQL standard but is a PostgreSQL extension.
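A quick illustration you can run directly:

select 'Redshift' like '%red%';   -- false: LIKE is case-sensitive
select 'Redshift' ilike '%red%';  -- true: ILIKE ignores case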

Thursday, April 7, 2016

Amazon Redshift Commands

Here are some useful Amazon Redshift queries for administration and monitoring purposes:

  • The query below shows how much disk space is used on each Redshift cluster node:

          select node, diskno, used, capacity, (used*1.0/capacity)*100 as "percent"
          from (select owner as node, diskno, used, capacity
                from stv_partitions
                order by 1, 2, 3, 4) as t;
  • To get the queries currently running on the Redshift cluster:

           select * from stv_recents where status = 'Running';
  • To get information such as disk space used and row counts for all tables in a specific schema (here I am using the public schema):

          select a.id, trim(pgdb.datname) as "database", trim(pgn.nspname) as "schema",
                 trim(a.name) as "table", b.mbytes, a.rows
          from (select db_id, id, name, sum(rows) as rows
                from stv_tbl_perm a group by db_id, id, name) as a
          join pg_class as pgc on pgc.oid = a.id
          join pg_namespace as pgn on pgn.oid = pgc.relnamespace
          join pg_database as pgdb on pgdb.oid = a.db_id
          join (select tbl, count(*) as mbytes from stv_blocklist
                group by tbl) b on a.id = b.tbl and lower(pgn.nspname) = 'public'
          order by mbytes desc, a.db_id, pgn.nspname, a.name;

  • Distribution key, sort key, and unsorted-data percentage of tables in the "public" schema:

          select * from svv_table_info where schema='public';

  • Active sessions in the Redshift cluster:

          select * from stv_sessions;

  • To find the distribution of a table's data among the slices:

          select name, slice, sum(num_values)
          from svv_diskusage
          where lower(name) = '<table_name>'   -- fill in the table name
          and col = 0
          group by name, slice
          order by slice;

  • To identify transaction conflicts on database tables:

         select * from stl_tr_conflict
         where table_id = <table_id>   -- fill in the table id
         order by xact_start_ts;

  • To find which tables have data on a specific node's slices (here slices 0 and 1):

          select tbl, name, sum(num_values)
          from svv_diskusage
          where col = 0 and slice in (0, 1)
          group by tbl, name
          order by sum(num_values) desc;

Note: each node is divided into a number of slices; the number of slices per node depends on the node type.
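To see the slice-to-node mapping for your own cluster, you can query the STV_SLICES system table; a minimal sketch:

          select node, count(*) as slices
          from stv_slices
          group by node
          order by node;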

  • How many queries have executed since the last restart in each user-defined WLM queue (service classes greater than 5 are the user-defined queues):

       select service_class, num_queued_queries, num_executing_queries,
              num_executed_queries
       from stv_wlm_service_class_state
       where service_class > 5;




Tuesday, May 5, 2015

Building hierarchical data in Kettle (an alternative to a recursive CTE)

Today I got a requirement to build a hierarchical structure out of the available data (like a recursive CTE in SQL Server) in an Amazon Redshift database, but unfortunately Redshift doesn't support recursive CTEs or procedural language features such as variables, stored procedures, or user-defined functions.

Requirement:
- Pick the data from the table (item) in the Redshift database.
- Build the hierarchical data.
- Populate the data into another table (tblParkingItem) in the same Redshift database.

I was able to achieve this by using an ETL tool, in this example the open-source tool Pentaho Kettle:

Script to create sample table(s) along with some data:


CREATE TABLE item (assetid INT, assetname varchar(25), parentassetid INT);
CREATE TABLE tblparkingitem (assetid INT, assetname varchar(25), parentassetname varchar(25), lvl INT);

INSERT INTO item (assetid,assetname,parentassetid) VALUES(1,'aa',null);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(100,'a',1);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(101,'b',100);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(102,'c',100);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(103,'d',101);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(105,'e',102);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(106,'f',108);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(107,'e',102);
INSERT INTO item (assetid,assetname,parentassetid) VALUES(108,'g',106);

INSERT INTO item (assetid,assetname,parentassetid) VALUES(109,'bb',null);
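For reference, the recursive CTE this job emulates would look roughly like the following in SQL Server syntax (this is exactly what Redshift could not run at the time); the sketch assumes the item table above:

WITH hierarchy (assetid, assetname, parentassetname, lvl) AS (
    -- Anchor: root items that have no parent
    SELECT assetid, assetname, CAST(NULL AS varchar(25)), 0
    FROM item
    WHERE parentassetid IS NULL
    UNION ALL
    -- Recursive step: children of the rows found so far
    SELECT c.assetid, c.assetname, p.assetname, p.lvl + 1
    FROM item c
    INNER JOIN hierarchy p ON c.parentassetid = p.assetid
)
SELECT * FROM hierarchy;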

Kettle Job/Transformation:

Job: executes the transformations in the specified order and handles the looping.




Transformation1: "Get the parent data":



Queries:

  • "Populate parent data" component: 

insert into tblparkingitem (assetid, assetname, parentassetname, lvl)
select assetid, assetname, null, '${Level}' as lvl from item where parentassetid is null;


  • "Table input" component: 
select count(1) as RowCnt, '${Level}' + 1 as Lvl from tblparkingitem where lvl = '${Level}';

Here ${Level} is a variable whose value is assigned at the Job level.


Transformation2: "Get the child data":




Queries:

  • "Populate child data" component: 

insert into tblparkingitem (assetid, assetname, parentassetname, lvl)
select a.assetid, a.assetname, b.assetname as parentassetname, '${Level}'
from item a
inner join tblparkingitem b on a.parentassetid = b.assetid
and exists (select 1 from tblparkingitem c
            where b.assetid = c.assetid and c.lvl = ('${Level}' - 1));


  • "Table input" component: 
select count(1) as RowCnt, '${Level}' + 1 as Lvl from tblparkingitem where lvl = '${Level}';


Here ${Level} is a variable whose value is assigned at the Job level.


To validate the result, you can check the data in the tblparkingitem table by executing:
select * from tblparkingitem;
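With the sample data above, the populated table should look like this (assets 106 and 108 never appear because they reference each other in a cycle and never connect to a root):

assetid | assetname | parentassetname | lvl
--------+-----------+-----------------+----
      1 | aa        | (null)          |  0
    109 | bb        | (null)          |  0
    100 | a         | aa              |  1
    101 | b         | a               |  2
    102 | c         | a               |  2
    103 | d         | b               |  3
    105 | e         | c               |  3
    107 | e         | c               |  3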


Attached is the sample code.
Kettle sample code

Wednesday, June 25, 2014

Database lookup not fetching the latest database field in Pentaho Kettle

Today I was developing a transformation in Kettle using the "Database lookup" step. During development I realised that one more column needed to be added to the lookup table, so I added that column in the database using an ALTER TABLE statement.

But in the transformation, when I clicked the "Get Lookup Fields" button of the "Database lookup" step, it did not show the newly added column. I removed the "Database lookup" step from the transformation and then added it back (hoping that it would now show all the columns present in the database), but it still did not show the newly added column.

After googling, I found the following option to handle the above issue:

Go to the main menu and select Tools -> Database -> Clear Cache.

After clearing the cache, the newly added column started appearing in the "Database lookup" step.

Thursday, March 27, 2014

Shrink the data or log files using DBCC SHRINKFILE

DBCC SHRINKFILE shrinks a data or log file of the current database, either to a specific size or as small as it can.

The query below gives you the name of each data/log file, its total size (MB), used space (MB), percentage used, physical file location, and a ready-to-run DBCC SHRINKFILE statement with a specific target size, which you can execute on the database to shrink the respective file:

SELECT file_id,
       name,
       size_mb = size/128.0,
       UsedSpace_MB = FILEPROPERTY(name, 'spaceused')/128.0,
       UsedSpace_Pct = CAST(FILEPROPERTY(name, 'spaceused')*1./size AS decimal(5,2)) * 100,
       physical_name,
       shrink_stmt = 'dbcc shrinkfile(' + CAST(file_id AS varchar) + ',' + CAST((FILEPROPERTY(name, 'spaceused')/128 + 1) AS varchar) + ')'
FROM sys.database_files;
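The generated shrink_stmt can then be run directly. For example (the file id and target size below are illustrative, not from real output):

-- Shrink the file with file_id = 2 down to 100 MB (illustrative values)
DBCC SHRINKFILE (2, 100);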

Tuesday, September 11, 2012

SSIS - FastLoad options with Triggers

Let's say I have a table (DestTable) which has a trigger on the INSERT event.
Now if I populate data into this table (DestTable) using the SSIS FastLoad option, will the trigger execute?

Any guess?

Using the FastLoad option to populate records into the table will not execute the trigger.
If you want the trigger to execute with the FastLoad option, right-click the destination component, click "Show Advanced Editor...", go to "Component Properties", and then specify FIRE_TRIGGERS in FastLoadOptions.
This is the same option we use when inserting data with BULK INSERT in T-SQL, as shown below.
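For comparison, here is a minimal BULK INSERT sketch with FIRE_TRIGGERS; the file path and format options are illustrative:

-- By default BULK INSERT also skips INSERT triggers; FIRE_TRIGGERS enables them
BULK INSERT DestTable
FROM 'C:\data\desttable.csv'      -- illustrative file path
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRE_TRIGGERS
);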

Wednesday, November 16, 2011

Alternative to a recursive CTE

A few days back I was facing a performance issue due to a recursive CTE. I replaced the CTE with a set of SQL statements and got a performance boost of 96%.
Here is the sample code I used:

CREATE TABLE #t (assetid INT, assetname varchar(25), parentassetid INT)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(1,'aa',null)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(100,'a',1)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(101,'b',100)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(102,'c',100)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(103,'d',101)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(105,'e',102)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(106,'f',108)
INSERT INTO #t (assetid,assetname,parentassetid) VALUES(107,'e',102)


CREATE TABLE #item (id INT IDENTITY(1,1), assetid INT, assetname varchar(25), lvl INT)
DECLARE @counter INT
SET @counter = 0

-- Seed the result with the direct children of the root asset (assetid = 1)
INSERT INTO #item (assetid, assetname, lvl)
SELECT assetid, assetname, @counter FROM #t WHERE parentassetid = 1

-- Loop until an iteration inserts no new rows
WHILE @@ROWCOUNT > 0
BEGIN

    SET @counter = @counter + 1

    -- Add the children of the rows inserted in the previous iteration
    INSERT INTO #item (assetid, assetname, lvl)
    SELECT a.assetid, a.assetname, @counter
    FROM #t a
    INNER JOIN #item b ON a.parentassetid = b.assetid
    AND EXISTS (SELECT 1 FROM #item c WHERE b.assetid = c.assetid AND c.lvl = (@counter - 1))

END
SELECT * FROM #item

I hope this helps you!