Monday, November 9, 2015

SSIS Tips: Performing Cross Join Using Merge Join Transform in SSIS


Can we perform cross join (cartesian product) operation using Merge Join transform in SSIS?


The above question was asked in one of the social forums recently. I was able to give the response to this.
I'm sharing the solution here with an illustration for the benefit of the wider audience


Scenario 1

Consider the case where we have two tables say Calendar table and Student table in two servers say server1 and server2. The requirement is to perform cross join between the tables.

For this requirement we can use the below sample data
In server 1

FROM dbo.CalendarTable('20150101','20150601',0,0)f

The UDF CalendarTable can be found here

In server 2 create this table

StudentID int,
StudentName varchar(100),
RollNo int

INSERT Students

Once these tables are setup we can go ahead and create a SSIS package as below

The source statements used in OLEDB Source tasks are as below. We need to use SQL command mode for this.

Student Data
SELECT *,1 AS Dummy
FROM dbo.Students

Date Data
SELECT *,1 AS Dummy
FROM dbo.DateTable
WHERE Date < '20150110'

I've just included some indicative dates for the illustration. In the actual case you can include dates as per your requirement
 Once these  tables are setup we will add a merge join transform to the package and configure it as below

Now link the output of this merge join to a recordset destination. You may also add a data viewer for previewing the output.

Now try executing the package and you can see the below

As you see from the above what you get as result would be the cartesian product of the records from both the tables which indicates that merge join performed a cross join operation.

You can also refer the count of records and can confirm its a cartesian join (m * n)

Scenario 2

Now lets see the case where one set of data comes from a database table and other set coming from a flat file. 
In this case the package would look like this

If you compare this to the earlier work flow you can see the below differences
1. One of the OLEDB source will be replaced by a flat file source as one of our sources is a flat file
2. We would require two additional tasks in the work flow for the flat file. One would be a derived column task to add the dummy column to the pipeline
3. Second task would be a sort transform to add a default sorting. This is a prerequisite for the merge join transform which requires both its inputs to be in sorted format.

Execute the package and you can see the below

As you see from the above the output clearly shows that merge join transform performs a cross join operation.


As seen from the above illustration you can very easily perform cross join by using a Merge Join transform in SSIS by utilizing a dummy column created on the fly inside the package


Tuesday, November 3, 2015

T-SQL Tips: Execute Individual Packages in Project Deployment Model using Dtexec


Can we execute individual packages included in a SSIS 2012 project configured in project deployment model using dtexec utility?


The above question was the main topic of discussion in one of the recent casual conversations with my fellow colleagues. 
Both the below responses came up during the discussion

1. The package can only be executed by referring to the ispac (project) file as deployment model being specified is project deployment model which implies project itself as the basic unit
2. The package should still be able to be executed individually by referring to the corresponding dtsx file.

I thought of trying it for myself to see which one (or both) of the above responses are true. I'm sharing the illustration and result through this blog for the benefit of the wider audience


For the sake of this illustration I've created a very simple package in a project with deployment model being configured as default project deployment.

The package is trying to insert a row into a table in the database by receiving  a value through a parameter. There is a package scope parameter which is declared for the purpose as below

The table is setup as below

CREATE TABLE [dbo].[TestISPop](
[Val] [varchar](100) NULL

Now lets try executing the package once and check the result

As you see the package will populate the table with value that is passed from the parameter

Now lets try executing the package through dtexec. 

If you refer to the documentation of dtexec given below

You can see that the package can be executed using /FILE switch and pass parameter values through /SET switch
Now lets apply this in a sql query using xp_cmdshell as dtexec is a command line tool and see the result

The statement would look like

EXEC xp_cmdshell 'dtexec /FILE "\DtexecTest.dtsx"  /SET \Package.Variables[$Package::PopValue];2'

The execution result is as below

Now if you check the table you can see the value being populated

So it is implied that we can execute the package directly from dtexec using /FILE switch and passing package parameters through /SET \Package.Variables[$Package::ParameterName] switch

In addition to this we can also execute this by specifying the ispac file. 
Lets see how we can write the query for this

EXEC xp_cmdshell 'dtexec /Project "\bin\Development\Integration Services Project1.ispac" /Package "DtexecTest.dtsx" /SET \Package.Variables[$Package::PopValue];5'

Execution result as shown

Now check the execution result and you will see the below

As you see it will successfully populate the values. So we can see that we can execute the package either by referring to dtsx file directly using /FILE switch or by specifying ispac file using /PROJECT switch in dtexec.

Package with a Project Parameter

Now lets see what would be the behavior in case the package had a project parameter.
For this we will first modify the package parameter to make it a project parameter
Once that's done we can use a statement as below

EXEC xp_cmdshell 'dtexec /FILE "C:\Users\stu\Documents\Visual Studio 2010\Projects\Integration Services Project1\Integration Services Project1\DtexecTest.dtsx"  /SET \Package.Variables[$Project::PopValue];6'

But we can see that we will get an error like this

This shows that dtexec is unable to resolve the reference to the project parameter while trying to execute directly from the dtsx file which is understandable and as expected
Now lets try execute package from ispac file and see

EXEC xp_cmdshell 'dtexec /Project "C:\Users\stu\Documents\Visual Studio 2010\Projects\Integration Services Project1\Integration Services Project1\bin\Development\Integration Services Project1.ispac" /Package "DtexecTest.dtsx" /SET \Package.Variables[$Package::PopValue];12'

And you can see that it works fine and populates the table with the value as shown by the below result

So we can conclude that in the case of package with reference to a project parameter we can only execute it by referring to the ispac (project) file and not by directly referring to the individual dtsx file


As per the illustrations above we can summarize as below

1. Package with parameters defined in the package scope - Can be executed by referring both ispac by using /PROJECT switch or dtsx file by using /FILE switch in dtexec 
2. Package with parameter being defined in the project scope - Can only be executed by referring to the ispac file using /PROJECT switch


Sample package


SSIS 2012 syntaxes


Thursday, October 15, 2015

T-SQL Tips: Solving FORMAT function woes for time datatype


The purpose of this blog is to clarify the confusion regarding the behavior of FORMAT function with the time datatype in SQL Server


It all started with a discussion that happened in one of the groups. 
A person had posted a question asking about reason for the weird behavior of FORMAT function while applied on various date and time related datatypes in SQL 2012.
The illustration is given below

declare @t time ='20151015 13:40:20'
SELECT FORMAT(@t,'HH:mm') AS timefieldformatted

declare @dt datetime ='20151015 13:40:20'
SELECT FORMAT(@dt,'HH:mm')  AS datetimefieldformatted

declare @dt2 datetime2 ='20151015 13:40:20'
SELECT FORMAT(@dt2,'HH:mm')  AS datetime2fieldformatted

declare @dto datetimeoffset ='20151015 13:40:20'
SELECT FORMAT(@dto,'HH:mm')  AS datetimeoffsetfieldformatted

The output is as shown below

As you see from the above FORMAT was able to apply the passed specifiers over the value and apply the requested format in all cases except in the case of time datatype. So what is so special about time datatype which is causing this weird behavior from FORMAT function? Lets investigate


To understand the reason lets first refer the documentation of FORMAT function as laid out by MSDN

The below explanation is taken from the above link

FORMAT relies on the presence of .the .NET Framework Common Language Runtime (CLR).

The following table lists the acceptable data types for the value argument together with their .NET Framework mapping equivalent types.

If you see the above table you can see that time datatype in SQLServer  is mapped to TimeSpan .NET datatype
Now if you check TimeSpan datatype documentation for .NET 

You can see this

TimeSpan object represents a time interval (duration of time or elapsed time) that is measured as a positive or negative number of days, hours, minutes, seconds, and fractions of a second. The TimeSpan structure can also be used to represent the time of day, but only if the time is unrelated to a particular date

Now if we check the standard format strings for the TimeSpan datatype it only supports c, g and G as the specifiers

The way to specify custom specifiers is discussed in the below link

So as per the above links we need to use query as below to get the required result

declare @t1 time ='20151015 13:40:20'
SELECT FORMAT(@t1,'c') AS timefieldformatted

declare @t2 time ='20151015 13:40:20'
SELECT FORMAT(@t2,'g') AS timefieldformatted

declare @t3 time ='20151015 13:40:20'
SELECT FORMAT(@t3,'hh\:mm') AS timefieldformatted

The output will be as below


As seen from the above illustrations we should use specifiers as c, g or hh:\mm for getting time datatype values in the formats as hh:mm as the implementation of FORMAT function maps time datatype to TimeSpan .NET datatype.
Thanks to Erland who gave the solution in the forum discussion and cleared up the doubts in all of us. Hopefully this blog will help to improve the clarity on the behavior of FORMAT function to a wider audience

Sunday, August 9, 2015

SQLServer Tips: Distributed Query Issues With ACE Provider in 64 bit

Here is another quick blog on solution for an issue you may face while working with ACE provider in 64 bit windows server


A colleague of mine came to me today morning with an issue. He was trying out few distributed query options using OPENROWSET for exporting some data from Excel 2007 for an adhoc requirement utilizing the ACE provider . Suddenly he got a transport error and couldn't connect to instance anymore.
I started a remote desktop session to the server and to my surprise I saw SQL Server service and SQL Agent agent being down. I restarted them and asked him to try again and monitored myself this time. The OPENROWSET query fetched the results and immediately after that gave the same transport error. At the same time the services for SQL Server and SQL Agent went down. So it was obvious that ACE provider distributed query was causing the services to be down.
I tried the same query in my server and it worked fine. But we were consistently able to reproduce the issue in the problem server. My server was having Windows Server 2008 R2 as the OS whereas the one with the problem had Windows Server 2012 as the OS.


I was not sure what should be done for solving this however I decided to apply the below specified steps to see if it fixes the issue.

as they also related to ACE provider issues

As I expected doing the above steps fixed the issue.
The steps that were of significant relevance in the above case were the below

EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0'
    , N'AllowInProcess', 1
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0'
    , N'DynamicParameters', 1


Once this is applied the distributed queries started working fine without affecting SQLServer services.
Something to lookout for the next time when you experience similar problems!

Tuesday, August 4, 2015

SQL Server Data Tools BI Installation Issues on Windows 10

This is a short blog to share the details of an issue that you may face while installing SQL Server Data Tools for BI (SSDT-BI) on Windows 10


You've a laptop or desktop which you upgraded to Windows 10 recently and you're trying to install SSDT-BI in it. You may get an error during the installation and log files will have an entry similar to below
 Stop Block: CompatibilityMode : Windows Program Compatibility mode is on. Turn it off and then try Setup again.

Please note that this error happens only when when you're trying for a fresh install of SSDT-BI on Windows 10. If you already had SSDT installed before the upgrade it should still work fine.
Microsoft has acknowledged this as a bug and the fix is expected to be out soon.

Meanwhile you can apply the below workaround if you want to get things sorted out right away.


The workaround is to follow the below sequence

  • Start with the full installation of Visual Studio on Windows 10
  • Subsequently do the installation of SSDT-BI on top of it
Once this sequence is followed the installation will complete successfully and you would be able to use SSDT-BI for your development.

If you don't want to do this, you can wait for patch to release with the Windows updates for avoid getting the  above specified issue

Hope this will help the people facing similar issues with a quick fix method until the patches are out.