The story begins with a SQL Server upgrade from 2012 to 2016. The upgrade itself forced us to deal with existing issues we didn’t know we had; that, plus the opportunity to take advantage of new features, accounts for the dramatic changes described below.
Learning #1: The VMware CPU hot add option is bad. Upon first upgrading, performance was worse than 2012 overall, and some queries seemed to hang without returning rows. Excessive CXPACKET waits and blocked threads on the same session pointed to a parallelism issue. The VMware best practices for SQL Server say not to use this option, yet all our SQL Servers were set up with it enabled. Turning it off and bouncing the servers gave us a 30 to 80% speed increase for parallel operations, and the freezing issue became much less frequent.
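A quick way to confirm you are suffering from this (and to measure the improvement afterward) is to look at the server’s top waits; CXPACKET dominating the list points at parallelism trouble. A minimal sketch against the standard wait-stats DMV:

```sql
-- Top waits since the last restart (or since wait stats were cleared).
-- CXPACKET sitting near the top, with high totals relative to everything
-- else, suggests a parallelism problem worth chasing.
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```

Benign sleep-type waits can top the raw list on a quiet server, so compare the same query before and after the change rather than reading it in isolation.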
Learning #2: SSIS basic logging trashes performance. SSIS in 2016 has the option of customized logging. Since we don’t need or use logging unless there’s an error (we instead rely on the much more useful and efficient BiXpress audit framework), I created a custom log plan (it only takes a few minutes) that logs error events only, and replaced all the basic log assignments in our job steps. This knocked another 30% or so off all our job steps.
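The customized logging level itself lives in the SSISDB catalog; the SSMS GUI (Customized Logging Levels under the catalog) is the easy way to build one. For the T-SQL-inclined, a sketch using the catalog stored procedure; note the bitmask values below are placeholders, not the exact bits for error-only logging, so build the level in the GUI if you want to be sure:

```sql
-- Run in SSISDB. Creates a customized logging level; select it in the
-- SQL Agent job step (or at execution time) in place of "Basic".
DECLARE @level_id BIGINT;

EXEC catalog.create_customized_logging_level
     @level_name        = N'ErrorsOnly',
     @level_description = N'Capture error events only',
     @profile_value     = 0,   -- statistics bitmask (placeholder value)
     @events_value      = 2,   -- events bitmask (placeholder value)
     @level_id          = @level_id OUTPUT;
```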
Learning #3: The new cardinality estimator can lose its mind with nasty queries. I’ve had a problematic query for some time that has implicit datatype conversions. Since it’s generated with a dynamic WHERE clause for incremental loads, parameter sniffing has been a performance issue; I had previously solved that with the RECOMPILE option. But in 2016 the query would unpredictably go serial, and instead of running 3 minutes it would run for an hour, and up to 6 hours! The solution? The new hint to prefer a parallel plan, which took the query down to under 1 minute! But it would still occasionally hang or run for 45 to 120 minutes with little change in volume. The final fix was the new hint to prefer the legacy cardinality estimator (which was a trace flag in 2014).
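Both fixes are query-level hints (USE HINT arrived in 2016 SP1; in 2014 the legacy-CE behavior required trace flag 9481). A sketch on a made-up incremental-load query, just to show where the hints go:

```sql
-- dbo.FactSales and @LastLoadDate are hypothetical stand-ins for the real query.
SELECT f.*
FROM dbo.FactSales AS f
WHERE f.LoadDate > @LastLoadDate          -- dynamic incremental predicate
OPTION (
    RECOMPILE,                                        -- sidestep parameter sniffing
    USE HINT ('ENABLE_PARALLEL_PLAN_PREFERENCE'),     -- prefer a parallel plan
    USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')  -- pre-2014 CE for this query only
);
```

You can query sys.dm_exec_valid_use_hints to see which hints your build supports.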
Learning #4: Updatable clustered columnstores! Previously I dropped and recreated non-clustered columnstore indexes…I knew that updating rather than recreating the indexes would save time, at some cost on the update side, but the primary driver was eliminating the potential table lock issues. Just stopping the drops saved a few minutes, as expected. But converting to clustered columnstore also helped performance for all the processes that use the related fact tables. And no noticeable hit on the update side! (Testing proved the existing MERGE update superior to the alternatives.)
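The conversion itself is a single statement per table. Assuming the fact table currently has a rowstore clustered index, something like this (table and index names are made up; as I understand it, the index name must match the existing clustered index when using DROP_EXISTING):

```sql
-- Convert the existing rowstore clustered index to a clustered columnstore
-- in place. Any non-clustered columnstores on the table become unnecessary.
CREATE CLUSTERED COLUMNSTORE INDEX PK_FactSales
ON dbo.FactSales
WITH (DROP_EXISTING = ON);
```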
There were many detours during the upgrade, especially around getting all the add-on components working with 2016, but I will save those for another day.
The end of the story finds us with a screaming system: faster reporting, super fast updates, and our freezing queries a distant memory.
If you are doing data warehousing with SQL Server and looking to upgrade, feel free to ping me with any questions.
You find it in the VMware console on the “Manage” tab for VM Hardware. Open up the CPU section and you’ll see it. If it is checked, then this is bad…it’s real bad. If you have <= 8 cores, it confuses SQL Server threads. If you have > 8 cores, it also disables NUMA awareness. Then you’ve gone from bad to worse.
The good news is, simply uncheck this option and reboot to fix it. And you can be the immediate hero – tell them it took many hours of hard work to increase the performance of their server.
I made this change and saw an immediate 30% to 80% improvement in all my queries using parallelism, and the amount of CXPACKET waits declined significantly. This was true both for my non-prod 8-core servers and for the 14-core production server.
If you don’t believe me, believe VMware: read their SQL Server on VMware Best Practices Guide.
Or maybe not! Here’s how you can do it.
There are older, alternative drivers, but I found the setup easiest with the latest Oracle and Microsoft software. The latest version of SSDT installs alongside older software and can develop backwards-compatible SSIS and SSRS projects. Oh, and schema and data compare are now built in! So why are you waiting to upgrade?
Install the SSDTSetup.exe file (this is the new 2016 SSDT software that includes Visual Studio 2015), found at https://msdn.microsoft.com/en-us/library/mt204009.aspx
Install the SetupODTforVS2015.exe file, found at Oracle’s download site.
Copy your tnsnames.ora and sqlnet.ora files to c:\program files (x86)\oracle developer tools for vs2015\network\admin\
When you start SSDT (or just Visual Studio), go to Tools – Connect to Database – change the data source to Oracle Database with a data provider of ODP.NET – Next. The Data source name should reflect the databases in your tnsnames.ora file if it’s set up correctly. Just sign in as normal. The connection will then appear in the “Server Explorer” tool under Data Connections.
You should end up at this screen with the ability to scroll through your tnsnames aliases and the rest should be obvious…
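If you don’t already have a tnsnames.ora handy, a minimal entry looks roughly like this (the alias, host, port, and service name are all placeholders for your environment):

```
MYDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = mydb.example.com)
    )
  )
```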
Problem: Series of child-parent relationship rows where you need to find the topmost parent.
Script to create the situation below…
-- Create a table
DROP TABLE IF EXISTS Party_merge;

CREATE TABLE Party_merge (
    FromPartyID INT PRIMARY KEY,
    ToPartyID   INT NOT NULL
);

-- Insert sample data
-- 3 gets 4 children (1, 2, 8, 9) assigned to merge to it and then itself gets merged to id = 10
-- 2 is the new parent for 4, 5, and 7
-- 11 gets 12 assigned; parent 12 has only one child (11)
-- 14 gets its parent assigned as 14 (due to a source system issue)
INSERT INTO Party_merge (FromPartyID, ToPartyID)
SELECT 1, 3 UNION ALL
SELECT 2, 3 UNION ALL
SELECT 14, 14 UNION ALL
SELECT 4, 2 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 7, 2 UNION ALL
SELECT 8, 3 UNION ALL
SELECT 9, 3 UNION ALL
SELECT 3, 10 UNION ALL
SELECT 11, 12;

-- Check the data
SELECT FromPartyID, ToPartyID
FROM Party_merge
ORDER BY FromPartyID, ToPartyID;

-- One level up: each row joined to its parent's own merge row, if any
SELECT p1.FromPartyID, p1.ToPartyID
FROM Party_merge p1
LEFT OUTER JOIN Party_merge p2
  ON p1.ToPartyID = p2.FromPartyID
WHERE p1.FromPartyID <> p1.ToPartyID;
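With the sample data in place, one way I’d sketch the actual topmost-parent lookup is a recursive CTE that walks each chain upward and keeps the last hop. Note the self-referencing row (14, 14) must be filtered out or the recursion never terminates; rows like that need separate handling:

```sql
WITH Chain AS (
    -- Anchor: every merge row except self-references like (14, 14)
    SELECT FromPartyID, ToPartyID, 1 AS Lvl
    FROM Party_merge
    WHERE FromPartyID <> ToPartyID

    UNION ALL

    -- Recurse: follow the current parent up to its own parent, if it has one
    SELECT c.FromPartyID, p.ToPartyID, c.Lvl + 1
    FROM Chain AS c
    JOIN Party_merge AS p
      ON c.ToPartyID = p.FromPartyID
     AND p.FromPartyID <> p.ToPartyID
)
-- Keep only the deepest hop per starting row
SELECT FromPartyID, ToPartyID AS TopmostPartyID
FROM (
    SELECT FromPartyID, ToPartyID,
           ROW_NUMBER() OVER (PARTITION BY FromPartyID ORDER BY Lvl DESC) AS rn
    FROM Chain
) AS ranked
WHERE rn = 1
ORDER BY FromPartyID;
```

Against the sample data this resolves 1, 2, 3, 4, 5, 7, 8, and 9 to topmost parent 10, and 11 to 12; row 14 is excluded by the self-reference filter.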
This is a common error when doing NULL handling and replacing an incoming value with something larger, since the result datatype defaults to the incoming value’s datatype. You need to CAST or CONVERT to the larger datatype not once but twice: on the incoming column and then on the result. This can also happen with aggregations, where you may have to override the default decimal or integer size of the resulting value.
Below demonstrates a further complication where CAST doesn’t work but CONVERT does…
The following doesn’t work…
SELECT CAST(ISNULL(CAST([Person_Acxiom_Abilitec_Id] AS VARCHAR(36)), NEWID()) AS VARCHAR(36)) AS Person_Acxiom_Abilitec_Id…
…but the following does:
SELECT CAST(ISNULL(CONVERT(VARCHAR(36), [Person_Acxiom_Abilitec_Id]), NEWID()) AS VARCHAR(36)) AS Person_Acxiom_Abilitec_Id
The only difference is using CONVERT instead of CAST.
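The root cause behind all of this is that ISNULL takes its return datatype from its first argument and silently truncates the replacement value to fit. A tiny illustration (not from the original query):

```sql
-- ISNULL returns the FIRST argument's type, here CHAR(4),
-- so the replacement string is silently truncated.
SELECT ISNULL(CAST(NULL AS CHAR(4)), 'ABCDEFGH');   -- returns 'ABCD'

-- COALESCE uses datatype precedence across all arguments instead,
-- so the full replacement survives.
SELECT COALESCE(CAST(NULL AS CHAR(4)), 'ABCDEFGH'); -- returns 'ABCDEFGH'
```

This is also why the double CAST/CONVERT pattern above is needed: the inner conversion widens the first argument so ISNULL’s return type is big enough for the replacement.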