XML to the rescue – parsing text formulas

My last project (the one that prevented me from blogging in April) involved parsing some text formulas defined by business users. A typical formula would look like this: 1234.56+6789.100-5468. . Dots in this formula are not decimal separators – they separate accounts and subaccounts (financial stuff). Anyway, I had to extract those account/subaccount numbers, get their current balances and evaluate the formula. The most straightforward solution would be a loop, where the formula would be parsed – one character at a time.

But I have come up with another one – add some XML tags here and there, convert the formula to XML and extract relational data from it. Sounds good enough? Let’s see the code:

CREATE FUNCTION dbo.parse_formula(
	@p_formula VARCHAR(MAX)
)
RETURNS @tbl_formula_elements TABLE (
		SIGN CHAR(1),
		account VARCHAR(8),
		subaccount VARCHAR(20)
)
BEGIN
	DECLARE @formula VARCHAR(MAX) = CASE
		WHEN LEFT(@p_formula, 1)  '-'
			THEN '+'
		ELSE
			''
	END + @p_formula;
 
	SET @formula = REPLACE(
			@formula,
			'+', 
			'</Subaccount></FormulaElement><FormulaElement><Sign>+</Sign><Account>'
	);
 
	SET @formula = REPLACE(
		@formula,
		'-', 
		'</Subaccount></FormulaElement><FormulaElement><Sign>-</Sign><Account>'
	);
 
	SET @formula = REPLACE(
		@formula,
		'.', 
		'</Account><Subaccount>'
	);
 
	DECLARE @formula_elements XML = CONVERT(
		XML,
		'<Formula>' 
			+ RIGHT(@formula, LEN(@formula) - LEN('</Subaccount></FormulaElement>'))
			+ '</Subaccount></FormulaElement></Formula>'
	);
 
	INSERT @tbl_formula_elements
	SELECT 
		FormulaElements.cols.VALUE('Sign[1]', 'char(1)'),
		FormulaElements.cols.VALUE('Account[1]', 'varchar(8)'),
		FormulaElements.cols.VALUE('Subaccount[1]', 'varchar(20)')
	FROM
		@formula_elements.nodes('/Formula/FormulaElement') FormulaElements (cols);
 
	RETURN;
END;

This TVF first checks whether the formula begins with a - and puts a + at the start if it does not. Then it adds some XML tags so that the resulting XML consists of a single Formula element, which consists of one or many FormulaElement elements. Each FormulaElement contains a single Sign element, a single Account element and a single Subaccount element.

When the XML string is ready, the TVF converts it to the native XML type and puts the formula elements in the result table. The formula is ready for evaluation. Let's invoke the TVF:

SELECT 
	SIGN,
	account,
	subaccount
FROM
	dbo.parse_formula('1234.56+6789.100-5468.')
sign account  subaccount
---- -------- --------------------
+    1234     56
+    6789     100
-    5468
(3 row(s) affected)

How to evaluate a formula transformed to a table like this one? Join this table with the current account balances, multiply them by -1 where the sign is - and sum the results.

Those of you who fear XML - now you have another lethal weapon in your SQL Server arsenal (probably FOR XML PATH('') was the first one ;) ).

4Developers 2010 – my source code samples

4Developers 2010 is over, so it’s time to share my source code samples used during the presentation. You can download them here. The messages and labels are in Polish, but the whole source code uses English names, so hopefully non-Polish speakers will also find their way through the code ;)

4Developers 2010

Tomorrow I will be speaking at the 4Developers Conference in Poznan. I will be talking about using SSIS, SSAS and SSRS in .NET applications to solve various business problems with minimum effort. Lots of interesting content, only 45 minutes AND it’s the last session before lunch time – now that’s what I call a challenge :)

I will upload my source code samples shortly after the conference. I’m also planning some blog posts on this topic – hopefully quite soon. It won’t be easy, because currently I have some really time-consuming projects on the run, but stay tuned.

T-SQL Tuesday #002: Houston, we’ve had a non-breaking space here

This month I’m participating in T-SQL Tuesday for the first time. T-SQL Tuesday is a really great idea and I hope to participate regularly.

The topic for this month is a puzzling situation. My puzzle is loosely related to T-SQL, but that’s where I came across it. A few months ago I was working on a data integration project that involved extracting data from many different Excel files and loading a SQL Server database. The data format was far from perfect, so I had to load those Excel files into some staging tables and run some T-SQL code before loading the destination database.

For example, all phone numbers had to consist of 10 digits (with a leading zero). All hyphens, spaces, braces had to be removed. Quite an easy task – just a few REPLACEs and I could move on. But I could not – some of the updated records still had spaces in the phone numbers. I did a triple check of my REPLACE functions and I was sure that the code should work fine.

Then I came up with a new idea: I copied the character that looked like a space from the query results and executed:

SELECT ASCII(' ')

The result was: 160. Not 32. So the space wasn’t just an ordinary space. It turned out to be a non-breaking space. Seems obvious, but how many of us know that the ASCII table contains a non-breaking space?

It is also worth noticing that the TRIM function does not remove non-breaking spaces. Surprising, isn’t it?

Another puzzling situation I had to deal with was a case where a stored procedure randomly timed out when executed from a custom application, but when executed from SSMS – it always completed in just a few seconds. I blogged about it last month.

Hey, the stored procedure times out in my application, but it completes in a few seconds when executed in SSMS!

One of our customers had problems with his custom application. One of the stored procedures caused random timeouts in this application, but in SSMS the stored procedure always completed in just a few seconds. After attaching SQL Server Profiler it turned out that different execution plans were used for SSMS and the application, but the parameter values were exactly the same. The customer had already checked that recompiling the stored procedure or restarting the whole server did help for a while.

There are two cases to consider here:

  1. The stored procedure is causing timeouts, so it should probably be optimized
  2. The stored procedure always executes quickly in SSMS, so is doesn’t seem to be an optimization problem

Let’s illustrate that with a quick example. For the purpose of this post I’ve prepared a stored procedure that retrieves some data about the products and sales orders from AdventureWorks2008:

USE AdventureWorks2008;
GO
CREATE PROCEDURE dbo.GetOrderedProducts(
	@ProductName dbo.Name,
	@ProductNumber NVARCHAR(25),
	@ProductDaysToManufacture INT,
	@ProductCategoryName dbo.Name,
	@ProductSubcategoryName dbo.Name,
	@OnlineOrderFlag dbo.Flag
)
AS
BEGIN
	SELECT
		ProductName = p.Name, 
		p.ProductNumber,
		p.DaysToManufacture,
		ProductSubcategoryName = ps.Name,
		ProductCategoryName = pc.Name,
		sod.CarrierTrackingNumber,
		sod.OrderQty,
		sod.UnitPrice,
		soh.OrderDate,
		soh.DueDate,
		soh.ShipDate,
		soh.OnlineOrderFlag,
		soh.SalesOrderNumber,
		soh.SubTotal,
		soh.TaxAmt,
		soh.Freight
	FROM
		Sales.SalesOrderHeader soh
		INNER JOIN Sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
		INNER JOIN Production.Product p ON sod.ProductID = p.ProductID
		INNER JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID
		INNER JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID
	WHERE
		((p.Name LIKE @ProductName) OR (@ProductName IS NULL))
		AND ((p.ProductNumber LIKE @ProductNumber) OR (@ProductNumber IS NULL))
		AND ((p.DaysToManufacture = @ProductDaysToManufacture) OR (@ProductDaysToManufacture IS NULL))
		AND ((pc.Name LIKE @ProductCategoryName) OR (@ProductCategoryName IS NULL))
		AND ((ps.Name LIKE @ProductSubcategoryName) OR (@ProductSubcategoryName IS NULL))
		AND ((soh.OnlineOrderFlag = @OnlineOrderFlag) OR (@OnlineOrderFlag IS NULL));
END;

Now, let’s execute this procedure for a particular product and limit the orders to those that were not made online:

USE AdventureWorks2008;
GO
EXEC dbo.GetOrderedProducts NULL, 'BK-T44U-46', NULL, 'Bikes', 'Touring Bikes', 0;

That execution will output just a fraction (186 rows) of the whole result set (121317 rows). Our next step – let’s retrieve the whole result set:

USE AdventureWorks2008;
GO
EXEC dbo.GetOrderedProducts NULL, NULL, NULL, NULL, NULL, NULL;

It took about 10 seconds to retrieve the whole result set on my laptop. Normally it should take about 3 seconds. What’s going on? Actually, this happens because of parameter sniffing.

When the stored procedure is compiled (or recompiled), the query optimizer tries to generate an efficient query plan for the current parameter values (and that’s parameter sniffing – you can read more about it and other related issues in an excellent TechNet article). That query plan is cached and subsequent executions of this procedure will likely try to use the same query plan (but there are cases where another query plan will be generated and used – procedure recompilation, for example).

Sounds good? Well, not exactly. That specific stored procedure will surely receive varying parameter values, so the initial query plan might not be the best one for other parameter values. We can quickly see the actual cached execution plan being used by querying some DMVs:

USE master;
GO
SELECT 
	UseCounts,
	RefCounts, 
	Cacheobjtype, 
	Objtype, 
	DatabaseName = ISNULL(DB_NAME(dest.dbid), 'ResourceDB'), 
	[SQL] = [TEXT],
	QueryPlan = query_plan 
FROM sys.dm_exec_cached_plans 
	CROSS APPLY sys.dm_exec_sql_text(plan_handle) dest
	CROSS APPLY sys.dm_exec_query_plan(plan_handle)
WHERE 
	DB_NAME(dest.dbid) = 'AdventureWorks2008'
ORDER BY 
	UseCounts DESC;

Querying dm_exec_cached_plans will show us a list of all cached execution plans. After joining the results with dm_exec_sql_text and dm_exec_query_plan we receive additional valuable information – the SQL batch text and the graphical query plan.

When we display the query plan that is currently cached, we will notice a Nested Loops operation with an Estimated Number of Rows = 456. That is an efficient operation when the table being iterated is small and the table being scanned is properly indexed. That might be true for the first execution of our procedure, but not for the second, when we want to retrieve the whole result set.

Now, let’s mark this procedure for recompilation:

USE AdventureWorks2008;
GO
EXEC SP_RECOMPILE N'dbo.GetOrderedProducts';

And execute the procedure again:

USE AdventureWorks2008;
GO
EXEC dbo.GetOrderedProducts NULL, NULL, NULL, NULL, NULL, NULL;

Now the execution completes in about 3 seconds. After querying the DMVs again we will find out that the execution plan has changed – instead of a Nested Loops operation we now have a Hash Match operation with an Estimated Number of Rows = 39909. If you need a more detailed analysis of both query plans, another article (by Grant Fritchey) might come in handy.

How to deal with similar issues related to parameter sniffing? The recommended solution is to add the WITH RECOMPILE clause to our stored procedure. We won’t take advantage of caching query plans, but at least the query optimizer will try to generate an efficient execution plan for each set of parameter values that is supplied to the procedure.

What about the problem mentioned in the post title? Well, one of the possible reasons for recompilation is a change in the SET options. Your application connects to SQL Server by using an appropriate provider (OLE DB, ODBC or others). That provider sets six database options by default:

SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET CONCAT_NULL_YIELDS_NULL ON
SET NUMERIC_ROUNDABORT OFF
SET QUOTED_IDENTIFIER ON

But there is another option (ARITHABORT) which is not set by default. The setting to use is determined from the default database options (for AdventureWorks2008 it will be ON by default, but for a new database it is OFF by default). That might result in a different ARITHABORT setting in your application and in SSMS, because in SSMS ARITHABORT is ON by default for each query window (Tools->Options->Query Execution->SQL Server->Advanced):

ssms_arithabort

When those settings differ between your app and SSMS, your stored procedure could time out in the application and execute quickly in SSMS (or vice versa). When you check out the DMVs, two plans for this procedure will be cached – one for each ARITHABORT setting. And that explains the difference in execution times.

Windows 7/Vista and Reporting Services 2008 – HTTP 404 or no connection

This issue might occur when Reporting Services is using the default port (80 or 443) to run Report Server and Report Manager on your workstation. I’ve reproduced this issue on Windows 7 and Vista, but Windows XP is probably also affected by the issue.

What’s the big deal? You install SSRS on your workstation, everything goes fine, the service starts, but the URLs for Report Server and Report Manager just don’t work (either throwing a HTTP 404 or just refusing to connect). You might also encounter the issue a few days (weeks/months etc.) after the initial installation.

We need to investigate the issue. The SSRS logs will come in handy (their default location is: C:\Program Files\Microsoft SQL Server\MSRS10.[INSTANCE_NAME]\Reporting Services\LogFiles). You should look for something similar to:

rshost!rshost!1120!12/12/2009-22:21:33:: e ERROR: Failed to register url=http://+:80/ReportServer/ for endpoint 2, error=20.
rshost!rshost!1120!12/12/2009-22:21:33:: w WARN: Endpoint 2 is enabled but no url is registered for vdir=/ReportServer, pdir=c:\Program Files\Microsoft SQL Server\MSRS10.MSSQLSERVER\Reporting Services\ReportServer.
servicecontroller!DefaultDomain!1200!12/12/2009-22:21:33:: e ERROR: Error creating HTTP endpoint. System.IO.FileLoadException: The process cannot access the file because it is being used by another process. (Exception from HRESULT: 0x80070020)
   at Microsoft.ReportingServices.HostingInterfaces.IRsUnmanagedCallback.CreateHttpEndpoint(RsAppDomainType application, String[] urlPrefixes, Int32 cPrefixes, String virtualDirectory, String filePath, Int32 authType, Int32 logonMethod, String authDomain, String authRealm, Boolean authPersist, Boolean enabled)
   at Microsoft.ReportingServices.Library.ServiceAppDomainController.SetWebConfiguration(RunningApplication rsApplication, Boolean enabled, String folder)
rshost!rshost!ca4!12/12/2009-22:21:33:: e ERROR: Failed to register url=http://+:80/Reports/ for endpoint 3, error=20.
rshost!rshost!ca4!12/12/2009-22:21:33:: w WARN: Endpoint 3 is enabled but no url is registered for vdir=/Reports, pdir=c:\Program Files\Microsoft SQL Server\MSRS10.MSSQLSERVER\Reporting Services\ReportManager.
servicecontroller!DefaultDomain!1200!12/12/2009-22:21:33:: e ERROR: Error creating HTTP endpoint. System.IO.FileLoadException: The process cannot access the file because it is being used by another process. (Exception from HRESULT: 0x80070020)
   at Microsoft.ReportingServices.HostingInterfaces.IRsUnmanagedCallback.CreateHttpEndpoint(RsAppDomainType application, String[] urlPrefixes, Int32 cPrefixes, String virtualDirectory, String filePath, Int32 authType, Int32 logonMethod, String authDomain, String authRealm, Boolean authPersist, Boolean enabled)
   at Microsoft.ReportingServices.Library.ServiceAppDomainController.SetWebConfiguration(RunningApplication rsApplication, Boolean enabled, String folder)

Changing the port in Reporting Services Configuration Manager to an unused port (like 8080, but that might not be true on your workstation) resolves the issue. So we have another service blocking the default port.

Now we need to find the culprit. A quick netstat will show the PID of the blocking process:

netstat -ano -p tcp
 
Active Connections
 
  Proto  Local Address          Foreign Address             State           PID
  ...
  TCP    0.0.0.0:80             0.0.0.0:0                   LISTENING       5340
  ...

Now we can find out more about the offending process (PID 5340 in this example). Task Manager can be used (after checking PID on View->Select Columns) or we can use tasklist here:

tasklist /FI "PID eq 5340"
 
Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
Skype.exe                     5340 Console                    1     66 488 K

Surprised? Well, I didn’t expect Skype here. But under Tools->Options->Advanced->Connection you can find the source of our problem:

This is the problematic option that blocks SSRS from creating HTTP endpoints

Clearing this option and restarting SSRS also resolves the whole issue. Even if it’s not Skype, the idea remains the same – close the offending process, change the port it’s listening on or change the ports for Report Server and Report Manager.

SharePoint 2010 standalone, SQL Server and no sysadmin privileges

Good news for developers – SharePoint 2010 can be installed on your workstations running Windows 7 or Vista x64. There is a good MSDN article describing the installation process – everything went smoothly on my Windows 7 Ultimate x64.

When WSS 3.0 or MOSS 2007 was installed in standalone mode, it used Windows Internal Database as its backend. For SharePoint 2010 this is no longer true – the installer puts up a named instance (with SHAREPOINT as the instance ID) of SQL Server 2008 Express. This is a fully functional Express instance, but there is still at least one gotcha – the user that installed SharePoint 2010 has no administrative privileges to this SQL Server instance. What if you really need those privileges?

Fortunately, there is a way (or maybe even multiple ways) to get the privileges you want. It’s fairly simple – SQL Server needs to be restarted in single-user mode, so that any member of the Windows Administrators group can login to SQL Server with sysadmin privileges. That administrator grants the required privileges to users, logs out and restarts SQL Server in normal mode.

I have written a simple batch script that accepts two parameters – the computer name and the user that needs to have sysadmin privileges. Here it is:

net stop mssql$sharepoint
net start mssql$sharepoint /m "sqlcmd"
sqlcmd -E -S %1\SHAREPOINT -Q "CREATE LOGIN [%1\%2] FROM WINDOWS; EXEC master..sp_addsrvrolemember @loginame = N'%1\%2', @rolename = N'sysadmin';"
net stop mssql$sharepoint
net start mssql$sharepoint
iisreset

For example, to grant privileges to user JanuszM working on a computer named ITCNG, the following syntax would do the trick (provided that the above script is saved to a file named sp2010_grant_sysadmin.bat):

sp2010_grant_sysadmin.bat ITCNG JanuszM

If you’re wondering why I’m using /m “sqlcmd” instead of just /m, the answer is simple – because it’s single-user mode and only one user can connect to SQL Server. Restricting the connections to sqlcmd almost guarantees that no other running service will acquire that connection and our user will be able to connect and perform the required grants.

SQL Server 2008 R2 November CTP available for download