XML to the rescue – parsing text formulas

My last project (the one that prevented me from blogging in April) involved parsing some text formulas defined by business users. A typical formula would look like this: 1234.56+6789.100-5468. . Dots in this formula are not decimal separators – they separate accounts and subaccounts (financial stuff). Anyway, I had to extract those account/subaccount numbers, get their current balances and evaluate the formula. The most straightforward solution would be a loop, where the formula would be parsed – one character at a time.

But I have come up with another one – add some XML tags here and there, convert the formula to XML and extract relational data from it. Sounds good enough? Let’s see the code:

CREATE FUNCTION dbo.parse_formula(
	@p_formula VARCHAR(MAX)
)
RETURNS @tbl_formula_elements TABLE (
		SIGN CHAR(1),
		account VARCHAR(8),
		subaccount VARCHAR(20)
)
BEGIN
	DECLARE @formula VARCHAR(MAX) = CASE
		WHEN LEFT(@p_formula, 1)  '-'
			THEN '+'
		ELSE
			''
	END + @p_formula;
 
	SET @formula = REPLACE(
			@formula,
			'+', 
			'</Subaccount></FormulaElement><FormulaElement><Sign>+</Sign><Account>'
	);
 
	SET @formula = REPLACE(
		@formula,
		'-', 
		'</Subaccount></FormulaElement><FormulaElement><Sign>-</Sign><Account>'
	);
 
	SET @formula = REPLACE(
		@formula,
		'.', 
		'</Account><Subaccount>'
	);
 
	DECLARE @formula_elements XML = CONVERT(
		XML,
		'<Formula>' 
			+ RIGHT(@formula, LEN(@formula) - LEN('</Subaccount></FormulaElement>'))
			+ '</Subaccount></FormulaElement></Formula>'
	);
 
	INSERT @tbl_formula_elements
	SELECT 
		FormulaElements.cols.VALUE('Sign[1]', 'char(1)'),
		FormulaElements.cols.VALUE('Account[1]', 'varchar(8)'),
		FormulaElements.cols.VALUE('Subaccount[1]', 'varchar(20)')
	FROM
		@formula_elements.nodes('/Formula/FormulaElement') FormulaElements (cols);
 
	RETURN;
END;

This TVF first checks whether the formula begins with a - and puts a + at the start if it does not. Then it adds some XML tags so that the resulting XML consists of a single Formula element, which consists of one or many FormulaElement elements. Each FormulaElement contains a single Sign element, a single Account element and a single Subaccount element.

When the XML string is ready, the TVF converts it to the native XML type and puts the formula elements in the result table. The formula is ready for evaluation. Let's invoke the TVF:

SELECT 
	SIGN,
	account,
	subaccount
FROM
	dbo.parse_formula('1234.56+6789.100-5468.')
sign account  subaccount
---- -------- --------------------
+    1234     56
+    6789     100
-    5468
(3 row(s) affected)

How to evaluate a formula transformed to a table like this one? Join this table with the current account balances, multiply them by -1 where the sign is - and sum the results.

Those of you who fear XML - now you have another lethal weapon in your SQL Server arsenal (probably FOR XML PATH('') was the first one ;) ).