Sunday, January 27, 2013

Wildcard character based pattern search in T-SQL

Today happen to see a question based on wildcard characters search within a string pattern in a forum. I gave the solution for it and in my efforts found that the documentation given on this is not clear enough in MSDN. This prompted me to clear things up which I'm sharing in this blog.
The question asked was to develop a logic for searching for few wildcard characters [,],^,%,_  (ie )based pattern (for ex. [ ] ^ _ %) within a table column and return any rows that contain this pattern within the column.

Usual method to search for a set of characters is to enclose them within []. But here the tricky part is that the characters [ and ] themselves are among characters to be searched. Also _,^ etc are also used inside [] to denote wildcards as explained here.

The way to search for these characters is to escape them to enable the parser to not consider them as wildcard characters and include them as a part of the search string.
This can be done in two ways as explained below

Method 1: Enclosing the character within [] to escape.

See this example below


SELECT *
FROM
(
SELECT '[ ] ^ _ %' AS string_value UNION ALL
SELECT '[ ] ^ _ %anjhd' UNION ALL
SELECT 'jkhjkhnb[ ] ^ _ %' UNION ALL
SELECT 'hjjhgjhg[ ] ^ _ %anjhd' 
)t
WHERE patindex ('%[[ ]] ^ _ [%]', string_value) >0

output
--------------------------------------
string_value
--------------------------------------
[ ] ^ _ %
jkhjkhnb[ ] ^ _ %


As you see from above the characters to be searched for are enclosed within [] to regard them as search characters. The output clearly indicates that the parser regarded them as search characters and returned result with strings that contain the pattern [ ] ^ _ %

Method 2 Using the ESCAPE keyword 

This is another method that can be used with LIKE operator and enables a character to be defined as escape sequence. Then any wildcard character can be escaped inside pattern using this escape character as shown below.


SELECT *
FROM
(
SELECT '[ ] ^ _ %' AS string_value UNION ALL
SELECT '[ ] ^ _ %anjhd' UNION ALL
SELECT 'jkhjkhnb[ ] ^ _ %' UNION ALL
SELECT 'jkhjkhnb[ ][ ^ _ %' UNION ALL
SELECT 'hjjhgjhg[ ] ^ _ %anjhd' 
)t
WHERE string_value LIKE '%|[ ] ^ _ |%' ESCAPE '|'


output
-------------------------------
string_value
-------------------------------
[ ] ^ _ %
jkhjkhnb[ ] ^ _ %



In the above scenario character | is defined as the escape character and is used to precede each of the characters [,],^ etc to regard them as a search string and not as wildcard character. You can use any character as escape character by specifying them along with ESCAPE clause.
Hope this explains the methods to be used for searching patterns containing wildcard based characters. As always feel free to revert with any questions/clarification on the above. Thanks to the original poster sigmas as well for bringing up this question in the forum.

No comments:

Post a Comment