Tuesday, October 17, 2006

SQL Server: How to do a fuzzy search

I was asked yesterday to do a fuzzy search in SQL Server. The request was to find some numbers in a field that fell within a certain tolerance.

I worked out how to do this pretty simply using a BETWEEN clause with a fuzziness factor to go with it. In a simplified example, I'm looking for all employees within 3 years of 30. I parameterized both the age and the fuzz factor to make changes to this simple:




-- Here I just create a temporary table for the example.
create table #employee (employee varchar(32), age int)
insert into #employee values ('John', 32)
insert into #employee values ('Mary', 24)
insert into #employee values ('Sam', 40)
insert into #employee values ('Sarah', 28)
insert into #employee values ('Charles', 29)
insert into #employee values ('Henry', 31)

-- Here I declare variables to hold the age to look for
-- and the "fuzziness" of the search
declare @fuzz int
set @fuzz = 3
declare @seek_age int
set @seek_age = 30

-- This is the actual search
select employee, age
from #employee e
where e.age between @seek_age - @fuzz and @seek_age + @fuzz








EmployeeAge
John32
Sarah28
Charles29
Henry31


Increasing the @fuzz value increases the possibility of finding records.

In the actual case, I used the BETWEEN clause as the joining clause in a left join to fuzzily join records. This looks pretty scary but works. The following example does this to create a list of employees matched to other employees within 3 years of the same age:




-- Find other employees within three years of an employee's age
declare @fuzz int
set @fuzz = 3

select test.employee test_employee, test.age test_age,
fuzzy.employee fuzzy_employee, fuzzy.age fuzzy_age
from #employee test
left join #employee fuzzy
on test.age between fuzzy.age - @fuzz and fuzzy.age + @fuzz
-- here we don't want an employee to match him or herself
and test.employee <> fuzzy.employee
















test_employeetest_agefuzzy_employeefuzzy_age
John32Charles29
John32Henry31
Mary24NULLNULL
Sam40NULLNULL
Sarah28Charles29
Sarah28Henry31
Charles29John32
Charles29Sarah28
Charles29Henry31
Henry31John32
Henry31Sarah28
Henry31Charles29


A NULL result mean that that employee has no co-workers with 3 years of their age. Again, the @fuzz factor can be increased to find more matches.

No comments:

Post a Comment

I moderate comments blog posts over 14 days old. This keeps a lot of spam away. I generally am all right about moderating. Thanks for understanding.