Thursday, March 29, 2012

Does SQL2005 still require all FTI keywords to match in the same column?

Hello,
My understanding of full-text index searching is that CONTAINS()
requires that all the keywords supplied (separated by AND) must match
in the same column in order for the record to be considered a match and
returned. So if we have a situation where we need keywords to match in
multiple columns, we either need to use multiple CONTAINS() statements,
use FREETEXT() and allow it to manipulate our keywords, or create a
single column containing all the text we want indexed for a row and
search that unified column.
Is this correct? If so, is this shortcoming fixed in SQL2005?
Thanks,
Thomas
Hi Thomas,
I am afraid SQL 2005 shows the same behavior. FTS is designed to search for text/documents, which normally are stored in the same column. Can you please explain a bit more why you need to have queries that involve several word matches for several columns?
Thanks!
Fernando Azpeitia Lopez,
Program Manager
SQL Server FTS team
--Original Message--
From: Thomas
Posted At: Wednesday, February 15, 2006 9:33 AM
Posted To: microsoft.public.sqlserver.fulltext
Conversation: Does SQL2005 still require all FTI keywords to match in the same column?
Subject: Does SQL2005 still require all FTI keywords to match in the same column?
Hello,
My understanding of full-text index searching is that CONTAINS()
requires that all the keywords supplied (separated by AND) must match
in the same column in order for the record to be considered a match and
returned. So if we have a situation where we need keywords to match in
multiple columns, we either need to use multiple CONTAINS() statements,
use FREETEXT() and allow it to manipulate our keywords, or create a
single column containing all the text we want indexed for a row and
search that unified column.
Is this correct? If so, is this shortcoming fixed in SQL2005?
Thanks,
Thomas
|||Hi Fernando,
I'm not sure exactly how to respond... SQL Server excels at storing all
kinds of data. Our records are hybrids of large text fields, numbers,
dates, etc. We have multiple text columns. Just as one example, let's
say we have an email database where we have the headers stored in
different fields, at the very least, subject and body in different
fields. A person enters multiple keywords, and even if some are in the
subject and some are in the body, the record should match. That's just
a basic example.
We were really blown away when we found out this does not behave in
this fashion. It makes no sense, from our perspective as a user, to
query * (all columns) for some keywords, and require they all match in
only a single column.
Another problem we are fighting that lends itself to being allowed to
find the keywords across columns: tagging. We have found that we can
really speed up our searches if the entire search is performed on the
FTS side of the equation.
Contrived Example: You want to find all records that contain the word
"house" and were created in September 2005. Your table holds 100,000
records. Let's say 50,000 of those records contain the word "house."
But only 1000 were created in September 2005. You could do a search
like this:
SELECT * FROM ourtable WHERE CONTAINS(*, '"house"') AND createdate
BETWEEN '09/01/2005 00:00:00' AND '09/30/2005 23:59:59'
But we have found this search is really slow. The slowdown is in the
number of records being returned that contain "house" even though most
of them are not going to pass our SQL Server filter of the date. So we
want to create a tag column of textual things that we can search on the
FTS. Then our query would be:
SELECT * FROM ourtable WHERE CONTAINS(*, '"house" and "DT200509"')
Now we've shifted the date requirement over to the FTS side of the
search. Admittedly, a hack, but it should work. As you know, it
doesn't, because the two keywords will be found in two different
columns.
So what we are forced to do now is the ultimate hack: create a single,
new text field where we are duplicating all of our data from multiple
columns, adding our "tags" for constraining the searches, and then
full-text indexing this single column in order to get fast searches.
It seems weird that FTS is designed to search documents when SQL Server
is designed to hold all kinds of data. Shouldn't FTS be designed to
efficiently search what SQL Server can hold?
Thanks for your time.
Regards,
Thomas
|||Hi Thomas,
I see your problem. Let me think about the best solutions.
Most of FTS users are focus in get great functionality to efficiently search inside a document, rather than to search parts in different documents stored in different columns. Anyway, for these cases like yours, we support multiple CONTAINS. Is true that
the performance is not as good as with one single CONTAINS but if we would allow from the beginning to have several columns look ups in a single CONTAINS, we would probably finish with similar performance even if you are just writing one clause.
The good news is the following.
-For next FTS release we have several architecture improvements that will improve dramatically the joined queries. This means that mix relational (date for instance) with FTS search will be efficient. Following your example, before look the FTS side, the
optimizer will get the few ones that pass the date filter and then these ones will be FT searched.
This improvement also will improve multiple CONTAINS queries, so you will not longer experiment pain.
-For now, the best you can do is to use computed columns. These columns will contain virtually the same data than the original columns and you can create a FT index on that column. The indexing time will take longer as you are merging 2 or more columns bu
t at query time you will be able to query efficiently and find what you look for.
Does this help?
Regards,
Fernando Azpeitia Lopez,
Program Manager
SQL Server FTS team
--Original Message--
From: Thomas
Posted At: Wednesday, February 15, 2006 9:34 PM
Posted To: microsoft.public.sqlserver.fulltext
Conversation: Does SQL2005 still require all FTI keywords to match in the same column?
Subject: Re: Does SQL2005 still require all FTI keywords to match in the same column?
Hi Fernando,
I'm not sure exactly how to respond... SQL Server excels at storing all
kinds of data. Our records are hybrids of large text fields, numbers,
dates, etc. We have multiple text columns. Just as one example, let's
say we have an email database where we have the headers stored in
different fields, at the very least, subject and body in different
fields. A person enters multiple keywords, and even if some are in the
subject and some are in the body, the record should match. That's just
a basic example.
We were really blown away when we found out this does not behave in
this fashion. It makes no sense, from our perspective as a user, to
query * (all columns) for some keywords, and require they all match in
only a single column.
Another problem we are fighting that lends itself to being allowed to
find the keywords across columns: tagging. We have found that we can
really speed up our searches if the entire search is performed on the
FTS side of the equation.
Contrived Example: You want to find all records that contain the word
"house" and were created in September 2005. Your table holds 100,000
records. Let's say 50,000 of those records contain the word "house."
But only 1000 were created in September 2005. You could do a search
like this:
SELECT * FROM ourtable WHERE CONTAINS(*, '"house"') AND createdate
BETWEEN '09/01/2005 00:00:00' AND '09/30/2005 23:59:59'
But we have found this search is really slow. The slowdown is in the
number of records being returned that contain "house" even though most
of them are not going to pass our SQL Server filter of the date. So we
want to create a tag column of textual things that we can search on the
FTS. Then our query would be:
SELECT * FROM ourtable WHERE CONTAINS(*, '"house" and "DT200509"')
Now we've shifted the date requirement over to the FTS side of the
search. Admittedly, a hack, but it should work. As you know, it
doesn't, because the two keywords will be found in two different
columns.
So what we are forced to do now is the ultimate hack: create a single,
new text field where we are duplicating all of our data from multiple
columns, adding our "tags" for constraining the searches, and then
full-text indexing this single column in order to get fast searches.
It seems weird that FTS is designed to search documents when SQL Server
is designed to hold all kinds of data. Shouldn't FTS be designed to
efficiently search what SQL Server can hold?
Thanks for your time.
Regards,
Thomas
|||Fernando Azpeitia Lopez wrote:

> The good news is the following.
> -For next FTS release we have several architecture improvements that will improve
>dramatically the joined queries. This means that mix relational (date for instance) with
>FTS search will be efficient. Following your example, before look the FTS side, the
>optimizer will get the few ones that pass the date filter and then these ones will be FT
>searched. This improvement also will improve multiple CONTAINS queries, so you will
>not longer experiment pain.
When you say the "next FTS release," does this mean an upgrade to SQL
2005's FTS, or do you mean the FTS that is released in whatever version
comes after SQL Server 2005 (i.e. SQL Server 2010 ;-)

> -For now, the best you can do is to use computed columns. These columns will
>contain virtually the same data than the original columns and you can create a FT index
> on that column. The indexing time will take longer as you are merging 2 or more
>columns but at query time you will be able to query efficiently and find what you look
>for.
Can you give me an example of how to do a FTI on a computed column? We
are currently using SQL Server 2000, and only preparing our move to
2005, so are not yet familiar with 2005 completely.
Thanks,
Thomas
|||Hi Thomas,
When I say next FTS release I mean the next release, not any upgrade. And don’t worry, the next version should be no longer than 2007
In a following post I will let you know the steps to work with computed columns in SQL 2005.
Regards,
Fernando Azpeitia Lopez,
Program Manager
SQL Server FTS team
--Original Message--
From: Thomas [mailto:tomwinzig@.gmail.com]
Posted At: Friday, February 17, 2006 10:50 AM
Posted To: microsoft.public.sqlserver.fulltext
Conversation: Does SQL2005 still require all FTI keywords to match in the same column?
Subject: Re: Does SQL2005 still require all FTI keywords to match in the same column?
Fernando Azpeitia Lopez wrote:

> The good news is the following.
> -For next FTS release we have several architecture improvements that will improve
>dramatically the joined queries. This means that mix relational (date for instance) with
>FTS search will be efficient. Following your example, before look the FTS side, the
>optimizer will get the few ones that pass the date filter and then these ones will be FT
>searched. This improvement also will improve multiple CONTAINS queries, so you will
>not longer experiment pain.
When you say the "next FTS release," does this mean an upgrade to SQL
2005's FTS, or do you mean the FTS that is released in whatever version
comes after SQL Server 2005 (i.e. SQL Server 2010 ;-)

> -For now, the best you can do is to use computed columns. These columns will
>contain virtually the same data than the original columns and you can create a FT index
> on that column. The indexing time will take longer as you are merging 2 or more
>columns but at query time you will be able to query efficiently and find what you look
>for.
Can you give me an example of how to do a FTI on a computed column? We
are currently using SQL Server 2000, and only preparing our move to
2005, so are not yet familiar with 2005 completely.
Thanks,
Thomas

No comments:

Post a Comment