Friday, March 9, 2012

Does FTS in SQL 7 have known issues with not indexing records?

I've been testing using a full text index on a few columns in one of my
databases, and I'm having trouble with the index appearing to miss records.
I have stopped all updates on my database, and done a full population of the
SearchTitle field in my STK table. I have then waited until the full text
update has completed, and there are no errors in the event log. I then tried
the following queries:
SELECT STK.ID, STK.SearchTitle FROM STK WHERE
CONTAINS(STK.SearchTitle,'"being" and "jordan"')
Result is zero records. I also tried
SELECT STK.ID, STK.SearchTitle FROM STK WHERE
CONTAINS(STK.SearchTitle,'"being jordan"')
Again, zero records. I then tried
SELECT STK.ID, STK.SearchTitle FROM STK WHERE STK.SearchTitle LIKE '% being
%' and STK.SearchTitle LIKE '% jordan %')
and get 1 result, which is what I expect.
The SearchTitle field contains a stripped down version of book titles in my
table, all fields have a space followed by the words in the table followed
by an ending space (this is so that the current searches I do via the last
example work on whole words without ever finding partial matches). In the
above case the SearchTitle field contains just ' being jordan ' (without the
quotes).
SearchTitle is a varchar(255) field, and there are just under 365572 rows in
the table. The FT index shows 339632 items with 380112 unique words. I have
emptied the noise word files because they were causing problems with
searches, so I know it's not a noise word issue. This indicates that FTS has
skipped around 26000 records. I am currently running another full population
to see if the problem is a temporary one, but I was wondering if there are
known issues with FT indexing that I might be experiencing.
Dan
Being could be a noise word for the noise word list. Do you get the same
number of hits if you search on Jordan as you get if you search on Like '%
Jordan %'?
"Daniel Crichton" <msnews@.worldofspack.co.uk> wrote in message
news:urZxCI5rEHA.1204@.TK2MSFTNGP12.phx.gbl...
> I've been testing using a full text index on a few columns in one of my
> databases, and I'm having trouble with the index appearing to miss
> records.
> I have stopped all updates on my database, and done a full population of
> the
> SearchTitle field in my STK table. I have then waited until the full text
> update has completed, and there are no errors in the event log. I then
> tried
> the following queries:
> SELECT STK.ID, STK.SearchTitle FROM STK WHERE
> CONTAINS(STK.SearchTitle,'"being" and "jordan"')
> Result is zero records. I also tried
> SELECT STK.ID, STK.SearchTitle FROM STK WHERE
> CONTAINS(STK.SearchTitle,'"being jordan"')
> Again, zero records. I then tried
> SELECT STK.ID, STK.SearchTitle FROM STK WHERE STK.SearchTitle LIKE '%
> being
> %' and STK.SearchTitle LIKE '% jordan %')
> and get 1 result, which is what I expect.
> The SearchTitle field contains a stripped down version of book titles in
> my
> table, all fields have a space followed by the words in the table followed
> by an ending space (this is so that the current searches I do via the last
> example work on whole words without ever finding partial matches). In the
> above case the SearchTitle field contains just ' being jordan ' (without
> the
> quotes).
> SearchTitle is a varchar(255) field, and there are just under 365572 rows
> in
> the table. The FT index shows 339632 items with 380112 unique words. I
> have
> emptied the noise word files because they were causing problems with
> searches, so I know it's not a noise word issue. This indicates that FTS
> has
> skipped around 26000 records. I am currently running another full
> population
> to see if the problem is a temporary one, but I was wondering if there are
> known issues with FT indexing that I might be experiencing.
> Dan
>
|||"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
news:u71T0k6rEHA.3076@.TK2MSFTNGP10.phx.gbl...
> Being could be a noise word for the noise word list. Do you get the same
> number of hits if you search on Jordan as you get if you search on Like '%
> Jordan %'?
No, instead of getting the expected 58 titles, I get 54. I have cleared the
noise word list before generating the index - SQL was throwing out errors if
1 noise word was passed into the search even if there were other non-noise
words, so I decided that rather than parsing out the noise words and getting
in-exact matches for what customers enter in their searches I'd just index
everything.
Dan
|||After running a full population again it appears to have now indexed
everything. I'll be doing some more preliminary testing before putting this
live though, last thing I want is for customers not to be able to find items
in our database (the example I gave of Being Jordan was the top selling book
a few weeks ago, not having that listed in search results would have been
disastrous for us.
Dan
|||"Daniel Crichton" <msnews@.worldofspack.co.uk> wrote in message
news:%23jpG03CsEHA.3748@.TK2MSFTNGP09.phx.gbl...
> After running a full population again it appears to have now indexed
> everything. I'll be doing some more preliminary testing before putting
this
> live though, last thing I want is for customers not to be able to find
items
> in our database (the example I gave of Being Jordan was the top selling
book
> a few weeks ago, not having that listed in search results would have been
> disastrous for us.
Looking at the event logs I've noticed that despite the item count being
correct, in the information event at completion of an incremental update
there is:
"Detected end of incremental crawl for project <SQLServer SQL0002300005>.
Successfully processed 365614 documents, 0K. Failed to filter 13 documents.
Modified 282 documents."
Followed by a warning event with ID 3051 containing:
"Detected 13 URLs that could not be reached or were denied access in project
<SQLServer SQL0002300005>."
I assume that for some reason 13 items couldn't be accessed when running the
incremental search. I'm running another one now to see if I get the same
messages, as it's only taking around 20 mins to run the incremental compared
to 4 hours running the full population.
I've just looked back at the full population I ran yesterday, and have
noticed that it also logged a warning event (I have updated some records
since this was built, hence the difference in the item counts). Here's the
information one first:
"Detected end of crawl for project <SQLServer SQL0002300005>. Successfully
processed 365652 documents, 0K. Failed to filter 0 documents."
Followed by a warning event:
"Detected 365452 URLs that could not be reached or were denied access in
project <SQLServer SQL0002300005>."
Whereas the previous pair of errors made sense in that the information
message indicates that 13 records couldn't be indexed, and the warning seems
to confirm this, the pair for the full population are confusing in that they
don't match. Does this indicate a potential problem in the indexing system?
Or is the logging of mismatched item counts in the event log a normal
occurrence?
Dan
|||make sure your noise word list has a single space in it, otherwise it will
be using the noise word list found in %windir%\system32.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
"Daniel Crichton" <msnews@.worldofspack.co.uk> wrote in message
news:urhcK2CsEHA.324@.TK2MSFTNGP11.phx.gbl...[vbcol=seagreen]
> "Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
> news:u71T0k6rEHA.3076@.TK2MSFTNGP10.phx.gbl...
'%
> No, instead of getting the expected 58 titles, I get 54. I have cleared
the
> noise word list before generating the index - SQL was throwing out errors
if
> 1 noise word was passed into the search even if there were other non-noise
> words, so I decided that rather than parsing out the noise words and
getting
> in-exact matches for what customers enter in their searches I'd just index
> everything.
> Dan
>
|||I strongly suggest you move to SQL 2000 for performance and scalability
reasons.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
"Daniel Crichton" <msnews@.worldofspack.co.uk> wrote in message
news:%23jpG03CsEHA.3748@.TK2MSFTNGP09.phx.gbl...
> After running a full population again it appears to have now indexed
> everything. I'll be doing some more preliminary testing before putting
this
> live though, last thing I want is for customers not to be able to find
items
> in our database (the example I gave of Being Jordan was the top selling
book
> a few weeks ago, not having that listed in search results would have been
> disastrous for us.
> Dan
>
|||You can get these errors for a variety of reasons.
You get 0 rows could not be indexed typically for the below reasons.
1) the account SQL Server runs under is not registered with MSSearch. You
can get this when you change the SQL Server service account through control
panel instead of via Enterprise Manager. This will cause the entire table
not to be indexed.
2) verify that the login BUILTIN\Administrator exists in the login folder.
If it does not exist add it in.
You will get xxx rows could not be indexed typically for the below reasons
1) one or more rows were deleted since the last population
2) a row was locked
3) a row could contain a very large amount of data which could not be
extracted in the time MSSearch waits to extract each row.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
"Daniel Crichton" <msnews@.worldofspack.co.uk> wrote in message
news:u4xryIDsEHA.1816@.TK2MSFTNGP15.phx.gbl...[vbcol=seagreen]
> "Daniel Crichton" <msnews@.worldofspack.co.uk> wrote in message
> news:%23jpG03CsEHA.3748@.TK2MSFTNGP09.phx.gbl...
> this
> items
> book
been
> Looking at the event logs I've noticed that despite the item count being
> correct, in the information event at completion of an incremental update
> there is:
> "Detected end of incremental crawl for project <SQLServer SQL0002300005>.
> Successfully processed 365614 documents, 0K. Failed to filter 13
documents.
> Modified 282 documents."
> Followed by a warning event with ID 3051 containing:
> "Detected 13 URLs that could not be reached or were denied access in
project
> <SQLServer SQL0002300005>."
> I assume that for some reason 13 items couldn't be accessed when running
the
> incremental search. I'm running another one now to see if I get the same
> messages, as it's only taking around 20 mins to run the incremental
compared
> to 4 hours running the full population.
>
> I've just looked back at the full population I ran yesterday, and have
> noticed that it also logged a warning event (I have updated some records
> since this was built, hence the difference in the item counts). Here's the
> information one first:
> "Detected end of crawl for project <SQLServer SQL0002300005>. Successfully
> processed 365652 documents, 0K. Failed to filter 0 documents."
> Followed by a warning event:
> "Detected 365452 URLs that could not be reached or were denied access in
> project <SQLServer SQL0002300005>."
> Whereas the previous pair of errors made sense in that the information
> message indicates that 13 records couldn't be indexed, and the warning
seems
> to confirm this, the pair for the full population are confusing in that
they
> don't match. Does this indicate a potential problem in the indexing
system?
> Or is the logging of mismatched item counts in the event log a normal
> occurrence?
> Dan
>
|||"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
news:eitvdMFsEHA.2340@.TK2MSFTNGP11.phx.gbl...
> I strongly suggest you move to SQL 2000 for performance and scalability
> reasons.
Unfortunately this is not an option at present due to cost - I would need
not only the SQL Server 2000 license, but also 2 SQL Server processor
licenses (dual processor server) so that the database is licensed for use on
my web sites. Last time I looked that was a hefty sum.
Dan
|||"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
news:eNpYhLFsEHA.2732@.TK2MSFTNGP09.phx.gbl...
> make sure your noise word list has a single space in it, otherwise it will
> be using the noise word list found in %windir%\system32.
Yes, I did leave a single space in the noise word files after reading a few
posts in here about it.
Dan

No comments:

Post a Comment