I am trying to create a test case for one of my clients having issues with text index.
Text index comes with lot of options for language, lexer etc. It is not as simple as creating a normal index. Through my experience i have seen multiple issues with text indexes and one of which is with printjoins.
Details of Text index and its options can be found at,
http://download.oracle.com/docs/cd/B19306_01/text.102/b14218.pdf
and look at my other post on text index,
http://mithunashok.blogspot.com/2009/10/oracle-text-index-real-time-experience.html
User complains that when they search with . character it does not consider . as character. Look at the example below(tested on 10.2.0.1).
create table mith( n number primary key, l varchar2(60));
insert into mith values( 1, 'nisha.mithun');
insert into mith values( 2, 'nisha mithun');
insert into mith values( 3, 'nisha mithun nisha mithun');
insert into mith values( 4, 'hello-world');
insert into mith values( 5, 'hello@world');
insert into mith values( 6, 'hello world. world');
create index ctx_mith on mith(l)
indextype is ctxsys.context;
select * from mith where contains( l, 'nisha.mithun') > 0;
N L
---------- ------------------------------------------------------------
1 nisha.mithun
2 nisha mithun
3 nisha mithun nisha mithun
Ideally it should have returned only nisha.mithun but it has returned all the occurences of nisha mithun. Lets look at the index descrition. You can use the following sql to describe an index.
SELECT CTX_REPORT.DESCRIBE_INDEX ('CTX_MITH1') FROM DUAL;
===========================================================================
INDEX DESCRIPTION
===========================================================================
index name: "TESTINDEX"."CTX_MITH"
index id: 1060
index type: context
base table: "TESTINDEX"."MITH"
primary key column: N
text column: L
text column type: VARCHAR2(60)
CTX_REPORT.DESCRIBE_INDEX('CTX_MITH')
--------------------------------------------------------------------------------
language column:
format column:
charset column:
===========================================================================
INDEX OBJECTS
===========================================================================
datastore: DIRECT_DATASTORE
filter: NULL_FILTER
CTX_REPORT.DESCRIBE_INDEX('CTX_MITH')
--------------------------------------------------------------------------------
section group: NULL_SECTION_GROUP
lexer: BASIC_LEXER
wordlist: BASIC_WORDLIST
stemmer: ENGLISH
fuzzy_match: GENERIC
stoplist: BASIC_STOPLIST
stop_word: Mr
Look at group lexer which is basic_lexer by default and worklist is fuzzy_match by default.
Now lets change few options and check if it returns correct search.
EXEC Ctx_Ddl.drop_preference('SSUSE');
EXEC Ctx_Ddl.Create_Preference('SSUSE', 'BASIC_WORDLIST');
EXEC ctx_ddl.drop_preference('SS_SCB_LEXER');
EXEC ctx_ddl.create_preference('SS_SCB_LEXER','basic_lexer');
EXEC ctx_ddl.set_attribute('SS_SCB_LEXER','printjoins','-.,&/');
CREATE INDEX ctx_mith1 ON mith (l) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS('WORDLIST SSUSE STOPLIST CTXSYS.EMPTY_STOPLIST LEXER SS_SCB_LEXER');
SQL> select * from mith where contains( l, 'nisha.mithun') > 0;
N L
---------- ------------------------------------------------------------
1 nisha.mithun
Now it returns result as expected. By default . is treated as an operator and not as a text element. If you want to use operators as text elements then use printjoins option with basix_lexer.
Now since this works, let me insert one more row,
SQL> insert into mith values( 7, 'nisha. mithun');
1 row created.
SQL> select * from mith where contains( l, 'nisha. mithun') > 0;
N L
---------- ------------------------------------------------------------
2 nisha mithun
3 nisha mithun nisha mithun good
7 nisha. mithun
Suprising!
Not suprising enough. Same rule applies even for whitespace as is for '.' operator. Look at the following query,
SQL> select * from mith where contains( l, 'nisha mithun') > 0;
N L
---------- ------------------------------------------------------------
2 nisha mithun
3 nisha mithun nisha mithun good
7 nisha. mithun
8 nisha mithun
9 nisha mithun
SQL> select * from mith where contains( l, 'nisha mithun') > 0;
N L
---------- ------------------------------------------------------------
2 nisha mithun
3 nisha mithun nisha mithun good
7 nisha. mithun
8 nisha mithun
9 nisha mithun
For this include ' ' in printjoins and see the result below.
EXEC ctx_ddl.drop_preference('SS_SCB_LEXER');
EXEC ctx_ddl.create_preference('SS_SCB_LEXER','basic_lexer');
EXEC ctx_ddl.set_attribute('SS_SCB_LEXER','printjoins','-.,&/ ');
drop index ctx_mith1;
CREATE INDEX ctx_mith1 ON mith (l) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS('WORDLIST SSUSE STOPLIST CTXSYS.EMPTY_STOPLIST LEXER SS_SCB_LEXER SYNC (ON COMMIT)');
SQL> select * from mith where contains( l, 'nisha.\ mithun') > 0;
N L
---------- ------------------------------------------------------------
7 nisha. mithun
Text index comes with lot of options for language, lexer etc. It is not as simple as creating a normal index. Through my experience i have seen multiple issues with text indexes and one of which is with printjoins.
Details of Text index and its options can be found at,
http://download.oracle.com/docs/cd/B19306_01/text.102/b14218.pdf
and look at my other post on text index,
http://mithunashok.blogspot.com/2009/10/oracle-text-index-real-time-experience.html
User complains that when they search with . character it does not consider . as character. Look at the example below(tested on 10.2.0.1).
create table mith( n number primary key, l varchar2(60));
insert into mith values( 1, 'nisha.mithun');
insert into mith values( 2, 'nisha mithun');
insert into mith values( 3, 'nisha mithun nisha mithun');
insert into mith values( 4, 'hello-world');
insert into mith values( 5, 'hello@world');
insert into mith values( 6, 'hello world. world');
create index ctx_mith on mith(l)
indextype is ctxsys.context;
select * from mith where contains( l, 'nisha.mithun') > 0;
N L
---------- ------------------------------------------------------------
1 nisha.mithun
2 nisha mithun
3 nisha mithun nisha mithun
Ideally it should have returned only nisha.mithun but it has returned all the occurences of nisha mithun. Lets look at the index descrition. You can use the following sql to describe an index.
SELECT CTX_REPORT.DESCRIBE_INDEX ('CTX_MITH1') FROM DUAL;
===========================================================================
INDEX DESCRIPTION
===========================================================================
index name: "TESTINDEX"."CTX_MITH"
index id: 1060
index type: context
base table: "TESTINDEX"."MITH"
primary key column: N
text column: L
text column type: VARCHAR2(60)
CTX_REPORT.DESCRIBE_INDEX('CTX_MITH')
--------------------------------------------------------------------------------
language column:
format column:
charset column:
===========================================================================
INDEX OBJECTS
===========================================================================
datastore: DIRECT_DATASTORE
filter: NULL_FILTER
CTX_REPORT.DESCRIBE_INDEX('CTX_MITH')
--------------------------------------------------------------------------------
section group: NULL_SECTION_GROUP
lexer: BASIC_LEXER
wordlist: BASIC_WORDLIST
stemmer: ENGLISH
fuzzy_match: GENERIC
stoplist: BASIC_STOPLIST
stop_word: Mr
Look at group lexer which is basic_lexer by default and worklist is fuzzy_match by default.
Now lets change few options and check if it returns correct search.
EXEC Ctx_Ddl.drop_preference('SSUSE');
EXEC Ctx_Ddl.Create_Preference('SSUSE', 'BASIC_WORDLIST');
EXEC ctx_ddl.drop_preference('SS_SCB_LEXER');
EXEC ctx_ddl.create_preference('SS_SCB_LEXER','basic_lexer');
EXEC ctx_ddl.set_attribute('SS_SCB_LEXER','printjoins','-.,&/');
CREATE INDEX ctx_mith1 ON mith (l) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS('WORDLIST SSUSE STOPLIST CTXSYS.EMPTY_STOPLIST LEXER SS_SCB_LEXER');
SQL> select * from mith where contains( l, 'nisha.mithun') > 0;
N L
---------- ------------------------------------------------------------
1 nisha.mithun
Now it returns result as expected. By default . is treated as an operator and not as a text element. If you want to use operators as text elements then use printjoins option with basix_lexer.
Now since this works, let me insert one more row,
SQL> insert into mith values( 7, 'nisha. mithun');
1 row created.
SQL> select * from mith where contains( l, 'nisha. mithun') > 0;
N L
---------- ------------------------------------------------------------
2 nisha mithun
3 nisha mithun nisha mithun good
7 nisha. mithun
Suprising!
Not suprising enough. Same rule applies even for whitespace as is for '.' operator. Look at the following query,
SQL> select * from mith where contains( l, 'nisha mithun') > 0;
N L
---------- ------------------------------------------------------------
2 nisha mithun
3 nisha mithun nisha mithun good
7 nisha. mithun
8 nisha mithun
9 nisha mithun
SQL> select * from mith where contains( l, 'nisha mithun') > 0;
N L
---------- ------------------------------------------------------------
2 nisha mithun
3 nisha mithun nisha mithun good
7 nisha. mithun
8 nisha mithun
9 nisha mithun
For this include ' ' in printjoins and see the result below.
EXEC ctx_ddl.drop_preference('SS_SCB_LEXER');
EXEC ctx_ddl.create_preference('SS_SCB_LEXER','basic_lexer');
EXEC ctx_ddl.set_attribute('SS_SCB_LEXER','printjoins','-.,&/ ');
drop index ctx_mith1;
CREATE INDEX ctx_mith1 ON mith (l) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS('WORDLIST SSUSE STOPLIST CTXSYS.EMPTY_STOPLIST LEXER SS_SCB_LEXER SYNC (ON COMMIT)');
SQL> select * from mith where contains( l, 'nisha.\ mithun') > 0;
N L
---------- ------------------------------------------------------------
7 nisha. mithun
Comments