
   comp.databases.oracle


   Frank van Bortel to Server Applications   
   Re: Oracle Text: Indexing UTF8 or UTF16   
   19 May 05 10:00:46   
   
   From: frank.van.bortel@gmail.com   
      
   Server Applications wrote:   
   > Hello   
   >   
   > I am trying to build a system where I can full-text index documents with   
   > UTF8 or UTF16 data using Oracle Text. I am doing the filtering in a   
   > third-party component outside the database, so I don't need filtering in
   > Oracle, only indexing.
   > If I put file references to the filtered files in the database and index   
   > these (using FILE_DATASTORE), everything works fine. But I would rather put the
   > filtered data in the database and index it from there (using the
   > PROCEDURE_FILTER). This causes problems when the data is actually
   > Unicode data.
   > The interface for the procedure in the PROCEDURE_FILTER does not allow the   
   > data to be output as NCLOB or NVARCHAR, but only CLOB or VARCHAR. Indexing   
   > the data directly in the table (using e.g. a NULL_FILTER or CHARSET_FILTER)
   > has the same effect. If I try to index a column of type NCLOB or
   > NVARCHAR, the index creation gives me an error telling me that it is an
   > invalid column type.
   >   
   > I have tried to create a database with the UTF8 character set, expecting   
   > that the CLOB column type could then contain the UTF8 data, and that the
   > indexing would then recognize the Unicode characters in the data. This does
   > not give any errors, but none of the Unicode strings in the data are
   > contained in the index; only the strings in English (or ASCII, i.e. strings
   > whose characters all fit within 1 byte) are contained in the index afterwards.
   >   
   > Is it not possible to index data directly in a column (using either
   > CHARSET_FILTER, NULL_FILTER or PROCEDURE_FILTER) that is in UTF8 or UTF16   
   > format?   
   >   
   >   
   > Thanks in advance for any comments.   
   >   
   > /David   
   >   
   >   
   This newsgroup is dead - repost in comp.databases.oracle.server
      
   --   
   Regards,   
   Frank van Bortel   
      


