Verity(R) K2 Toolkit V5.5.0 Patch 21
June 30, 2005

- TABLE OF CONTENTS
-------------------
o PATCH DESCRIPTION
o DELIVERABLES
o SPECIAL INSTALLATION NOTES
o PATCH INSTALLER INSTALLATION INSTRUCTIONS
o INSTALLATION INSTRUCTIONS
o SPECIAL NOTES
o BUGS FIXED

- PATCH DESCRIPTION
-------------------
This patch contains Verity K2 Toolkit V5.5.0 Patch 21. It includes all previous patches and their fixes for the applicable platforms listed in the Deliverables section.

This patch contains fixes for the following:

Bug 96493: _CACHE_FN disappears from BIF after an indexer crash or a graceful job shutdown.
Bug 97616: Fix LOGSUM initialization code.
Bug 98144: Port 96284 to 5.5 (field support for non-collection docsources).
Bug 98220: Arabic characters are not indexed correctly when the meta tag is set to 1256.
Bug 98619: Enable collection security administration through VAdministration API.
Bug 98628: Creation of PI from collection has inconsistent results.
Bug 98663: Fixes a problem where the score is lost when using 8-bit precision with a sortspec other than SCORE and k2server set to vdkSorting="1".
Bug 98754: When converting a Microsoft Word document to HTML, auto-numbering in section headings is not maintained.
Bug 98832: K2ParaSearchNewArg.K2ParaSearchNew leaking file descriptor/socket.
Bug 98856: K2Server will now always log collection down errors.
Bug 98959: When filtering a Microsoft Excel file out of process on Linux, a General Error (6) is generated.
Bug 99007: When filtering a Microsoft Word document containing a table within a text box, the words in the text box are concatenated.
Bug 99016: Update MINSIZE comment.
Bug 99056: Add configurable replacement character for kvcs module. (See special note below.)
Bug 99133: k2collswap brings collections down in both the k2servers.
Bug 99150: The KeyView conversion of some Japanese characters from Shift_JIS to Unicode is not compatible with Microsoft's character set conversion.
Bug 99198: Should now be able to create and properly find a search zone with accented characters. (See special note below.)
Bug 99414: Multi-threaded submit bif in ODK fails to add all partitions to PDD.
Bug 99578: Fixes the problem where the K2 server locks up on a query error for a long query and must be restarted to resume searching.
Bug 99627: A signed EML message is filtered as a text file and generates a very large output file.
Bug 99859: Mistranslated "ANY" choice in the parametric drop-down menu in Japanese.
Bug 100025: No results are retrieved when stopping k2server1 and starting k2server2.

- DELIVERABLES
--------------
All platforms:
/data/docs/WEB-INF/lib/verity.jar
/data/docs/WEB-INF/lib/vsearch.jar
/data/docs/verity_docs_webapp.war
/env//bin/flt_kv(.dll/.a/.sl)
/env//bin/vspider(.exe)
/inxdata/common/inxdata/arabic-std.stemmer-cd
/inxdata/common/inxdata/czech-std.stemmer-cd
/inxdata/common/inxdata/greek-std.stemmer-cd
/inxdata/common/inxdata/hebrew-std.stemmer-cd
/inxdata/common/inxdata/hungarian-std.stemmer-cd
/inxdata/common/inxdata/polish-std.stemmer-cd
/inxdata/common/inxdata/romanian-std.stemmer-cd
/inxdata/common/inxdata/russian-std.stemmer-cd
/inxdata/common/inxdata/turkish-std.stemmer-cd
/k2//bin/admdctm.so(.dll/.a/.sl)
/k2//bin/admnotes.so(.dll/.a/.sl)
/k2//bin/admnotes5.so(.dll/.a/.sl)
/k2//bin/dctmevnt(.exe)
/k2//bin/flt_kv(.dll/.a/.sl)
/k2//bin/flt_kv.so(.dll/.a/.sl)
/k2//bin/flt_lang.so(.dll/.a/.sl)
/k2//bin/k2admin(.exe)
/k2//bin/k2broker(.exe)
/k2//bin/k2collswap(.exe)
/k2//bin/k2index(.exe)
/k2//bin/k2server(.exe)
/k2//bin/k2spider(.exe)
/k2//bin/k2spider_srv(.exe)
/k2//bin/ktmgr(.exe)
/k2//bin/kvcs.so(.dll/.a/.sl)
/k2//bin/libodk.so(.dll/.a/.sl)
/k2//bin/libvdk30.so(.dll/.a/.sl)
/k2//bin/libvi18n.so(.dll/.a/.sl)
/k2//bin/loc_xlt.so(.dll/.a/.sl)
/k2//bin/locbasis.so(.dll/.a/.sl)
/k2//bin/locuni.so(.dll/.a/.sl)
/k2//bin/mkpi(.exe)
/k2//bin/mkre(.exe)
/k2//bin/mksyd(.exe)
/k2//bin/mktopics(.exe)
/k2//bin/rcidx(.exe)
/k2//bin/rck2(.exe)
/k2//bin/rcodk(.exe)
/k2//bin/secfsys.so(.dll/.a/.sl)
/k2//bin/sechttp.so(.dll/.a/.sl)
/k2//bin/vconfig(.exe)
/k2//bin/vgnotes4.so(.dll/.a/.sl)
/k2//bin/vgnotes5.so(.dll/.a/.sl)
/k2//bin/vgwdctm.so(.dll/.a/.sl)
/k2//bin/vgwfsys.so(.dll/.a/.sl)
/k2//bin/vgwhttp.so(.dll/.a/.sl)
/k2//bin/vgwmsxch.so(.dll/.a/.sl)
/k2//bin/vgwnotes.so(.dll/.a/.sl)
/k2//bin/vgwnotes4.so(.dll/.a/.sl)
/k2//bin/vgwnotes5.so(.dll/.a/.sl)
/k2//bin/vgwodbc.so(.dll/.a/.sl)
/k2//bin/vic(.exe)
/k2//bin/vsdb(.exe)
/k2//bin/vspider(.exe)
/k2//bin/vssl_exp.so(.dll/.a/.sl)
/k2//bin/vssl_us.so(.dll/.a/.sl)
/k2//bin/vzlib.so(.dll/.a/.sl)
/k2//filter/k2vshtml(.exe)
/k2//filters/afsr.so(.dll/.a/.sl)
/k2//filters/chartbls.ux
/k2//filters/htmcnv.so(.dll/.a/.sl)
/k2//filters/htmsr.so(.dll/.a/.sl)
/k2//filters/k2vshtml(.exe)
/k2//filters/kpMSOrdr.so(.dll/.a/.sl)
/k2//filters/kpemfrdr.so(.dll/.a/.sl)
/k2//filters/kpifutil.so(.dll/.a/.sl)
/k2//filters/kppdfrdr.so(.dll/.a/.sl)
/k2//filters/kvfilter.so(.dll/.a/.sl)
/k2//filters/kvhtml.so(.dll/.a/.sl)
/k2//filters/kvutil.so(.dll/.a/.sl)
/k2//filters/kvvector.class
/k2//filters/kvvector.class.new
/k2//filters/kvxml.so(.dll/.a/.sl)
/k2//filters/kwad.so(.dll/.a/.sl)
/k2//filters/misr.so(.dll/.a/.sl)
/k2//filters/msgsr.so(.dll/.a/.sl)
/k2//filters/mw6sr.so(.dll/.a/.sl)
/k2//filters/mw8sr.so(.dll/.a/.sl)
/k2//filters/mwssr.so(.dll/.a/.sl)
/k2//filters/pdfsr.so(.dll/.a/.sl)
/k2//filters/sosr.so(.dll/.a/.sl)
/k2//filters/unzip.so(.dll/.a/.sl)
/k2//filters/wp6sr.so(.dll/.a/.sl)
/k2//filters/xlssr.so(.dll/.a/.sl)
/k2//lib/libadminclient.lib(.a)
/k2//lib/verity.jar
/k2//lib/vparametric.jar
/k2//lib/vsearch.jar
/k2/_nti40/filters/kvvector.jar
/k2/_nti40/filters/kvvector.jar.new
/k2/_nti40/lib/kvvector.jar
/k2/_nti40/lib/mkreport.jar
/k2/_nti40/lib/vadmin.jar
/k2/_nti40/lib/verity.jar
/k2/_nti40/lib/vindex.jar
/k2/_nti40/lib/vparametric.jar
/k2/_nti40/lib/vprofiler.jar
/k2/_nti40/lib/vsearch.jar
/k2/common/inxdata/arabic-std.stemmer-cd
/k2/common/inxdata/bokmal-expanded.stemmer-cd
/k2/common/inxdata/czech-std.stemmer-cd
/k2/common/inxdata/greek-std.stemmer-cd
/k2/common/inxdata/hebrew-std.stemmer-cd
/k2/common/inxdata/hungarian-std.stemmer-cd
/k2/common/inxdata/polish-std.stemmer-cd
/k2/common/inxdata/romanian-std.stemmer-cd
/k2/common/inxdata/russian-std.stemmer-cd
/k2/common/inxdata/turkish-std.stemmer-cd
/k2/common/langid/langlist.cfg
/k2/common/patchinfo_k2tk.txt
/k2/include/odk_tax.h
/k2/include/vdk_api.h
/k2/samples/web/applications/parametric/WEB-INF/lib/verity.jar
/k2/samples/web/applications/parametric/WEB-INF/lib/vparametric.jar
/k2/samples/web/applications/recommendation/WEB-INF/lib/verity.jar
/k2/samples/web/applications/recommendation/WEB-INF/lib/vsearch.jar
/k2/samples/web/applications/recommendation/recommendation.war
/k2/samples/web/examples/WEB-INF/lib/verity.jar
/k2/samples/web/examples/WEB-INF/lib/vsearch.jar
/k2/samples/web/templates/asp/WEB-INF/lib/verity.jar
/k2/samples/web/templates/asp/WEB-INF/lib/vparametric.jar
/k2/samples/web/templates/asp/WEB-INF/lib/vsearch.jar
/k2/samples/web/templates/jsp/WEB-INF/lib/verity.jar
/k2/samples/web/templates/jsp/WEB-INF/lib/vparametric.jar
/k2/samples/web/templates/jsp/WEB-INF/lib/vsearch.jar

Windows platform:
/k2/_nti40/bin/admnotes4.dll
/k2/_nti40/bin/admnotes5.dll
/k2/_nti40/bin/k2lmnt.dll
/k2/_nti40/bin/vgwdctm.dll
/k2/_nti40/bin/vgwodbc.dll
/k2/_nti40/bin/vsearch.dll
/k2/_nti40/filters/kpifcnvt.dll
/k2/_nti40/filters/kpifutil.dll
/k2/_nti40/filters/lwpsr.dll
\k2\\bin\vsfcman.dll
\k2\\filters\dunzip32.dll
\k2\_nti40\filters\kvdocve.dll
\k2\_nti40\filters\lwpsr.dll
\k2\_nti40\filters\servant.exe

Solaris platform:
/k2//filters/kvfilter_posix.so

HPUX platform:

AIX platform:
/k2//filters/kvfilter_nsl.a

Linux platform:

- SPECIAL INSTALLATION NOTES
----------------------------
Bug 92817: 'new line + space' will not be considered as a paragraph break.

NOTE 1
======
A new vspider option, -repos_charmap, is available. vspider uses it to convert the vdkvgwkey (i.e. the file path) into the collection charmap. (91480)

NOTE 2
======
The Parametric Search Results do not do multibyte charset conversions correctly. Multibyte strings in PI search results will appear garbled.

Bug 93904: Back up your vgwnotes.dll/.so (etc.), then rename vgwnotes5.dll/.so to vgwnotes.dll/.so if you use Notes client 5 or above; otherwise rename vgwnotes4.dll/.so to vgwnotes.dll/.so.

Bug 94651: If you are using Notes Client 4.6.x, copy vgwnotes4.dll/so/a to vgwnotes.dll/so/a. Note: vgwnotes5.dll/so/a is the default and was renamed to vgwnotes.dll/so/a.

Bug 95135: With the changes to libpi (Parametric Indexing module), newly created PIs (Parametric Indexes) will not be compatible with older versions of k2server. The k2server shipped with this patch release can read PIs created using older versions of mkpi and k2index. However, if you create PIs using the mkpi or k2index shipped with this patch, you must use the k2server shipped with this patch to read them.

Bug 97744: Rename admnotes5.dll to admnotes.dll for Notes version 5.x and above. Rename admnotes4.dll to admnotes.dll for Notes version 4.x.

- PATCH INSTALLER INSTALLATION INSTRUCTIONS
-------------------------------------------
On Windows:
1. The machine where Verity is installed must have a JVM installed. If there is no JVM on the target machine, please use the old style patches.
2. Download to the machine where the K2 Install is located.
3. Double click on the executable.
4. Follow the instructions in the interface.
5. For uninstall, use "Add/Remove Programs" from the control panel. Alternatively, drill down into \patches\\\_uninst and run the uninstall executable.

On UNIX:
1. The machine where Verity is installed must have a JVM installed. If there is no JVM on the target machine, please use the old style patches. For HPUX and AIX, the customer must get the JVM from their respective O/S vendor.
2. Download to the machine where the K2 Install is located.
3. Run the executable by typing the name of the file on a command line. If the host does not support the graphical mode, then use -console. For example, type "K2TK_p07_ssol26Install.bin -console".
4. Follow the instructions in the interface.
5. For uninstall, a symbolic link is created under /patches. This will always point to the most recent patch uninstaller. Simply run that executable on the command line. Alternatively, drill down into \patches\\\_uninst and run the uninstall executable.

- INSTALLATION INSTRUCTIONS
---------------------------
To install the patch, follow these steps:
1. Stop all the running K2 processes, rck2 processes and all client C, Java and COM applications which use the Verity API.
2. Shut down the web server (for example, IIS, Tomcat, WebSphere).
3. Back up all the files included in this patch as described above. Copy the files described above into your installation. This can be done by simply unzipping or untaring the patch package at - the parent directory of K2.
   NOTE: Some files are platform-specific. You do not need to copy files that are specific to platforms other than your own.
4. Restart K2 or, if needed, recompile your C applications that include the above header files, .jar file, or libraries.
5. Restart the web server.

- SPECIAL NOTE ON fix for 93630
-------------------------------
The odk_tax.h header file included in this patch is required for building applications that use VDK-level ODK.
- SPECIAL NOTE ON fix for 96842
-------------------------------
When indexing public websites on a Unix platform, if an HTML page contains a file URL "file:///" that points to the root directory, vspider will follow that URL and index its content. Use the -exclude "file://[/]*" option to prevent that from happening.

- SPECIAL NOTE ON fix for 91119
-------------------------------
A new parameter is allowed in the [JOB] section of the job.ini file: reposcharmap. This parameter allows the user to specify the charmap of the repository when the filesys/http gateway is used. For example, the user is indexing a file system in which the file names may contain SJIS characters. In that case the job.ini file should contain "reposcharmap=sjis" in the [JOB] section.

The user can also specify the reposCharMap parameter for a job using the environment variable VERITY_REPOS_CHARMAP. For example, if VERITY_REPOS_CHARMAP=sjis is set, then all jobs will work as if the reposCharMap value were set to sjis. The environment variable support is provided for k2spider jobs that are run from Dashboard. Please note that the change in the environment variable must take effect before the K2 processes are started. Note that the value specified in the job.ini file for the reposcharmap field takes precedence over the value specified using the environment variable (if both are specified).

If a value of reposCharMap is provided, the bif files created by k2spider for the filesys gateway type will contain the DOC_FN key with the file path in encoded form. The DOC_FN key is not written to the bif files if reposCharMap is not specified. To force the DOC_FN keys for filesys to be written (as encoded strings), specify a reposCharMap value of *.

A new option is added to the vspider command line, "-repos_charmap ", which allows the user to specify the charmap of the repository when the filesys gateway is used.
For example, the user is indexing a file system in which the file names may contain SJIS characters. In that case the command line should contain "-repos_charmap sjis".

Another new option, "-keyencode", is introduced in this patch for vspider. It allows the user to turn on key encoding in the DOC_FN field. Please contact Verity tech support if you want to use the encoded DOC_FN field.

- SPECIAL NOTE ON fix for 91287
-------------------------------
Here is a summary of the mirroring collection indexing support:
1. All collections can have the same alias, but the collection directory names must be different.
2. Each individual collection has to be on a different machine.
3. One cannot stop an indexing job with mirroring collections. If it is stopped, purge the job, purge the individual collections, and then restart the job.

- SPECIAL NOTE ON fix of 91836
------------------------------
The vspider option -repos_charmap does not always work correctly with the 850 character set. vspider will use the i18n function to convert vdkvgwkeys (file paths) to the collection charmap.

- SPECIAL NOTE ON fix for 91991
-------------------------------
1. Stop all the running K2 processes, rck2 processes and all client C, Java and COM applications which use the Verity API.
2. Shut down the web server (for example, IIS, Tomcat, WebSphere).
3. Back up all the files included in this patch as described above. Copy the files described above into your installation. This can be done by simply unzipping or untaring the patch package at - the parent directory of K2.
   NOTE: Some files are platform-specific. You do not need to copy files that are specific to platforms other than your own.
4. Restart K2 or, if needed, recompile your C applications that include the above header files, .jar file, or libraries.
5. Restart the web server.

- SPECIAL NOTE ON fix for 92098
-------------------------------
Changes to support document viewing with PushAPI.
Note: to use viewing with the Push API style files, you will need two changes.

style.uni: Remove the -bifmime option, as it would not work if the virtual document were made of multiple content streams of varying mime-type.

Change:
  autorec: "flt_kv -recognize -unzip -bifmime"
To:
  autorec: "flt_kv -recognize -unzip"

style.dft: Change the way the PrimaryDoc is zoned to use /zone as shown below.

Change:
  zone-begin: PrimaryDoc
  field: DOC /filter="universal"
  zone-end: PrimaryDoc
To:
  field: DOC /filter="universal" /zone=PrimaryDoc

Push API style files are located in the following two locations:
1) /stylesets/Def_FileSystem_PushAPI (On Master Administration Server only)
2) /k2/common/styles/fspush

- SPECIAL NOTE ON fix for 93314
-------------------------------
To allow K2 Profiler to profile gateway documents (non-collection documents), take these steps:
1) Introduce a new environment variable, VERITY_K2BROKER_GW_PROF_ON.
2) If VERITY_K2BROKER_GW_PROF_ON is set (to any value), the broker does not block gateway profiling requests and passes them down to the server, with the following assumptions:
   - the k2servers are correctly configured with a "same/replicated" styleset alias
   - the actual dockey is accessible from all k2servers using the round robin load balancing mechanism.
3) If VERITY_K2BROKER_GW_PROF_ON is not set, the current behavior is retained.

- SPECIAL NOTE ON fix for 93454
-------------------------------
K2 Spider only understands the standard date formats. One of the customers was using a date format of MM/DD/YYYY in their HTML META tag and using a meta mapper to map it to the Last-Modified header. This was causing issues since this format is not a standard format. An environment variable (VERITY_K2SPIDER_DATE_TIME_FORMAT) was added to let users tell K2 Spider what their date format is.
Users can use 'Y' for year, 'M' for month, 'D' for day, 'h' for hour, 'm' for minute, 's' for seconds and specify any arbitrary pattern like "MM/DD/YYYY-hh:mm:ss". Once this variable is set to this pattern, any date like "06/04/2004-10:34:27" will be recognized as June 4th, 2004, 10:34:27 AM GMT. Please note that the time zone used is always GMT irrespective of what the date/time format is.

- SPECIAL NOTE ON fix for 93500
-------------------------------
Vparametric was missing options to set PBS properties. Two methods were added on the VResultSet object:

  Sets the parameter that decides the maximum number of bytes per passage that will be returned for passage-based summary (PBS). Note that this number includes the bytes required for highlight tags if highlighting was requested. If the highlighting tags are long, the actual text (excluding the highlight tags) returned for the passages may be less.
  @param pbsMaxPsgBytes
  @since 1.2
  public void setSummaryMaxPassageBytes(int pbsMaxPsgBytes);

  Sets the parameter that decides the maximum number of passages that will be returned for passage-based summary (PBS).
  @param pbsMaxPsgCount maximum number of passages to retrieve with PBS
  @since 1.2
  public void setSummaryMaxPassageCount(int pbsMaxPsgCount);

- SPECIAL NOTE ON fix for 93686, 92901, 93136
---------------------------------------------
A new style.plc option (/serialize = 1) is added that will serialize vdk transactions. This could be a performance problem (in some circumstances) when invoked, since indexing concurrency (for any one collection) is lost.
Here is a sample of style.plc:
=================================================
$control: 1
policy:
{
  mode: default
    /inherit = Generic # inherits from Generic policy mode
    /serialize = 1
}
$$
=================================================

- SPECIAL NOTE ON fix for 93930, 92287
---------------------------------------
Create an environment variable VERITY_WS_POLLING to set the VCM watched services polling interval. If its value is less than 30, the default of 30 is used. Otherwise its value is used as the polling interval.

- SPECIAL NOTE ON fix of 94144
------------------------------
To support searching for strings that contain periods, the fix is to ignore sentence tagging when counting word positions (for more details, check the comments in style.prm). This requires a collection re-index. It is a collection-level configuration parameter (NOEOS) and it is locale-independent. Sentence tokens are still available to VDK summarization and feature extraction.

The following related comment and definition are from style.prm:

  This example enables Word Count word position format but ignores sentence tagging. The word position is bumped upon sentence tokens. However, the sentence breaks may be incorrect, causing phrase op to fail to yield a hit. This option ignores sentence tagging during indexing time for word position counting (i.e., word positions will not be bumped upon sentence breaks).
  #$define IDX-CONFIG "WCT NOEOS"

- SPECIAL NOTE ON fix for 94150, 93822, 93889
----------------------------------------------
The environment variables VDK_NO_PRF_CASE, VDK_NO_PRF_STEM and VDK_NO_PRF_SNDX can be used to disable CASE/STEM/SNDX in the profiler. The environment variables must be set before the prf is opened and before the queries are loaded. With the environment variables set, the profiler turns off case, stem and soundex variations when matching queries.
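
Because these variables must be in place before the prf is opened, one way to guarantee the ordering is to build the environment first and only then start the process that uses the profiler. The following is a minimal Python sketch of that idea, not a Verity-supplied tool. The value "1" is an assumption (the note does not say which value is expected), the helper names profiler_env and launch are hypothetical, and the child command shown is a stand-in for whatever application opens the prf.

```python
import os
import subprocess
import sys

# Variable names taken from the note above.
PROFILER_SWITCHES = ("VDK_NO_PRF_CASE", "VDK_NO_PRF_STEM", "VDK_NO_PRF_SNDX")


def profiler_env(disable=PROFILER_SWITCHES, base=None):
    """Return a copy of the environment with the chosen switches set.

    The value "1" is an assumption; the release note does not state
    which value the profiler expects.
    """
    env = dict(os.environ if base is None else base)
    for name in disable:
        if name not in PROFILER_SWITCHES:
            raise ValueError(f"unknown profiler switch: {name}")
        env[name] = "1"
    return env


def launch(command, disable=PROFILER_SWITCHES):
    """Start `command` with the switches already present in its environment."""
    return subprocess.run(command, env=profiler_env(disable), check=True)


if __name__ == "__main__":
    # Stand-in child process that only echoes one switch back; replace it
    # with the real application that opens the prf and loads the queries.
    launch([sys.executable, "-c",
            "import os; print(os.environ['VDK_NO_PRF_STEM'])"])
```

Setting the variables in the parent and passing them at launch avoids the case where the application has already opened the prf before the variables become visible.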
- SPECIAL NOTE ON fix for 94259, 90864
---------------------------------------
For backward compatibility:
1) One entry must be added to the config file (vgwmsxch.cfg). This must be changed in order to turn on the charset conversion:
   [/]
   ....
   charset=1252
   ....
   This entry specifies the repository charset, for example 1252.
2) After the setting is changed, the collection must be purged and re-indexed. Collection purge and re-index are not required if the modification to vgwmsxch.cfg is not needed.

- SPECIAL NOTE ON fix for 94266, 92210
--------------------------------------
K2 Spider now has an environment variable called VERITY_K2SPIDER_VdkServiceType that can be set to a combination of service levels, e.g.
  putenv VERITY_K2SPIDER_VdkServiceType "VdkServiceType_Search | VdkServiceType_Index"
One may use this environment variable to remove the VdkServiceType_DBA service level, which is the root cause of the maxclean. By default VdkServiceType_Search is always on and should be kept on when defining this variable. VdkServiceType_Index needs to be turned on every time K2 Spider Indexer runs without the -noindex option. If the -nooptimize option in K2 Spider Indexer is not used, VdkServiceType_Optimize should also be turned on along with VdkServiceType_DBA. One may remove one or more of these service levels or introduce new ones using this variable. Please exercise caution when using this variable.

- SPECIAL NOTE ON fix for 94295, 91464
--------------------------------------
The locuni driver arguments are changed:
  -separators    No longer used; common/uni/ctype.cfg is used instead.
  -filter_punct  Added to allow post-processing of non-eos inxight tokens.

- SPECIAL NOTE ON fix for 94550
-------------------------------
A new environment variable, VERITY_K2SPIDER_HTTP_USEGET, is introduced for k2spider in this patch. Once VERITY_K2SPIDER_HTTP_USEGET is on (set to 1), the k2spider web crawler will always use the HTTP GET method to retrieve pages, instead of using HEAD and then GET.
The default is off.

- SPECIAL NOTE ON fix for 94595
-------------------------------
This patch introduces a new HTML template for PDF files. The template is named pdfonefile. The sample templates use this template name as the default template for PDF to HTML conversion. When using VView, the client can choose to use pdfonefile to perform HTML conversion of PDF files. k2server must be restarted after installing the new HTML template ini file.

- SPECIAL NOTE ON fix of 94816
------------------------------
A new command option, -useget, is introduced for vspider. When -useget is specified, the vspider web crawler will always use the HTTP GET method to retrieve pages, instead of using HEAD and then GET. The default is off.

- SPECIAL NOTE ON fix of 95422
------------------------------
The "usePreauthentication: FALSE" entry is added to the "repository:" section in the generated vgwdctm.cfg by default. Users therefore don't need to add that entry manually; they only need to modify the value if needed (by default it's FALSE).

- SPECIAL NOTE ON fix for 95644
-------------------------------
You need to uncomment the line "copy: VgwFileModifyDate Date" in vgwfsys.vgw for this fix to work.

- SPECIAL NOTE ON fix for 95734
-------------------------------
Prior PIs that were built with mkpi and bif files may need to be rebuilt with the new mkpi from the 95734 fix (if the prior PIs have the same problem).

- SPECIAL NOTE ON fix for 97418
-------------------------------
The fix requires that, when specifying /filter="zone -html -autocharmap" in style.dft, the -nocharmap option is NOT used, or the fix will not work; either the -precharmap or the -autocharmap option can be used. (Using flt_zone straight from style.dft without a charmap doesn't make sense.)

- SPECIAL NOTE ON fix for 97681
-------------------------------
The starting point for an indexing job against a secured http site should not contain a file name.
If the target URL were http://domain.com/directory/file.html, then the correct starting point would read http://domain.com/directory/. Including the file name will result in failures when trying to view documents returned by a secure search.

- SPECIAL NOTE ON fix for 98178
-------------------------------
A new option, 'keyBatchCount', is added to the vgwdctm.cfg file to support performance tuning (see example below).

  performance:
  {
    keyBatchCount: 1000
  }

The 'keyBatchCount' value can vary between database systems. The value controls how many keys will be sent in a batch. Currently there is no protection mechanism to prevent malfunction of the database system if the batch count setting is over the limit, so it is strongly recommended that users check and test the value against their database system before putting it to work. Users can consult their database administrator or documentation to determine what value should be used.

NOTE: After manually adding the 'keyBatchCount' performance setting, the setting will currently be lost if users edit or modify the style files with Styleset Editor. [c.f. bug 98521]

- SPECIAL NOTE ON fix for 96752
-------------------------------
An additional option, HandleDuplicateAttachments, is added to vgwnotes.cfg under the [Lotus Notes GW] section. Support for the HandleDuplicateAttachments option is intended to fix only the very special user case reported in bug 96752. In general, users should not use the HandleDuplicateAttachments option, as it may cause unexpected failures if used inappropriately.

- SPECIAL NOTE ON fix for 98013
-------------------------------
Setting the environment variable VERITY_REPOS_CHARMAP properly is required for multi-byte repositories.

- SPECIAL NOTE ON fix for 98860
-------------------------------
Users can now reset the bucketsort set by EnumView, AttributeView and KTreeView.
- SPECIAL NOTE ON fix for 99493
-------------------------------
-- UPDATED INDEX/FILTER FUNCTIONALITY

With the release of p19, there are two changes in index/filter functionality that customers should be aware of.

First, flt_kv -zoned is now the recommended method for handling zip files. Zip files will be recognized automatically using this setting. Thus, -unzip is no longer recommended. Note that the default style files still recommend using -unzip. We apologize for the confusion, but we are unable to change the default style files in a patch release.

The old style.uni configuration for zipfile indexing was:
  autorec: "flt_kv -recognize -unzip"
  ...
  ...
  ...
  type: "application/zip"
    /action = fields-only

The new (p19 and later) style.uni configuration for zipfile indexing is:
  autorec: "flt_kv -recognize"
  ...
  ...
  ...
  type: "application/zip"
    /format-filter = "flt_kv -zoned"

The second change in index/filter functionality is that there will be no error message if we are unable to filter a subdoc contained in a zip archive. Implementing this enhancement would require an API change that Verity believes should not be made in a patch release.

- BUGS FIXED
-----------
Current Patch:
--------------
Bug 96493: _CACHE_FN disappears from BIF after an indexer crash or a graceful job shutdown.
Bug 97616: Fix LOGSUM initialization code.
Bug 98144: Port 96284 to 5.5 (field support for non-collection docsources).
Bug 98220: Arabic characters are not indexed correctly when the meta tag is set to 1256.
Bug 98619: Enable collection security administration through VAdministration API.
Bug 98628: Creation of PI from collection has inconsistent results.
Bug 98663: Fixes a problem where the score is lost when using 8-bit precision with a sortspec other than SCORE and k2server set to vdkSorting="1".
Bug 98754: When converting a Microsoft Word document to HTML, auto-numbering in section headings is not maintained.
Bug 98832: K2ParaSearchNewArg.K2ParaSearchNew leaking file descriptor/socket.
Bug 98856: K2Server will now always log collection down errors.
Bug 98959: When filtering a Microsoft Excel file out of process on Linux, a General Error (6) is generated.
Bug 99007: When filtering a Microsoft Word document containing a table within a text box, the words in the text box are concatenated.
Bug 99016: Update MINSIZE comment.
Bug 99056: Add configurable replacement character for kvcs module. (See special note below.)
Bug 99133: k2collswap brings collections down in both the k2servers.
Bug 99150: The KeyView conversion of some Japanese characters from Shift_JIS to Unicode is not compatible with Microsoft's character set conversion.
Bug 99198: Should now be able to create and properly find a search zone with accented characters. (See special note below.)
Bug 99414: Multi-threaded submit bif in ODK fails to add all partitions to PDD.
Bug 99578: Fixes the problem where the K2 server locks up on a query error for a long query and must be restarted to resume searching.
Bug 99627: A signed EML message is filtered as a text file and generates a very large output file.
Bug 99859: Mistranslated "ANY" choice in the parametric drop-down menu in Japanese.
Bug 100025: No results are retrieved when stopping k2server1 and starting k2server2.

Previous Patches:
-----------------
Patch 20:
Bug 93452: Out of memory when querying words that return high volume on vsearch and PI. The out of memory issue can be mitigated by reducing the PI cache size. To configure the PI bucket cache size, set two environment variables: VERITY_PI_CACHE_SIZE and VERITY_PI_CACHE_SEGMENT_SIZE. VERITY_PI_CACHE_SEGMENT_SIZE represents the number of segments and VERITY_PI_CACHE_SIZE represents the number of cache entries in each segment.
Default cache configuration (when these two variables are not set or they are zero):
  VERITY_PI_CACHE_SEGMENT_SIZE = 16
  VERITY_PI_CACHE_SIZE = 10
Recommended setting:
  VERITY_PI_CACHE_SEGMENT_SIZE = 8
  VERITY_PI_CACHE_SIZE = 5
Restart all k2servers after the variables are set. On Windows, a machine reboot may be required if k2server runs as a service.
Bug 97882: An 8-bit score of 0.01 will now be represented properly on a collection search.
Bug 97981: Importing a big TAX file failed.
Bug 98354: Stem function in uni UTF8_Stem() doesn't add langid suffix for Korean words.
Bug 98454: K2server crash when trying to retrieve PI result set from Internet_Advanced query.
Bug 98504: When filtering a Microsoft Excel file on Linux and Solaris, a segmentation fault occurs.
Bug 98613: Fixed problems creating OTL file when using mktopics.
Bug 98689: The search results cache will now time out at the specified period of time.
Bug 98694: Need to pull in the fix for Bug# 95989 to 5.5.
Bug 98728: TIFF file mime-type incorrectly populated as application/x-wordprocessor.
Bug 98778: The categories on a PI Tree search or browse will now return the proper matching properties.
Bug 98790: Vspider/k2spider will skip the entire odbc rows when filebyname doesn't exist.
Bug 98860: Removing a bucket sort for a parametric object when re-using the object (v5.5). (See special note below.)
Bug 99290: Port 97696 to 5.5.
Bug 99324: Port fix for 98150 to 5.5.

Patch 19:
Bug 96589: Viewing documents through Notes GW.
Bug 96752: Duplicate vdkvgwkey for attachment files. (See also special note below.)
Bug 97323: When filtering a Microsoft Excel file with a long Japanese filename, the filter program stops responding.
Bug 97417: Prevent setting of Verity internal fields that start with an underscore and VdkVgwKey.
Bug 97682: ODK cannot do equivalent of mkvdk -credentials.
Bug 97762: Access Violation in K2AdminDisconnect function.
Bug 97834: When filtering a PDF file, some text is replaced by question marks in the filtered output.
Question marks no longer appear in the output; however, some text is missing when the PDF file is filtered to an unstructured text stream. To extract all text, filter the file with logical reading order enabled. See "Filter SDK Guide" for more information on logical order. Bug 97875: K2 Vview (wysiwyg viewing) fails for Japanese filenames + japanb + UTF8. Bug 97893: Check for hash collisions and re-generate the hash value if the keys are different. Bug 97971: When filtering a Microsoft Word 95 file, the text in some text form fields is not extracted. Bug 98008,97973,97975: When filtering Microsoft Word files containing complex tables, the filtering program stops responding. Bug 97988: Using special build k2index build for BP K2Index process crashes if coll secure. Bug 98013: K2spider cannot index Jpn directory with halfwidth Kana chars in dirname. (See also special note below.) Bug 98016: K2 sample templates (ASP) have a problem Wysiwyg viewing for Jpn filename Bug 98148: ODBC collections fail to recover if connection to Oracle is lost. Bug 98173: Auto-generating PI creates a garbage category. Bug 98178: Add additional option to send dql in a batch size selected by users (See also special note below.) Bug 98224: When filtering a corrupt Zip archive, the filter program stops responding during file detection. Bug 98239: When filtering Microsoft Word V2 documents out of process, the program kvoop stops responding and the error code 6 (KVERR_General) is generated. Bug 98252: Spread-out Query (60k or greater) spikes CPU on k2server indefinitely Bug 98398: Documentum Gateway -- With Cache_off, users can search docs even without the right permission Bug 98659: Prevent crash in Linux when service is starting up. Bug 98058,97603,97083,98239,98348: Improve filter error handling and zipfile processing. Patch 18: Bug 96521: Save PIDs for services for Linux after service has started up and kill PIDs previously used before starting up and if service has failed to start up.
Bug 97016: Patch installer will now update the Windows registry with a k2admin service dependency in order to prevent startup problems. Bug 97287: Sorting on multiple PIs doesn't work correctly. Bug 97378: The third-party component (InnerMedia DynaZip compression library), which is used to view Zip files, is vulnerable to buffer overruns. See http://www.kb.cert.org/vuls/id/582498. Bug 97383: setPageSize() does not limit total returned bucketsets to specified value. Bug 97418: VDKSUMMARY is not populated if zone filter is used with ntext column. (See special note below.) Bug 97458: k2spider indexing problems with long Japanese filenames Bug 97462: Querying multiple PIs breaks accents, which are then not displayed properly. Bug 97468: Sorting results by Bucketset fails with Multiple PI's. Bug 97470: K2server cannot recover ODBC connection with Oracle Wire protocol. Bug 97477,97102: When viewing a Microsoft PowerPoint file the graphics do not display, and the error "Loading Java Applet Failed..." is generated. Bug 97625: Discrepancy of results in PI categories when you use a different separator (/ or #). Bug 97627: Category separator returned is / even though my separator is #. Bug 97650: DCTM gateway does not handle repeating attributes properly. Bug 97681: HTTP NTLM repository fails at viewing time. Bug 97702: PI has duplicate taxonomy node for URL, one with and one without a "/". Bug 97744: TSI 00238849: Show a view in SSE even if $flags column is not defined for that view. Bug 97750: mkre -export not working as expected. Bug 97761: Fixed problem where flt_lang would get incorrectly disabled. Bug 97903: K2Spider mirroring broke intermittently. Bug 97974: Need enhancements to Parametric Memory usage in k2server Bug 98298: Fix HandleParaBucketEnumRead() at k2broker level. Patch 17: Bug 93630: Add support for reference categories. Bug 93674: Fix highlighting of stems in uni locale.
Bug 94480: Exchange GW does not handle emails from the special character name folder. Bug 94725: When filtering a Microsoft PowerPoint file, a general error is generated. Bug 95130: When filtering a Zip file containing a corrupt JAR file, the Filter sample program stops responding. Bug 95783: Windows handle leak with http gw. Bug 95862: When filtering a corrupt JAR file, the Filter sample program stops responding. Bug 96153: When filtering a Microsoft Outlook file (MSG) using the Filter sample program, the "Sent" field is not extracted. Bug 96175: Installed patch from Basis Technology. Bug 96254: The Date/Time field in a Microsoft Word file is not displayed when viewing the file in the IHADEMO sample program with bIndexOnly enabled. Bug 96383: When converting a Microsoft PowerPoint file using HTML Export, some slides are enlarged and some embedded graphics are distorted. (Internal) Bug 96518: When viewing a Microsoft Outlook file (MSG) in the VAPIDEMO sample program, the "Sent" field is not shown. Bug 96533: When filtering a Microsoft Word file with header and footer extraction enabled, the output is incomplete. Bug 96609: When a Java GUI application is used to launch XML Export to convert files out of process, the servant process opens a DOS console for each conversion. Bug 96614: After applying KeyView patch 8.2.0.2, filtering a corrupt PDF file generates an exception. Bug 96615: After applying KeyView patch 8.2.0.2, it takes longer to filter a PDF file. Bug 96667: When filtering a corrupt WordPerfect file out of process, the Filter sample program stops responding. Bug 96718: Some Greek characters (Omega, Delta, Mu) in a PDF file are not filtered. Bug 96723: When filtering a Microsoft Excel file, the Filter sample program stops responding. Bug 96724: The Filter program stops responding when processing the tag in an HTML file. The problem occurs intermittently. Bug 96742: When converting a large Microsoft Word file to HTML, the output is incomplete and inaccurate.
Bug 96753: When converting a Microsoft Word file to HTML using the wordstyle.ini template, a graphic is not positioned correctly and overlaps the text. Bug 96754: When converting a Microsoft Word file to HTML using the wordstyle.ini template, the table of contents is not numbered correctly. Bug 96833: Fixed the otl generation when exporting from topicset. Bug 96842: Vspider core dumps indexing public web sites. Bug 96871: Include BUTNOT operator via NEAR/-3. Bug 96942: Performance issue with many accumulated cookies. Bug 96982: Patch 14 introduces -23092 error with multiple job submits. Bug 96996: Delete PI partitions only when they are loaded by K2Server. Bug 97035: PI population using BIF assigns documents to categories incorrectly. Bug 97084: Search collections of different locales got error -1900. Bug 97135: Don't write to ddd if attempt to access file has failed. Bug 97139: Plug handle leak when unloading vdk dll. Bug 97144: Removing a taxonomy causes K2Indexer Server to crash Bug 97172: [MT] Port bug 96826 to 5.5.0 (Some Word documents got skipped with odbc gateway). Bug 97193: Clicking highlight arrow to go to next highlight results in "Error on page". Bug 97262: Raise limits on CONTAINS field search. Bug 97269: PDF in Documentum server not displayed as native mode. Bug 97409: Fix missing highlight problem with phrase sub-queries. Patch 16: Bug 95718: Slow result time when paginating over two PIs with VResultSet.fetch() Bug 96314: PI SQL queries failing on middle category IDs Bug 96399: When filtering a Lotus Word Pro file containing bulleted lists, the program stops responding. Bug 96419: When converting documents to XML out of process using the Java sample program XmlTest, the program stops responding. Bug 96446: Problems indexing zip files Bug 96493: _CACHE_FN disappears from BIF when indexer crashes or job shuts down Bug 96504: Handle xdate searches correctly when no time is specified. 
Bug 96582: VView on ASP template fails to show parts of some documents. Bug 96605: Fix zone search on autostopped zone. Bug 96625: Include missing customer dictionary files for Arabic, Bokmal, Czech, Greek, Hebrew, Hungarian, Polish, Romanian, Russian and Turkish. Bug 96643: VSPIDER crashing trying to index a ZIP file Bug 96706: Fixes problem caused by not ending zone when using style tags in the header, which resulted in summaries not being generated. Bug 96707: K2 Spider with NTLM authentication ignores robots.txt Bug 96716: [K2S_SvrErr] Generic server error when modifying a field in k2spider_cli Bug 96740: Vspider ignores files in a folder following a problem file with long name/chars. Bug 96762: Deletions from a PI require the K2 server to be bounced to pick up changes. Bug 97126: Problem viewing Powerpoint documents on UNIX Patch 15: Bug 95093: Exchange down - documents get deleted from collection. Bug 95718: Slow result time when paginating over two PIs with VResultSet.fetch(). Bug 95922: Recommendation index cannot be updated with Chinese query strings. Bug 96219: Must stop/restart K2Server after publishing a changed PI. Bug 96274: VSpider hanging when trying to index a large Notes repository. Bug 96285: k2spider is crashing with core dump. Bug 96351: k2 spider hangs (crawler falls asleep) when building a big mirrored collection Bug 96399: When filtering a Lotus Word Pro file containing bulleted lists, the program stops responding. Bug 96419: When converting documents to XML out of process using the Java sample program XmlTest, the program stops responding. Bug 96634: As part of the 96036 fix (patch 14), we added logic to merge docIdBitMaps loaded from the new PI partition with the bitMaps loaded from previous PI partitions. For large PIs, the process of merging the bitMaps can take up a lot of memory and can even cause the process (k2server and k2index) to crash. As part of this fix, we have rolled back the merge part of the 96036 fix.
Patch 14: Bug 90540: Handle autostopped zones correctly. Bug 94294: Port bug 93889 to 5.5 Bug 94564: k2spider does not find root of UNC path immediately, so it skips many documents. Bug 94595: Vview / HTML Export of PDFs are garbled (overlapping text) (See special note below.) Bug 94680: When a PDF file is converted to HTML using the template onefiletoc.ini, the content overlaps. Bug 94917: Max number of Documents option for Documentum GW doesn't work Bug 95086: Server going into "autostop" is producing defunct process Bug 95139: When converting a Microsoft Word file containing hidden text to HTML, extra spaces are inserted at the beginning of the file, columns and some text are not positioned correctly, and the logo is inserted 3 times. Bug 95545: K2spider is crashing with a dump log and k2spider_srv is hanging Bug 95628: When filtering a PDF document, spaces are inserted between characters. Bug 95644: Profile Rule does not work with date search (See special note below.) Bug 95707: Dashboard cannot retrieve the proxy user and password defined in a job Bug 95734: Incorrect record placed in bucket when using manual ranking. (See special note below.) Bug 95804: Reference category properties obtained using KTreeView and TreePath were null. Fixed the browse code to return these properties. Bug 95809: When filtering a content access stream, redundant tokens are generated. Bug 95811: If ASP is used and 33,000 urls are in the start doc only 10-11K will be inserted Bug 95818: Noproxy hosts not respecting wildcards.
Bug 95878: Enumeration not correct with multiple PIs Bug 95951: Handle no-evidence LOGSUM correctly in profiler Bug 95971: k2collswap does not clear files in the destination folder before doing the copy Bug 95975: Support "flt_kv -timeout seconds" in style.uni Bug 95976: "Soft Error" keys never get re-spidered even if reparse is set Bug 95977: "Soft Error" occur on simple crawls but should not Bug 95992: When filtering a PDF file, the Filter program stops responding. Bug 95993: When filtering a PDF file, the Filter program stops responding. Bug 95994: When filtering a Microsoft Excel file containing chart data, the Filter program stops responding. Bug 96018: When out-of-process filtering times out, an incorrect error code is generated. Bug 96036: Fixed memory accumulation occuring when new PI partitions were created and loaded. Bug 96044: VdkSession passed to PI open call was created with default locale. Added logic to create this VDK session with proper locale. Bug 96061: [MT] Port 90540 to 5.5.0 (mkvdk squeeze runs indefinitely on collection with bad zones). Bug 96067: Remove VdkBufferTokenize and VdkWordStem API usage from k2 code Bug 96069: k2e5.5 patch 12 breaks notes GW [Attachment_default] zone Bug 96076: [MT] port bug 93637 to 5.5 (last modified http header cannot be overwritten) Bug 96077: vdksummary incorrectly contains stylesheet tags Bug 96264: When trying to import a Japanese taxonomy and OTL, VCC gets an error for one of the topic nodes. Bug 96271: Remove VdkBufferTokenize and VdkWordStem API usage from k2550 Patch 13: Bug 91836: -repos_charmap doesn't always work, notably with 850. Bug 94223: Port fix for 90130 (some chinese words defined in thesaurus with english definition won't be found) to K2 5.5 Bug 94522: Styleset editor removes changes to cfg file when saving... cause corruption Bug 94527: Fix conversion for custom datetime metadata Bug 94861: Child documents (attachments) are not deleted in exchange when vspider -nooptimize is used. 
Bug 94906: Fix mutex handle leaks. Bug 94951: Fix tstr queue overflow problem in flt_lang Bug 94981: Japanese characters in Notes gw coll attachment name has dots in between. For this fix to work, a new option, "EscapeKey", needs to be added in vgwnotes.cfg under [Lotus Notes GW] section. By default, it is set to false. If collection's locale contains multi-byte chars, the option should be set to true to escape vdkvgwkeys to 7-bit url-encoding. As a result, all vdkvgwkeys are in collection's locale and they can be converted correctly to any client's locale. Bug 95009: The ActiveX control in the Viewing SDK stops responding after viewing Microsoft Word files sequentially. Bug 95014: Websites with German umlauts in link with uni locale are not indexed properly. Bug 95093: Exchange down - documents get deleted from collection. Bug 95122: Notes GW can't extract group memberships w/ some Domino Directory Configurations. The change is for customers who have special env setup that need to use an alternative algorithm to gather Notes security information. An env variable, VERITY_LNGW_STD_NAMELIST, is introduced for customers to use this option. Bug 95139: When converting a Microsoft Word file to HTML, extra spaces are inserted at the beginning of the file, columns and some text are not positioned correctly, and the logo is inserted 3 times. Bug 95158: A single-byte Japanese character is not converted correctly from Lotus multibyte character set (LMBCS) to UTF-8. Bug 95171: Transactions are lost if a full restart of k2. Bug 95172: Recommendation Index cannot be removed from the Host view Bug 95173: Transaction not being logged if 1st k2server does not have RE logging enabled.
Bug 95178: fix behavior and performance of LOGSUM operator Bug 95214: SimpleQuery.setPrecision() has no effect Bug 95239: invoking zone filter from style.dft failed on collection with space in pathname. Bug 95261: K2E v5.5, NTLM spidering, vspider: leaks one socket per file Bug 95302: VSPIDER adding extra content to the Mime-type field Bug 95355: setDateOutputFormat( ) function is not working. Bug 95367: k2spider_srv does not index through proxy if proxy authentication supplied. Bug 95368: -proxyauth option in vspider causes proxy to be not used at all. Bug 95380: A long parametric search request used to slow down all other parametric search requests. Fixed the collection thread locking used by parametric worker threads. Bug 95427: port 94675 to 5.5.0 (fixes multibyte charset conversion problem). Bug 95440: wildcard search can return incorrect number of results. Bug 95450: When filtering a Microsoft Works and a Microsoft Excel file on Tru64 platform, formulas are not calculated correctly in the output file. Bug 95256 and 95255: KeyView only filters, exports, and views PDF files that are 40 bit or 128 bit encrypted. Any other encryption method is not supported. (Internal) Bug 95503: When filtering a Microsoft Word 95 file with the option to include headers and footers enabled, the output is truncated. Bug 95510: vsdb status of each document is incorrect for K2spider but correct for vspider Bug 95545: K2spider is crashing with a dump log and k2spider_srv is hanging Bug 95642: getting 0/0 when doing a parametric after PI is brought online Bug 95659: [MT] Port 91611 (fixed in 6.0) back to K2 5.5 (user-dictionaries do not work) Bug 95748: When filtering a PDF file, the program stops responding as a result of buffer overruns. Bug 95804: Reference category properties obtained using KTreeView and TreePath were null. Fixed the browse code to return these properties. Patch 12: Bug 90132: Filter cannot extract endnotes from Microsoft Word files. 
Bug 91287: Mirroring collections on a master and slave servers work only on the primary collection. (See special note below.) Bug 93681: job.auth is wrong when editing realm through dashboard. Bug 93904: VSPIDER crash indexing some MS Word Docs. (See special installation note below.) Bug 94221: [MT] Port fix for Bug 88609 to 5.5.0. (88609 = mkpi does not honour codepage-setting of non-standard locale.) Bug 94258: [MT] Port fix for 90797 to 550 code base. 90797: Badkey error while indexing an MS SQL table. Bug 94259: Port fix for Bug 90864 (re: exchange gateway attachments can't be viewed with highlights) to K2 5.5. Bug 94263: [MT] Port fix for 91428 to 5.5.0. 91428: Viewing error when filename is written by Japanese. Bug 94295: Port bug 91464 (Symbols indexed as alphabet treated as delimiters during searching) and 93874 (-single_character option does not work from chinese data with uni locale) to 5.5. Bug 94355: PI sort alpha-numeric and clustered in k2server. Bug 94445: Encode basic entities in rdf xml output. Bug 94486: Exchange gateway - fix issues when folder contains '!' Bug 94543: Problem with separator/delimiter in PIs. Bug 94550: Auth File regular expression for Realm does not work for nested realms. (See special note below.) Bug 94568: Stemming and compound words in K2 5.5. Bug 94694: Problem accessing NT domain using NT login module. Bug 94741: Allow no group if user specifies no groups in Active Directory native mode. Bug 94787: Tree Bucket type: Bucket values missing 'Any' choice after field been selected. Bug 94816: k2 55 Vspider not able to index SSL site using cookies. Bug 94832: Some previously indexed PDF documents are skipped (works in patch6). Bug 94869: VCC generates new topicset partitions as users make changes. Bug 94874: When filtering a WordPerfect file out of process the kvoop.exe program stops responding.
Bug 94909: The VE packaged tutorial example_01 showed a part of speech problem that led to missing some occurrences of "Verity" for common entity "company". Bug 94949: When filtering a PDF file spaces are incorrectly inserted between characters. Bug 94956: K2E v5.5, NTLM spidering, k2spider: leaks one socket per file. Bug 94957: Problems with fields having extended characters in them. Bug 94962: Stack memory overflow when search times out and it crashes in the log. Bug 95045: k2admin crash (detach a k2server that has the same alias as a foreign service does). Bug 95047: k2spider crash when running 3 or more simultaneous scheduled jobs on 5.01 and 5.5. Bug 95114: k2admin crash in multi-host environment. Bug 95135: Made changes to libpi (Parametric Indexing module) to support Reference Categories. (See special installation note below.) Bug 95160: Problems with ViewURL having special characters. Bug 95304: Port bug 94275 (Having an issue indexing a bulk file through mkvdk. mkvdk crashes on AIX) to 5.5. Patch 11: Bug 93489: Bad exception thrown from VSearch.collectionsInfo() Bug 93755: Having issues where k2spider indexer seems to core dump on large indexing job Bug 94150: Selective control over case/stem/sndx in profiler (See special note below) Bug 94234: Document link rendering error in Sample Templates. Bug 94265: [MT] Port fix for 92174 to 5.5.0 (Multiple submit bif (update) to k2spider (ctrl) causes lost connect to crwl/idxr) Bug 94266: [MT] Port fix for 92210 to 5.5.0 (k2spider_srv indexer should not use maxclean by default) (See special note below) Bug 94270: [MT] port 92296 (double delete of a key can crash k2spider controller during refresh) to k2 5.5.0 Bug 94449: The Filter program crashes when filtering a corrupt PDF file. Bug 94510: win2k - services "restart" does not start the k2 service properly Patch 10: Bug 93450: If the resolution of an MPEG file (.MPG) is less than the movie Viewing window, the movie is blank when you resize the window.
Bug 93748: When processing a corrupted Microsoft PowerPoint file, the Filter program stops responding. Bug 93830: k2spider_srv cannot create bif if many jobs started in round-robin env. Bug 93904: VSPIDER crash indexing some MS Word Docs. Bug 93922: score NEAR, FREQ, and NORM correctly with a zone. Bug 93976: PI/Leading space bug. Bug 94068: VdkBufferTokenize "swallows" word if there is only one word in buffer Bug 94081: Mixed encoding within one record causes invalid word list. Bug 94092: Merging collection causes incorrect search results. (See special note below.) Bug 94112: unable to search pi until the k2server is restarted. Bug 94143: [MT] Port fixes in 91725 to 5.5.0. Bug 94144: [MT] Port fix for 93523 to 5.5.0 (See special note below) Bug 94148: [MT] Port fix for 91367 to 5.5.0. Bug 94157: [MT] Port fix for 93830 to 5.5.0. Bug 94262: [MT] Port fix for 92928 to 5.5.0. Bug 94333: Fixed bucketSet purge logic. Bug 94528: Fixed memory accumulation in k2indexer. Bug 94552: No position info. returned from tokenization API for locale english. Bug 94650: ODBC auto-refresh would not update after K2550 patch 9 was installed. Bug 94651: Make vgwnotes5 the default for vgwnotes. Patch 9: Bug 91401: Unable to launch SSE from Dashboard when k2admin not a service Bug 91836: -repos_charmap doesn't always work, notably with 850. Bug 93499: Significant performance overhead to Batched PushAPI SubmitDoc Bug 93773: k2spider not excluding document because of meta tag. Bug 93837: Resolving global nested groups - K2 55 (need to merge in mainline fix for 91728) Bug 93838: patch 13 - problem with secure filesys indexing - K2 55 (need to merge in the mainline fix for 92318) Bug 93930: Fixed 92287 "k2admin leaves many TIME_WAIT on O/S netstat" in K2 5.5. Bug 94133: Ported 93181 to 5.5.0 "k2spider is instable(freeze and crash) when indexing internat/NTLM security".
Bug 94134: [MT] Port fix for 93326 to 5.5.0 Bug 94135: [MT] port 93485 to 5.5.0 Bug 94137: Port fix for 93533 to 5.5.0 "vgwhttp.dll with authentication errors leaks sockets". Bug 94138: [MT] Port 93534 to 5.5.0 (for fixing: K2spider automatic recovery sometimes creates 'wrong' job subdir). Bug 94139: [MT] Port 93705 to 5.5.0 (for fixing: K2spider-crawler crashes during crawl/refresh (nosubmit)). Bug 94196: Sort not working properly when attached to k2broker. Bug 94260: [MT] Roll back the bug 92218 fix for 55 patch Bug 94280: Need to package the fix for 93629 in the K2TK patch instead of DASHB patch. Bug 94403: K2server doesn't release file handles after patch 8. Patch 8: Bug 93409: Problem with Bulk Indexing Long Field Value using Pipes Bug 93450: If the resolution of an MPEG file (.MPG) is less than the movie Viewing window, the movie is blank when you resize the window. Bug 93549: Error registering NT domains if hit a NT4 domain. Bug 93686: [MT] fix 91097 in 5.5 patch branch Bug 93689: [MT] fix 92901/93136 in 5.5 patch branch Bug 93748: When processing a corrupted Microsoft PowerPoint file, the Filter program stops responding. Bug 93758: Search performance issues due to lost packet. Bug 94019: In audit trail Chronicle key mode, some delete events should become updates to preserve the LABELED version in the collection. Patch 7: Bug 93075: Sort request fails for certain fields giving -1711 Bug 93145: No metadata while streaming with RDF type Bug 93454: K2 Spider only understands the standard date formats. One of the customers was using a date format of MM/DD/YYYY in their HTML META tag and using a meta mapper to map it to the Last-Modified header. This was causing issues since this format is not a standard format. An environment variable (VERITY_K2SPIDER_DATE_TIME_FORMAT) was added to let users specify to the K2 Spider what their date format is. 
Users can use 'Y' for year, 'M' for month, 'D' for day, 'h' for hour, 'm' for minute, 's' for seconds and specify any arbitrary pattern like "MM/DD/YYYY-hh:mm:ss". Once this variable is set to this pattern, any date like "06/04/2004-10:34:27" will be recognized as June 4th, 2004, 10:34:27 AM GMT. Please note that the time zone used is always GMT irrespective of what the date/time format is. Bug 93478: Problems with indexing javascript links. Bug 93785: Fix for problem where vconfig was hanging on a file cleanup on linux AS 3.0 Patch 6: Bug 91358: k2Broker does not sort results by category score Bug 91400: When filtering an AutoCAD Drawing file (DWG), the text is not extracted and errors are not generated. Solution: The component kpdwgrdr was fixed. Applicable platform: Windows and UNIX. Bug 92218: Exporting PI data from VIC creates a corrupt data file for Korean characters Bug 92228: Some Microsoft Word files containing EMF graphics are not filtered. There are also some fidelity problems when converting the same files using HTML Export. Solution: The component kpemfrdr was fixed. Applicable platform: Windows and UNIX. Bug 92638: k2server crashes when query logging tries to write a new file Bug 92834: When converting some Microsoft Word files using HTML Export, bulleted lists are converted to numbered lists. This is a regression from KeyView Patch 7.4.11. Solution: The component mw8sr was fixed. Applicable platform: Windows and UNIX. Bug 92881: Implement "localized not" in Profiler. Bug 92965: VSearch.disableConnectionPooling() ported from 4.5.1 to fix connection pooling problems with many JVMs Bug 93314: k2broker cannot profile gateway Bug 93390: When filtering some PDF files, the program stops responding. Solution: The component pdfsr.so (.dll) was fixed. Applicable platform: Windows and Solaris. Bug 93413: When filtering some Microsoft Excel files, the program stops responding. Solution: The component xlssr was fixed. Applicable platform: Windows and UNIX.
Bug 93414: When filtering some Microsoft PowerPoint files, the program stops responding. Solution: The component kpwmfrdr was fixed. Applicable platform: Windows and UNIX. Bug 93415,93416: When filtering some Microsoft Word documents (.doc) and templates(.dot), the program stops responding. Solution: The components kwad and mw8sr were fixed. Applicable platform: Windows and UNIX. Bug 93435: k2server and k2broker are crashing daily - restart box needed to revive Bug 93448: k2spider controller crashes when two modify job commands are sent in a row. Bug 93500: Bug in Vparametric missing options to set PBS properties. Added two methods on the VResultSet object: Sets the parameter that decides the maximum number of bytes per passage that will be returned for passage-based summary (PBS). Note that this number includes the bytes required for highlight tags if highlighting was requested. If highlighting tags are long the actual text (excluding the highlight tags) returned for the passages may be less. @param pbsMaxPsgBytes @since 1.2 public void setSummaryMaxPassageBytes(int pbsMaxPsgBytes); Sets the parameter that decides the maximum number of passages that will be returned for passage-based summary (PBS). @param pbsMaxPsgCount maximum number of passages to retrieve with PBS @since 1.2 public void setSummaryMaxPassageCount(int pbsMaxPsgCount); Bug 93503: Custom metadata fields in PDF files are not filtered by KeyView Filter. Solution: The component pdfsr was fixed. Applicable platform: Windows and UNIX. Patch 5: Bug 91907: When converting a PDF file using HTML Export, some text in columns overlap. Bug 92273: When converting a PDF file using the cReplaceChar member of the KVHTMLOptionEx data structure, the specified replacement character is not used consistently in the output. The cReplaceChar member specifies the character used when a character in the source document’s character set cannot be mapped to the output character set. 
Bug 92410: When filtering a PDF file only metadata is extracted. The body text is ignored. Bug 92496: Vspider -metafile will not skip documents properly. Bug 92642: Special chars in VdkVgwKey cause problems with UNI locale. Bug 92787,93296: When filtering a corrupt PDF file, the filtering process stops responding. Bug 92843: When filtering a PDF document some words are omitted from the output. The PDF reader maps some glyph values to a question mark (?). Bug 92891: VSPIDER cannot handle sharename in UNC path. Bug 92950: Some Microsoft Excel files are not filtered on Linux. Bug 92953: When viewing a Microsoft PowerPoint file using the IHADEMO, the program stops responding. Bug 92986: When filtering some PDF files, words in the output are concatenated. Bug 93158: Index Server crashes when over 25k buckets are selected. Bug 93170: A Taxonomy gets corrupted when a category is moved to a category which belongs to a subtree beneath it. Bug 93249: Delete events not picked up for audit trail updates Patch 4: Bug 90713: VSearch.dll does not close sockets when k2admin is restarted on Solaris Bug 91483: VDK locale-wrapper/Uni Inxight stemmers not working correctly (Greek etc.) Bug 92185: CPU Binding broken on hyperthreaded machines Bug 92278: RCK2/Api not sorting results from ktree the same as VIC/Rcvdk Bug 92342: Taxonomy categories created in PI from autogenerated categories are not in UTF-8. Make sure that only CatName and CatId added to Taxonomy from AutoCatExtract is in utf-8. Should not affect PI enums. Bug 92373: Fixed memory leak in PISearch call. Bug 92375: mkre -update call purge PI cache at servers and brokers. The cache purge call was crashing if there are multiple clients connected using the cache. Fixed the Purge call to make sure the use count is 0 before deleting Items. Bug 92409: Searching a Knowledge Tree built with the spanishx locale is extremely slow Bug 92585: 91568 needs to be put in 5.5 patch. 
Bug 92817: VDKSUMMARY using uni locale and HTML docs cannot populate prior to Bug 92947: If a Category start node had a space in its enum, the sort spec parse was failing. Fixed the parsing to be more robust. Bug 92948: PI Populate call with merge flag was implicitly doing population from collection. Fixed it in the k2 layer to explicitly prevent population when merge flag is on. Bug 93277: VProfiler API PrfSetInfo xmits extra data. Patch 3: Bug 91537: Connection refused with SSL option pack. Bug 91581: ODBC GW - multi-row column is mapped to a VDK field, duplicates are not filtered Bug 91708: MKPI will not accept a BIF that contains double slashes in the category ID. Bug 91747: When you filter a Microsoft Word document and include headers and footers in the output, a general error (error #6) is generated. Bug 91838: When you filter a Microsoft Outlook file (msg) in process, and get the output character set using the function fpGetTrgCharset(), the returned character set is incorrect. Bug 92004: PI GetInfo crashes K2Broker. Bug 92005: Highlighting multiple fields for PI search returns NULL Doc object + typo fixed: High(t)ligh Bug 92098: Problem with push api when submitting multiple unc documents at the same time Bug 92342: Taxonomy categories created in PI from autogenerated categories are not in UTF-8. Bug 92461: Forward port 91304 fix to K2 v550 - K2spider mutex problem with too many unique cookies. Bug 92568: fix the group name and group membership issue during authentication Bug 92737: This fix resolves some issues with VIndex that occurred when some streams-related optimizations were done to Verity.jar. Without this fix one can encounter an exception while trying to connect to the K2 Spider and a hang when trying to create a job. Patch 2: Bug 89546: Invalid docid specification in bif file causes k2server crash Bug 90305: PI navigation shows anomalies when PI is populated using path based categories.
Bug 90849: Unable to populate Multi-Byte chars into fields with PushAPI
Bug 91194: PI stats remote call does not include taxonomy information.
Bug 91195: Merge partitions call causes search to be incorrect.
Bug 91203: The fix to 91145 (javashared RPC performance bottleneck) should be in 5.5.
Bug 91210: 90690 in 5.5 - Parametric index cannot establish connection after stress load of
Bug 91495: NOTES g/w: vdk notes field getting truncated to 2048 characters
Bug 91497: NOTES g/w: indexing and crawling hangs while using notes gateway
Bug 91498: NOTES g/w: can not find documents with asian characters in the reader list
Bug 91860: Bucketsets of children get nullified on purge of parent

Patch 1:
Bug 90511: When converting a PDF document to HTML using the basic or high-fidelity PDF reader, text and pictures overlap, and spaces between words are missing.
Bug 90615: The fpGetTrgCharSet function returns a different character set when a document is filtered in file mode and stream mode.
Bug 90691: You can use documents that are already in a collection but not yet in the associated "doc" Recommendation Index (RI), as inputs to the Recommendation APIs. However, these documents will not be included in recommendation search result lists until they have been added to the RI by the next scheduled "mkre -update" Job.
Bug 90695: There are multiple problems when converting a Microsoft Excel spreadsheet to HTML. The second sheet in the file is not included in the table of contents in the HTML output. The formulas in the spreadsheet are not converted properly, and cells containing numbers appear as $$1, $$2, $$3, etc.
Bug 91012: There is no longer a problem with the Recommendation Engine finding indexes from the K2 Broker when a K2 Ticket Server is attached in the system.
Bug 91119: Spider cannot handle a Japanese file/dir-name that ends with a 0x5c byte
Bug 91143: Unable to select Indexes for Reporting when Domain Name Set for K2
Bug 91178: Cannot view reports when an External Domain name is set
Bug 91223: Report server crashes during report generation with null values for Category
Bug 91289: PBS MaxPassageBytes should be actual passage text length not inc. overhead
Bug 91409: Repeated requests to retrieve documents from k2server/k2broker gives error -15872
Bug 91480: File name or path with Japanese chars is not in collection charset
Bug 91520: k2server memory leak with tree searches, VDK spanword fix
Bug 91659: When converting a Microsoft Word document to XML, a general error (error #6) is generated.
Bug 91701: SSO configuration of LDAP login module requires manual admin.xml editing
Bug 91812: Refresh Reports UDJ, terminated with error (too many open files)
Bug 91860: Purge operation on parent category should not purge documents from child categories
Bug 91991: Failed to retrieve topic when importing Verity_IT packaged taxonomy
Bug 92043: K2Index crashes while performing PI stress test
Bug 92047: K2Index gets into deadlock while performing PI stress test
Bug 92087: K2 5.5 windows pre-auth is not filtering properly
Bug 92164: K2Server gets into deadlock while running PI stress test
Bug 92272: Incorrect version of Results.java for basic install on Windows

- SPECIAL NOTE on fix for 98220
----------------------------------
By default, an HTTP "charset" parameter in a "Content-Type" field takes precedence over the charset defined in a meta declaration with "http-equiv" set to "Content-Type" (as instructed by the W3C Recommendation). With the 98220 fix, if the HTTP header charset is defined, K2Spider and VSpider by default add a 'Charset' field to the bif file so that downstream processing does not detect the charset from the meta setting.
For the HTTP gateway, a charset field token will be created.

To ignore the HTTP "charset" parameter setting, users can use the following configurations:

1) For K2Spider, set the environment variable: VERITY_K2S_IGNORE_HTTPCHARSET
2) For VSpider, use the new command-line option: -ignoreHttpCharset
3) For the HTTP gateway, set the option "ignoreHeaderCharset: True" in the vgwhttp.cfg file. For example,

   $control:1
   ignoreHeaderCharset: True
   $$
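
The precedence rule described in this note can be illustrated with a short Python sketch. This is not Verity code and does not reflect how K2Spider, VSpider, or the HTTP gateway are implemented. The function names and the ignore_http_charset parameter are hypothetical; the parameter only stands in for the three "ignore" settings listed above.

```python
import re
from typing import Optional

# Hypothetical helper names; this only illustrates the precedence rule.
_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([^\s;"\'>]+)', re.IGNORECASE)
_META_RE = re.compile(r'<meta\b[^>]*>', re.IGNORECASE)
_HTTP_EQUIV_RE = re.compile(r'http-equiv\s*=\s*["\']?content-type', re.IGNORECASE)


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lowercased."""
    if not value:
        return None
    match = _CHARSET_RE.search(value)
    return match.group(1).lower() if match else None


def charset_from_meta(html: str) -> Optional[str]:
    """Return the charset of the first <meta http-equiv="Content-Type"> tag."""
    for tag in _META_RE.findall(html):
        if _HTTP_EQUIV_RE.search(tag):
            found = charset_from_content_type(tag)
            if found:
                return found
    return None


def effective_charset(http_content_type: Optional[str], html: str,
                      ignore_http_charset: bool = False) -> Optional[str]:
    """HTTP header charset wins unless the caller asks to ignore it."""
    header = None if ignore_http_charset else charset_from_content_type(http_content_type)
    return header or charset_from_meta(html)


page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=windows-1256"></head></html>')

# Default behavior: the HTTP header charset takes precedence.
print(effective_charset("text/html; charset=UTF-8", page))
# With the header ignored, the meta declaration is used instead.
print(effective_charset("text/html; charset=UTF-8", page, ignore_http_charset=True))
```

In this sketch the first call reports the header charset and the second reports the charset from the meta declaration, which mirrors the effect of setting one of the three options above.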