<?xml version="1.0" encoding="UTF-8" ?><xml><records><record><database name="!wdg&apos;s ref list.enl" path="/Users/gray/06 Writings/!Library-read/!wdg&apos;s ref list.enl">!wdg&apos;s ref list.enl</database><source-app name="EndNote" version="11.0">EndNote</source-app><rec-number>2379</rec-number><foreign-keys><key app="EN" db-id="vx99r525gwvft0ep05ix2afmtxz0590259zw">2379</key></foreign-keys><ref-type name="Book Section">5</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Lindsey, Robert</style></author><author><style face="normal" font="default" size="100%">Veksler, Vladislav D.</style></author><author><style face="normal" font="default" size="100%">Grintsvayg, Alex</style></author><author><style face="normal" font="default" size="100%">Gray, Wayne D.</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Be wary of what your computer reads: The effects of corpus selection on measuring semantic relatedness</style></title><secondary-title><style face="normal" font="default" size="100%">Proceedings of the 8th International Conference on Cognitive Modeling</style></secondary-title></titles><keywords><keyword><style face="normal" font="default" size="100%">Measures of Semantic Relatedness, semantic similarity, training corpus, corpus comparison, Pointwise Mutual Information, PMI, Normalised Google Distance, NGD, computational linguistics, natural language processing</style></keyword></keywords><dates><year><style face="normal" font="default" size="100%">2007</style></year></dates><pub-location><style face="normal" font="default" size="100%">Ann Arbor, MI</style></pub-location><abstract><style face="normal" font="default" size="100%">Measures of Semantic Relatedness (MSRs) provide models of human semantic associations and, as such, have been applied to predict human text comprehension (Lemaire, Denhière, Bellissens, &amp; Jhean-Larose, 2006).
In addition, MSRs are key components of more integrated cognitive models, such as models that perform information search on the World Wide Web (WWW) (Pirolli, 2005). However, the effectiveness of an MSR depends not only on the algorithm it uses but also on the text corpus on which it is trained. In this paper, we examine the impact of corpus selection on the performance of two popular MSRs, Pointwise Mutual Information and Normalised Google Distance. We tested these measures with corpora derived from the WWW, books, news articles, emails, web forums, and an encyclopedia. Results indicate that, for the tested MSRs, the traditionally employed book and WWW-based corpora are less than optimal, and that a corpus based on New York Times news articles best predicts human behavior.</style></abstract><notes><style face="normal" font="default" size="100%">DTO ARDA ARIVA A-SPACEX</style></notes><urls><pdf-urls><url>internal-pdf://LindVeksGrintGray07_ICCM-0722949632/LindVeksGrintGray07_ICCM.pdf</url></pdf-urls></urls><research-notes><style face="normal" font="default" size="100%">DTO ARDA ARIVA A-SPACEX</style></research-notes></record></records></xml>