Geruva Publications - Software Dept.

cs2020 - Natural Language Processing (NLP) Packages for Linux, UNIX, Windows, and Macintosh

A collection of natural language processing programs and libraries. Some come from university research projects, some from other research groups and enthusiasts, and some are less cutting-edge and rely on more or less established NLP techniques. Many of the packages attempt to achieve platform independence by relying on Java and Web interfaces. Copyright 2004 Edition Arnold Kochman. Other copyrights apply, including but not limited to the GNU General Public License.

Windows and DOS users will need one of the commonly available unzip utilities, such as PKUNZIP or WinZip. Programs are distributed with source where appropriate, and some of them (those written in C, for example) will have to be compiled before use.

The various packages are at different levels of maturity and completeness, and I cannot certify that they are all worthwhile for any particular purpose. You will have to judge for yourself, but there is a lot to choose from.

Here is a listing of the packages included:

PHPDictionary - A tool written in PHP and SQL to facilitate quick and easy creation of online or offline natural language dictionaries, without knowledge of programming or formatting. PHPDictionary is, in principle, platform independent, but it requires PHP and an SQL database (MySQL).
NL Toolkit - A Python package that simplifies the construction of programs to process natural language. It defines standard interfaces between the components of an NLP system. Runs on Windows (Win32) and UNIX/Linux. Written in Python.
OpenNLP - A package that includes and coordinates several functional elements for natural language processing. The tools include a sentence detector, a tokenizer, a POS tagger, a chunker, a name finder, and a full parser. OpenNLP also defines a set of Java interfaces and implements some basic infrastructure for NLP components. It is intended for use by developers in creating Human-Machine Interfaces.
OpenNLP Maximum Entropy Package - Maximum entropy is a powerful method for constructing statistical models of classification tasks, such as part-of-speech tagging in natural language processing. Several example applications using maxent can be found in the OpenNLP Grok Library. This package is text-based and runs under Java.
Howie - An artificial intelligence agent with a natural language interface. It is designed to be simple to install, configure, and extend. Runs under Windows and Linux. Written in C++ and Python.
Deduce - An artificial intelligence program that accepts natural language sentences as input, then allows the user to ask questions against that input. Deduce attempts to answer questions using logical deduction techniques. Deduce is written in Python.
Aikernel - An intelligence server and cell runtime environment that uses natural language processing and other pattern matching with Activators, Contexts, and Concepts to allow multitasking between installed cells. Runs in Java environments.
OpenNLP Grok Library - A library of natural language processing components, including support for parsing with categorial grammars and various preprocessing tasks such as part-of-speech tagging, sentence detection, and tokenization. The Grok library is intended for use by developers of Artificial Intelligence/Human-Machine Interfaces. It runs in Win32 and X11 environments, and is written in C++ and Java.
OpenCCG - The OpenNLP CCG (Combinatory Categorial Grammar) Library is a collection of natural language processing components and tools that provide support for parsing and realization with Combinatory Categorial Grammar. OpenCCG is a text-based system for education, science, and research in connection with Artificial Intelligence and Human-Machine Interfaces. It runs in a Java environment.
ACOPOST - Implements and extends well-known machine learning techniques for part-of-speech tagging of natural language text. It is a text-based system written in C and Perl.
VP1 - The VP1 project provides a generic, user-friendly platform for creating an artificial speaker of natural language. It consists of a Dialogue Engine, an XML Language Content Framework, and a corresponding Content Authoring Environment. Runs on Windows (Win32).
Pytalk - A Natural Language Understanding program written in Python. Features of the code include an English language parser, an English dictionary tagged with parts of speech, and an indexed file module. It is console-based and runs in a web environment.
TALK AGENT FRAMEWORK - A software framework that allows developers to create intelligent agents which accept commands in natural language (English). Software agents are created by adding new ontological knowledge and software components to the framework. User-to-agent natural language communication is based on UNL standards, using an enconverter/deconverter built on the ThoughtTreasure natural language processing tool. Agent-to-agent communication is implemented using technologies provided by FIPA-OS, which implements FIPA standards. Operates under Windows, Linux, and Mac OS X.
Example-based Development of Grammars (EDG) - A system for building natural language grammars and lexicons incrementally and interactively. Runs in text console and Web environments. The programs are written in Lisp and intended for use by developers of human-machine interfaces.
XML Intermediate Representations (XIR) - A collection of tools for XML processing, primarily in object-oriented languages. The system offers support for Java, Python, and C# developers.
OpenNLP Leo Project - A project initiated by RIALIST, the natural language research group at RIACS, in conjunction with the Center for the Study of Language and Information (CSLI) at Stanford. The initial goal of Leo is to provide an architecture for defining XML specifications of grammars for different natural language parsing systems such as Gemini (SRI), LKB (Stanford), and Grok (Edinburgh).
VISL Constraint Grammar Compiler - A natural language parser generator. It is an implementation of Pasi Tapanainen's CG-2 constraint grammar formalism. It is text-based and runs in Microsoft Windows and UNIX/Linux (POSIX) environments. Distributed as C++ source.
NLP Resource Links.html