Global Community Tools Project
Draft 01/22/2000
John Caron
Statement of Purpose
The purpose of this project is to build software tools that support global
on-line communities of people, who form around specific interests or tasks
and who require sophisticated tool support in order to accomplish their
goals.
Background
Arguably the most important long-range consequence of the emerging Internet
infrastructure is the formation of new kinds of on-line, or virtual, communities.
These virtual communities may differ from "real" communities in a number
of important ways. Most obviously, virtual communities are freed from the
limitations of geographic proximity and simultaneous presence. Because
communication is computer-mediated, any number of software tools and affordances
may be provided which allow forms of interactions and capabilities not
possible in non-virtual groups. The enabling technology for these
tools is only a few years old, and so the possibilities of these virtual
communities are historically new.
These are some of the characteristics of virtual groups that are important
for this project:
-
Size: Computer-mediated communication (CMC) can support much larger groups
than groups that communicate "face-to-face". Our tools should scale to group
sizes in the hundreds, possibly thousands, while continuing to make efficient
use of every member's time. Such groups can potentially tackle problems
that no single individual or hierarchically organized group is capable of
solving.
-
Diverse roles and viewpoints: Different members will play different
roles within the group, and have different, possibly conflicting opinions
or understandings of fact.
-
Decentralized control: A large group must work primarily in
parallel; any centralized control will eventually become a bottleneck.
Technical and social structures must both reflect this.
-
Convergence mechanisms: There must be ways for group knowledge
to be organized so that members can get the "big picture". Important
information must be recognizable.
Broadly, virtual groups need tools that support information organization
and retrieval, discussion and argumentation, and negotiation and decision.
Some of this technology is ready for wide-scale deployment, and some is
in the research stage. This project aims to evaluate emerging technology
in the context of the needs of on-line global communities, while simultaneously
envisioning what kinds of communities are possible given the proposed tools.
To accomplish this, we need equally the visions of the technologist, the
sociologist and the community activist.
The first step is to create a framework and a process where people can
share ideas and produce real applications to be tested and iteratively
improved. We need to attract a critical mass of users and software developers
who are committed to the project's continuing success and evolution.
Direction
While the above vision may provide long-term motivation, this project will
be defined at first by its technical direction and initial applications.
We intend to produce working software that solves existing problems for
real people, and this goal will pull our vision into concrete form.
Technical Basics
We will organize ourselves as an open source software project, and the
software will be freely available under the GNU General Public License.
Rather than a formal design methodology, we will use the process of user
feedback and iterative development to move in practical steps from software
that supports existing practices towards new features and methods.
In order to provide cross-platform support, we will use Java, which
supports the Windows, Linux, and Solaris operating systems, as well as
MacOS X and other UNIX platforms, with a single source codebase.
Java is a modern, object-oriented language with very broad, freely available
libraries, which provide support for such features as email access, networking,
database, XML, GUI, help systems, web access, etc. This allows us
to concentrate on building our applications.
Target Group
The initial target group will be existing online technical support communities,
or tech-groups for short. These groups have the following characteristics:
-
their purpose is to freely exchange information about a relatively narrow
technical subject.
-
they use electronic asynchronous messaging (e.g., Usenet, email, or Web-based
forums) as their primary communication.
-
the messages are usually in question and answer format, with some discussion.
-
there is a sufficient number of technically competent participants to generate
a continuous flow of useful information. These are often volunteers, who
are motivated by the implicit recognition of their expertise.
-
information in the groups is of variable quality, but can often be judged
correct or not with some accuracy.
-
anyone can post answers and comments, so the distinction between expert
and novice is blurred, and multiple viewpoints are allowed.
Online tech-groups have become the primary way for software companies to
answer questions about their products. In these cases, the company's
tech-support personnel answer questions in the public forum provided by
the tech-group, instead of, or in addition to, volunteers. (See section
1 of [1] for more discussion of tech-groups.)
There are surprisingly few tools that take advantage of the valuable
information contained in the message streams of tech-groups. The current
practice is to provide a Web-based or Usenet interface to the message archives,
sorted by thread or date. Sometimes a keyword search facility is provided.
A significant improvement would be to add better searching and organizing
capabilities for these groups.
The advantages of this approach are:
-
by providing better tools to existing groups, those tools have a significant
chance of being widely used in a short amount of time.
-
by targeting tech-groups, we expect that many users will have the skills
to install and administer the tools, and to contribute to further development.
New Methods of Information Retrieval
Keyword-based retrieval requires a user to guess what words or phrases
are likely to be in the documents they are searching for. Users who are
unfamiliar with the domain vocabulary can have difficulty finding the
information they need. A promising alternative is Latent Semantic Analysis
(LSA), a type of vector-space information retrieval.
LSA uses a mathematical algorithm called singular value decomposition to
compute statistical correlations between words in a document set. Queries
can be matched to relevant documents even when the document and the query
have no words in common. Preliminary tests have found this approach to
be significantly better than keyword searches in matching natural language
questions to previous questions and answers in tech-group archives [1].
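The idea can be illustrated with a minimal sketch, which is not the FAQO
implementation. It assumes a rank-k projection computed by simple subspace
iteration instead of a full singular value decomposition library, raw word
counts with no term weighting, and cosine similarity for scoring. The class
name LsaSketch, its method names, and the four toy messages are all
hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LsaSketch {

    /** Assigns each distinct word an index in order of first appearance. */
    static Map<String, Integer> vocabulary(List<String> docs) {
        Map<String, Integer> vocab = new HashMap<>();
        for (String doc : docs)
            for (String w : doc.toLowerCase().split("\\s+"))
                vocab.putIfAbsent(w, vocab.size());
        return vocab;
    }

    /** Term-count vector for a text; words outside the vocabulary are ignored. */
    static double[] termVector(String text, Map<String, Integer> vocab) {
        double[] v = new double[vocab.size()];
        for (String w : text.toLowerCase().split("\\s+")) {
            Integer i = vocab.get(w);
            if (i != null) v[i] += 1.0;
        }
        return v;
    }

    static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
    }

    static double cosine(double[] x, double[] y) {
        double nx = Math.sqrt(dot(x, x)), ny = Math.sqrt(dot(y, y));
        return (nx == 0.0 || ny == 0.0) ? 0.0 : dot(x, y) / (nx * ny);
    }

    /** Gram-Schmidt: makes the vectors mutually orthogonal and unit length. */
    static void orthonormalize(double[][] vs) {
        for (int c = 0; c < vs.length; c++) {
            for (int p = 0; p < c; p++) {
                double d = dot(vs[c], vs[p]);
                for (int i = 0; i < vs[c].length; i++) vs[c][i] -= d * vs[p][i];
            }
            double norm = Math.sqrt(dot(vs[c], vs[c]));
            for (int i = 0; i < vs[c].length; i++) vs[c][i] /= norm;
        }
    }

    /**
     * Approximates an orthonormal basis for the span of the top-k left singular
     * vectors of the term-document matrix by subspace iteration on A * A^T.
     * Assumes k does not exceed the rank of the matrix.
     */
    static double[][] topicBasis(double[][] docVecs, int k, int iterations) {
        int t = docVecs[0].length;
        double[][] basis = new double[k][t];
        for (int c = 0; c < k; c++)
            for (int i = 0; i < t; i++)
                basis[c][i] = 1.0 / (1 + c + i);   // fixed, linearly independent start
        orthonormalize(basis);
        for (int it = 0; it < iterations; it++) {
            for (int c = 0; c < k; c++) {
                double[] z = new double[t];        // z = A * (A^T * basis[c])
                for (double[] d : docVecs) {
                    double w = dot(d, basis[c]);
                    for (int i = 0; i < t; i++) z[i] += w * d[i];
                }
                basis[c] = z;
            }
            orthonormalize(basis);
        }
        return basis;
    }

    static double[] project(double[][] basis, double[] v) {
        double[] p = new double[basis.length];
        for (int c = 0; c < basis.length; c++) p[c] = dot(basis[c], v);
        return p;
    }

    /** Cosine similarity between query and each document on raw word counts. */
    static double[] keywordScores(List<String> docs, String query) {
        Map<String, Integer> vocab = vocabulary(docs);
        double[] q = termVector(query, vocab);
        double[] out = new double[docs.size()];
        for (int j = 0; j < out.length; j++)
            out[j] = cosine(q, termVector(docs.get(j), vocab));
        return out;
    }

    /** Cosine similarity after projecting query and documents into k dimensions. */
    static double[] lsaScores(List<String> docs, String query, int k) {
        Map<String, Integer> vocab = vocabulary(docs);
        double[][] docVecs = new double[docs.size()][];
        for (int j = 0; j < docVecs.length; j++)
            docVecs[j] = termVector(docs.get(j), vocab);
        double[][] basis = topicBasis(docVecs, k, 100);
        double[] q = project(basis, termVector(query, vocab));
        double[] out = new double[docs.size()];
        for (int j = 0; j < out.length; j++)
            out[j] = cosine(q, project(basis, docVecs[j]));
        return out;
    }

    public static void main(String[] args) {
        List<String> archive = List.of("program freeze restart",
            "program hang restart", "printer paper jam", "printer toner jam");
        System.out.println(Arrays.toString(keywordScores(archive, "hang")));
        System.out.println(Arrays.toString(lsaScores(archive, "hang", 2)));
    }
}
```

In this toy archive the query "hang" shares no words with the first message,
so its keyword score is zero. Its rank-2 LSA score comes out close to 1,
because "hang" and "freeze" occur in the same contexts, while the two printer
messages score close to 0. A real archive would need term weighting, stop-word
handling, and a proper SVD routine.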
A great deal of research is under way in information retrieval,
document categorization, user interfaces, information display, etc.
Our project may be able to provide an environment for testing
that research.
While Artificial Intelligence (AI) techniques may eventually provide
important capabilities, our emphasis in this project will be on tools that
assist humans, sometimes called Intelligence Augmentation (IA). This kind of
software keeps humans "in the loop", with an emphasis on the efficient use
of people's time.
Initial Application
The first application is provisionally called the Frequently Asked
Question Organizer (FAQO) (other suggestions welcome). Alpha
versions of this software exist and can be downloaded from here.
The following is an overview of the initial beta release, scheduled for
June 2001.
The most prominent feature is the searching of message archives using
LSA. This feature should provide enough additional benefit over existing
practices to offset the cost of installing and learning a new application. The other
important feature will be the creation and maintenance of the message archives.
The initial release will consist of a server and three client applications.
The clients are all pure Java implementations, while the server has some
native code. Each application supports a specific role that a user
might play.
FAQuery: This is a simplified version for people who just want to ask a
question. Its GUI must be simple and intuitive; it probably needs an
implementation that runs in a web browser.
FAQOTech: This is for "tech-support" personnel who answer questions. It
connects to an IMAP server to get incoming messages, and allows the user to
efficiently query the message archives and respond by email. It can also
suggest changes to the database. Any number of people can play this role
simultaneously.
FAQOwner: This is for the owner of the database, who decides what messages
are saved in it and approves any changes to it. Only one person per database
has this role, but in the future we will add support for multiple owners in
a relationship of trust.
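The division of roles described above can be sketched as a small
suggest-and-approve workflow. This is only an illustration under stated
assumptions, not the FAQO design: the names RoleSketch, Role, Archive,
suggest, and approveNext are hypothetical, and the database is reduced to
an in-memory list of messages with a single owner.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

final class RoleSketch {

    /** Hypothetical role names for users of FAQuery, FAQOTech, and FAQOwner. */
    enum Role { ASKER, TECH, OWNER }

    static final class Archive {
        private final String owner;                       // exactly one owner per database
        private final List<String> messages = new ArrayList<>();
        private final Deque<String> pending = new ArrayDeque<>();

        Archive(String owner) { this.owner = owner; }

        /** Tech-support users (and the owner) may suggest a change; askers may not. */
        boolean suggest(Role role, String message) {
            if (role == Role.ASKER) return false;
            pending.addLast(message);
            return true;
        }

        /** Only the owner can move the oldest pending suggestion into the archive. */
        boolean approveNext(String user) {
            if (!owner.equals(user) || pending.isEmpty()) return false;
            messages.add(pending.removeFirst());
            return true;
        }

        List<String> messages() { return Collections.unmodifiableList(messages); }

        int pendingCount() { return pending.size(); }
    }

    public static void main(String[] args) {
        Archive archive = new Archive("alice");
        archive.suggest(Role.TECH, "How to clear a paper jam");
        archive.approveNext("alice");
        System.out.println(archive.messages());
    }
}
```

Keeping suggestions in a pending queue lets any number of tech-support users
work in parallel while the owner remains the single point of approval.
Supporting multiple trusted owners would mean replacing the single owner
field with a set.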
Doubts
-
It's naive to think that technical solutions can solve our social problems.
Not solutions, but tools that enable new kinds of communities from which
solutions might emerge. We are defined by our tools.
-
Various tools are already available, so either they are already being used
(so why is this project needed?) or not (so why would this project make
a difference?). Web tools are evolving fast, but have current limitations.
The ecology of communities is fragile, and conscious attention to their
needs is more likely to be successful than market forces alone.
References and Further Reading
Journal of Computer-Mediated Communication
Peter Kollock and Marc Smith, Managing the Virtual Commons: Cooperation and
Conflict in Computer Communities. Pp. 109-128 in Computer-Mediated
Communication: Linguistic, Social, and Cross-Cultural Perspectives,
Amsterdam: John Benjamins, 1996.
Previous papers
John Caron, Experiments with LSA Scoring: Optimal Rank and Basis, SIAM
Computational IR Workshop. October 2000.
[1] Applying LSA to Online Customer Support: A Trial Study. John Caron,
Unpublished Master's Thesis. May 2000.
John Caron, Design for the FAQ Organizer Application, Dec 1999.
John Caron, Wide Area Collaboration: A Proposed Application. April 1998.
Other Question/Answering systems
Ackerman, Mark S. Augmenting the Organizational Memory: A Field Study of
Answer Garden. CSCW'94
Ackerman, Mark S. and McDonald, David W., Answer Garden 2: Merging
Organizational Memory with Collaborative Help. CSCW'96
Budzik, J. and K. J. Hammond (1999). Q&A: A System for the Capture,
Organization and Reuse of Expertise, Proceedings of the Sixty-second Annual
Meeting of the American Society for Information Science, Information Today,
Inc., Medford, NJ, 1999.