April, 1998
Abstract. The Internet provides the technical basis for asynchronous collaboration across wide areas. Online communities based on Usenet or email groups indicate both the popularity and the frustrations of this form of collaboration. I propose the development of new tools to support two goals: 1) creating ongoing discourse about value-laden social and political problems, and 2) capturing and structuring knowledge from a stream of commentary. I survey previous results and current systems, state requirements, and propose a design for a wide-area, asynchronous collaborative system. Identifying and supporting user roles is central, and structuring and summarizing tools that make efficient use of participants' time are essential.
Keywords: Collaboration, asynchronous, wide-area network, groupware, Usenet, structured discourse, collaborative filtering, annotation, collaborative argumentation, recommender system.
INTRODUCTION
In this paper I explore an idea for a Web-based wide-area asynchronous collaborative application for capturing and structuring knowledge. Part two presents motivation and context. Part three contains summaries of related work. Part four presents elements of the application requirements and high level design, including proposed user roles. Part five has some modest contributions to engineering and implementation issues, and part six summarizes and concludes with thoughts on future work.
MOTIVATION
"Wicked problems" are characterized as those in which the problem itself is not easily defined or agreed upon, and therefore there is no clear way to judge solutions [8]. These "require complex judgments about the level of abstraction at which to define the problem" [9] and often involve moral or value judgments around which people disagree. Rittel concludes that they can only be tackled through an "argumentative" method. Shum asserts that "the fundamental way in which we tackle such problems is to discuss them. Consensus emerges through the process of laying out alternative understandings of the problem, competing interests, priorities and constraints."
Most political and social problems fit this characterization. As a motivating example, suppose that we wanted to estimate the "true" ecological costs of the products we consume, along the lines of Paul Hawken's proposals in "The Ecology of Commerce" [5]. To do so well, we need to understand the production process, the inputs, the way the product is used, the cost of disposal or recycling, and so on. There need to be long discussions on how to value any of this: some will try to reduce it to dollars, others will use less concrete metrics. There are a huge number of technical details from industrial and process engineers to assimilate, but the philosopher, the ecologist, the economist, and others all have to be involved.
As a paradigmatic context to start from, consider Usenet, the Internet system for broadcasting text messages around the world. The messages are directed to topical newsgroups and are replicated by netnews servers across the Internet. Messages are kept by the servers for an average of 2 weeks before being discarded. Current estimates of total Usenet readership are hard to make accurately, but DejaNews, which archives most of the mainstream groups, estimates 24 million users, and reports that they collect 730,000 messages daily from 50,000 newsgroups, comprising 5 Gbytes of data. Furthermore, two thirds of the messages are "spam-related", meaning off-topic advertisements posted to many unrelated newsgroups.
While some Usenet groups focus on political and social problems and values, they function as social and discussion groups rather than problem solving groups. However, in domains that are rapidly changing, such as technology, Usenet groups act as a very important source of up-to-the-minute information and opinion. What is striking is the presence of many knowledgeable users in these kinds of newsgroups, and their willingness to contribute valuable insights and advice. But other than the sporadic maintenance of a list of frequently asked questions ("FAQs"), there is no systematic capturing of the information that "streams" through a newsgroup and is discarded a few days later. The decision of what is worth saving and how to make it accessible in the future is value laden, and must be weighed against the effort of implementation. There are currently no tools to minimize the costs of those efforts.
Here then are two problems which have interesting similarities and in fact are related: value-laden "wicked" problems, and the capture of current knowledge of a rapidly changing domain from a stream of comments that vary widely in content and authority. In both cases, a wide diversity of opinion is necessary, there is no controlling authority, and anarchy can be seen as an asset, not a liability. These problems require, by their nature, collaborative solutions.
RELATED WORK
MIT's Open Meeting System. Roger Hurwitz and John Mallory developed an asynchronous collaboration tool for Vice President Gore's "Open Meeting on the National Performance Review" to organize and structure the comments of over 4000 participants on national policy [6]. Comments were linked to evolving hypertext documents with semantics of Agreement, Disagreement, Question, Answer, Alternative, Qualification, or Promising Practice. Web pages were dynamically generated to show a simple indented list of hyperlinks to comments about a particular document. The typed links and associated grammar allowed participants to create argument-structured discourse to "match information processing levels with people's ability to cope with complexity and with their commitment to the collaboration."
Collaborative Argumentation Systems such as IBIS (Issue-Based Information Systems) [8, 2] structure discourse to allow computer support for group communication and decision making. QuestMap is a commercially available hypermedia system based on IBIS. QuestMap has as its primitive objects Questions, Ideas, and Arguments, and adds semantic links between objects: Specializes, Challenges, and Expands-on. Other object types are Notes (annotations), References (links to objects not in QuestMap), Decisions, and Views (graph representations of a conversation). QuestMap distinguishes three stages in the process of resolution, called divergence, convergence, and decision. Divergence is the phase when the problem is freely explored and alternative points of view are gathered. Convergence focuses on clarification and a narrowing of opinion. Decision is the phase in which solutions to a set of questions are agreed upon. Because decisions are often revisited, capturing the range of viewpoints and rationales allows a group to reopen a decision without starting from the beginning. To support convergence and decision, QuestMap supports a form of voting called endorsement, and allows Ideas to be retired.
Annotation Systems allow third parties to add information such as comments, ratings, or links to on-line content owned by others. ComMentor is an annotation system from the Stanford Digital Library which uses dynamic web page synthesis to construct virtual documents [1]. In-line comments can be attached to any web document, using "string position trees" to maintain their document position even when the document changes. Hypertext links can be added for help in navigating: landmarks are reference points, tours are lists of links associated with a reference page, and trails provide multiple tours through related material. Dynamic page generation allows many user choices without creating confusion. Seals of Approval (SOAPS) are annotations which provide third-party ratings of web pages, used to filter or recommend web content.
Recommender Systems. Recommender systems make recommendations of web resources, possibly tailored to the requester's preferences, using implicit ratings (mention of a URL, time spent by a user on a web page) or explicit user ratings. Fab combines content-based recommendations (find items similar to those a given user has liked in the past) and collaborative recommendations (find items liked by other users who are similar to this user) [3]. Users ask for recommendations in a topic area, are shown web pages, and are then asked to rate them on a scale of 1-7. Weights are assigned to the keywords from these pages, and this "relevance feedback" is incorporated into the user's profile. Fab can then tailor future recommendations to the user's evolving profile, both by comparing the content of new pages to the user's profile and by computing distances between users and then using pages that were liked by close neighbors. Since the web page collection process is shared by all the users, the work is proportional to the number of topics rather than the number of users. The success of the system depends "on the ability of the collection agents to specialize and learn profiles which do indeed represent areas where users' interests overlap".
PHOAKS (People Helping One Another Know Stuff) attempts to automatically recognize recommendations of Web resources (URLs) contained in Usenet, by analyzing the text of Usenet postings [7]. Usenet messages mention URLs 23% of the time; the designers claim to be able to extract which of those are recommendations with 88% precision and 87% recall. The system then provides Web pages listing the ranked recommendations from each newsgroup. The system recognizes role specialization in Usenet: "only a minority of people expend the effort of judging information and volunteering their opinions to others". They focus on (human generated) FAQs to validate their analysis and enhance the recommendations. For the future they plan on "exploring the issues of how to compute the credibility of recommenders and affinity between those who offer and those who seek recommendations".
GroupLens, now a project at the University of Minnesota, attempts to provide collaborative recommendations of Usenet articles [4]. It provides modified newsgroup clients that allow readers to rate articles and receive predictions based on matching similar users. "Users generally consider only 5% to 30% of articles in typical newsgroups to be desirable". Together with the high volume of news, this implies that the value of correctly filtering out uninteresting articles is high. The large number of Usenet readers implies that the value of interesting articles is also high. Article ratings are needed almost immediately after posting: a one-day delay means no prediction for about half the users. A major problem is sparse ratings: since "efficiently reading high volume groups requires being highly selective", many articles are not rated at all. Scaling issues can be handled by partitioning the servers by newsgroup and by clustering similar users. Also, "composite users" can be defined, which reduces the computational burden of examining many other users looking for a match. In the future the group will examine implicit ratings and compensation systems.
DESIGN REQUIREMENTS
The goal is to envision applications that will support the capturing and structuring of knowledge from a widely distributed and likely heterogeneous group of users. Both the benefits and the costs of building a successful system are likely to be very high. These are the assumptions about the needs and purposes of such an application:
The requirements of such a collaborative system are that:
One of the critical pieces of creating software is getting people to use it, known in commercial contexts as marketing. A collaboration system requires a critical mass of users before it can even start to function correctly; therefore I propose to take Usenet as the starting point for design, and position the system as an extension of Usenet culture, infrastructure, readership, and purpose. To emphasize this as the marketing plan, I will provisionally name the system CUsenet for "Collaborative Usenet". I do not mean to imply, however, any implementation strategy such as using NNTP (network news transport protocol).
ROLES
An important way of specifying what functions the system should provide is to create roles that a user may be willing to assume. A user role is a coherent delineation of responsibility, focus, motivation, and integrated system support for an individual user. Role specialization allows supporting tools to efficiently divide the information "into meaningful and coherent chunks that match cognitive capacity and motivational level of participants." [6]. Here I briefly describe each role, and below I propose some designs to support those roles:
Filterer: Read raw postings, remove inappropriate messages, and correct the classification (such as question or rating) of messages that are misclassified.
Rater: Evaluate the content of articles on two separate scales: 1) usefulness of the article, and 2) personal agreement with the article.
Contributor: Respond to other articles, add new ideas, comment, annotate, suggest links to other resources. This is what a current Usenet contributor does now.
Discussion Organizer (doer): Structure a discussion by filtering and editing, summarize arguments, add semantic links, highlight important contributions and answers, delineate alternatives, annotate, and generally create structure and organization out of a discussion thread.
Topic Organizer: Similar to a doer, but works at a higher level. Organizes and summarizes related discussions and results for an entire topic.
Site Organizer: Creates a high level view of the entire set of useful results for this newsgroup.
Index Librarian. Add indexing tags to messages to allow search capabilities. Propose tags to be added to a standard, domain-specific set. Propose items to be added to a glossary.
FAQ Librarian. Create "frequently asked questions", and summarize answers. Provide "see-also" links to more detailed and related answers, or to alternative answers or viewpoints. Generally, maintain and enhance FAQs.
Moderator. In Usenet, a moderator is someone who filters noise from a newsgroup, and often responds to messages that are off-topic or otherwise violate Usenet norms. Any rater can play the role of a moderator if the system allows a user to filter messages based on a specific rater's evaluations, rather than on a collaborative scoring of messages.
Sysadmin. A system administrator for a cnewsgroup is needed to perform various administrative and monitoring functions. While this role in Usenet is normally played by a technical person who administers the hardware and software, in theory it could be a non-technical person who has some named rights and responsibilities for the system.
In general, a user may play more than one role, perhaps simultaneously, since many of these roles overlap and their boundaries are not always sharp. For example, filter and rate are obvious tasks to combine, and the system should allow a user to efficiently do both at once. The distinction between the roles of discussion organizer and topic organizer may be mostly useful in the abstract, since discussions often wander all around a topic area.
A very important feature is that roles can be played by more than one, even many, users. This makes the question of ownership and editing rights critical. It is a requirement that multiple viewpoints are supported, so it is possible to have multiple discussion or topic summaries, or even site maps or FAQs. The goal is to capture the diversity of opinion and to allow a user to see the information from various viewpoints, through various explicit and implicit filters. While the focus here is on the collaborative sharing of roles, it is possible that some groups may decide to restrict some of these roles to named individuals.
The set of people willing to assume a specific role is an important subgroup, since it allows certain types of messages to be dealt with only by that group. For example, a discussion on whether a question should be added or deleted from the FAQ could be directed at the FAQ Librarian subgroup.
DEFINITIONS AND ONTOLOGY
A cnewsgroup ("collaborative newsgroup") is a CUsenet group with a specific subject matter (as in Usenet), which may be open to all or restricted to registered users only. A cnewsgroup has a charter, which states the subject matter and scope of the group, whether it is open or closed to readers and writers, and any ground rules, restrictions, decisions, or other agreements its members are expected to follow.
A message is the basic unit of communication to a cnewsgroup. A message has a unique message type and any number of attributes specified as keyword-value pairs. A message stream is a set of messages that are passed to a user. The raw stream is all the messages posted to a cnewsgroup. The filtered stream is all messages that pass the collaborative filter algorithm, which removes spam and other inappropriate articles. A substream is a subset of the filtered stream, consisting of only certain types of messages, for example only ratings, or only FAQs. A rated stream is a set of messages passing some rating criterion, such as those with a usefulness score greater than zero.
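The stream definitions amount to successive filters over the raw stream. The following is a minimal Python sketch, assuming each message already carries the outcome of the collaborative filter and a usefulness score; the `Message` fields and the function names are hypothetical, and the rated stream is assumed here to be drawn from the filtered stream.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical message record: one type plus keyword-value attributes."""
    msg_id: str
    msg_type: str                                   # e.g. "response", "rating", "FAQ"
    attributes: dict = field(default_factory=dict)  # keyword-value pairs
    passed_filter: bool = True      # assumed outcome of the collaborative filter
    usefulness_score: float = 0.0   # assumed output of some scoring algorithm


def filtered_stream(raw):
    """Messages from the raw stream that pass the collaborative filter."""
    return [m for m in raw if m.passed_filter]


def substream(raw, types):
    """Filtered messages whose type is one of `types`."""
    return [m for m in filtered_stream(raw) if m.msg_type in types]


def rated_stream(raw, min_score=0.0):
    """Filtered messages whose usefulness score exceeds `min_score`."""
    return [m for m in filtered_stream(raw) if m.usefulness_score > min_score]


raw = [
    Message("m1", "response", usefulness_score=2.0),
    Message("m2", "response", passed_filter=False),  # e.g. spam
    Message("m3", "FAQ", usefulness_score=-1.0),
]
print([m.msg_id for m in filtered_stream(raw)])
print([m.msg_id for m in substream(raw, {"FAQ"})])
print([m.msg_id for m in rated_stream(raw)])
```

Each stream is recomputed from the raw stream here for clarity; a real system would more likely maintain the substreams incrementally as messages arrive.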
Whereas a message has a limited lifetime, similar to Usenet articles, a document is intended to be a persistent object, similar to web pages. Messages that are referenced from a document are given the same persistence as the document. Again following the Usenet and Web analogies, messages once posted cannot be changed (although they can be canceled), while documents may be changed at will.
A discussion thread is all the messages with the same discussion attribute. This is separate from the In-regards-to attribute, which means that the message is a response to another specific message. A topic is a general area of interest to the cnewsgroup, created by sending a topic-create message containing the topic name and a description of the question or scope of the topic. Each cnewsgroup has a "general" topic, and each topic has a "general" discussion; these are the default attributes for a message. Every message is therefore always contained in a hierarchy cnewsgroup → topic → discussion → message.
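The default-attribute rule can be expressed as a small function. This sketch assumes message attributes are held in a Python dictionary with hypothetical keys `group`, `topic` and `discussion` (the group name "ecocosts" is also made up); it groups messages into threads by their full cnewsgroup, topic, discussion path and deliberately ignores In-regards-to, which records a reply relationship rather than thread membership.

```python
def placement(attributes):
    """Return the (cnewsgroup, topic, discussion) path for a message,
    falling back to the "general" topic and discussion when absent."""
    group = attributes["group"]
    topic = attributes.get("topic", "general")
    discussion = attributes.get("discussion", "general")
    return (group, topic, discussion)


def threads(messages):
    """Group (msg_id, attributes) pairs into discussion threads keyed by path.
    In-regards-to is ignored: it marks a reply, not thread membership."""
    result = {}
    for msg_id, attributes in messages:
        result.setdefault(placement(attributes), []).append(msg_id)
    return result


msgs = [
    ("m1", {"group": "ecocosts", "topic": "packaging", "discussion": "landfill"}),
    ("m2", {"group": "ecocosts", "topic": "packaging", "discussion": "landfill",
            "In-regards-to": "m1"}),
    ("m3", {"group": "ecocosts"}),  # no topic or discussion: uses the defaults
]
for path, ids in threads(msgs).items():
    print(" -> ".join(path), ids)
```

Keying threads by the full path, rather than by the discussion name alone, is an assumption that keeps the "general" discussions of different topics apart.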
A user is anyone who reads or writes to a cnewsgroup. A participant is a human person who has registered with some authority, whose messages are authenticated, and who may write documents. A contributor is a human person who has not yet registered, or who chooses not to register; a contributor can write messages but not documents. A contributor is not necessarily anonymous, since they might sign their messages with a real address. A reader is a user who has read-only permission for a group. An agent is a non-human reader, including corporations and software programs. A cnewsgroup that disallows contributors is a private-write cnewsgroup, and if it disallows readers, it is a private-read, or simply private, cnewsgroup.
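The user classes imply a simple permission check. The sketch below is hypothetical: the action names and the `mode` values are illustrative, and it assumes that a private (private-read) cnewsgroup also disallows contributors, so that only participants remain.

```python
from enum import Enum


class UserClass(Enum):
    PARTICIPANT = "participant"  # registered and authenticated
    CONTRIBUTOR = "contributor"  # unregistered; may post messages only
    READER = "reader"            # read-only human user
    AGENT = "agent"              # non-human reader (program, corporation)


MODES = ("open", "private-write", "private")


def can(user_class, action, mode="open"):
    """Return True if `user_class` may perform `action` ("read",
    "write_message" or "write_document") in a cnewsgroup of the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    # Private-write groups disallow contributors; assume private groups do too.
    if user_class is UserClass.CONTRIBUTOR and mode != "open":
        return False
    # Private (private-read) groups also disallow read-only users and agents.
    if user_class in (UserClass.READER, UserClass.AGENT) and mode == "private":
        return False
    if action == "read":
        return True
    if action == "write_message":
        return user_class in (UserClass.PARTICIPANT, UserClass.CONTRIBUTOR)
    if action == "write_document":
        return user_class is UserClass.PARTICIPANT
    raise ValueError(f"unknown action: {action}")


print(can(UserClass.CONTRIBUTOR, "write_message"))
print(can(UserClass.CONTRIBUTOR, "write_document"))
print(can(UserClass.AGENT, "read", mode="private"))
```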
A rating is a numerical score assigned to a message or document by a user. Documents can always be rated, but not all messages can be rated; e.g. a message of type rating cannot itself be rated. Anything that can be rated is called a ratable. The usefulness rating gives the user's opinion on how useful a ratable is, while the agreement rating specifies how much a user agrees or disagrees with it. Ratings have the same persistence as the ratable they are attached to, and can be changed as long as the ratable exists.
A score is a numerical value assigned to a ratable by a scoring algorithm; it is always a function of the ratable's usefulness ratings, not its agreement ratings. A distance is a numerical value assigned to a pair of ratables by a distance algorithm; it is always a function of the ratables' agreement ratings. Distances can also be assigned among pairs of ratables, users, and viewpoints. A ratable may have multiple scores and distances based on different algorithms.
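To make the score and distance definitions concrete, here is one possible pair of algorithms. Using the mean usefulness rating as the score and the mean absolute difference of agreement ratings as the distance are assumptions, not part of the design, as is the -3..+3 agreement scale in the example. The same `distance` function serves for two ratables (ratings keyed by user) and for two users (ratings keyed by ratable).

```python
from statistics import mean


def score(usefulness):
    """One possible scoring algorithm: the mean usefulness rating.
    `usefulness` maps user -> rating; returns None if nobody has rated."""
    return mean(usefulness.values()) if usefulness else None


def distance(a, b):
    """Mean absolute difference of agreement ratings over shared keys.
    For two ratables the keys are users; for two users they are ratables.
    Returns None when the two share no keys."""
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return mean(abs(a[k] - b[k]) for k in shared)


# Agreement ratings on an assumed -3..+3 scale, keyed by user.
msg_a = {"ann": 3, "bob": -2, "cy": 1}
msg_b = {"ann": 2, "bob": -3}
print(score({"ann": 4, "bob": 2}))
print(distance(msg_a, msg_b))
```

Only users who rated both ratables contribute to the distance, so sparse ratings (the GroupLens problem noted earlier) show up as a missing distance rather than a misleading one.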
A user's weight is a numerical value assigned to a user by a weighting algorithm, typically a function of the scores of the ratables owned by the user. A user may have multiple weights based on different weighting algorithms. The system tracks weights only for participants, that is, registered and authenticated users. A viewpoint is a composite user generated by a viewpoint algorithm, for the purpose of analyzing user distances and clustering behaviors.
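A weighting algorithm and a viewpoint algorithm might look like the following. Both rules are hypothetical placeholders: weight is taken as the total score of a user's ratables, floored at zero, and a viewpoint is built as a composite user whose agreement rating on each ratable is the mean of its members' ratings.

```python
from statistics import mean


def weight(owned_scores, is_participant=True):
    """Hypothetical weighting algorithm: total score of the ratables a user
    owns, ignoring unscored ones and floored at zero. Non-participants get
    None, since the system tracks weights only for participants."""
    if not is_participant:
        return None
    return max(0, sum(s for s in owned_scores if s is not None))


def composite_user(members):
    """Hypothetical viewpoint algorithm: build a composite user whose agreement
    rating on each ratable is the mean rating of the members who rated it.
    `members` is a list of dicts mapping ratable id -> agreement rating."""
    pooled = {}
    for ratings in members:
        for ratable, rating in ratings.items():
            pooled.setdefault(ratable, []).append(rating)
    return {ratable: mean(values) for ratable, values in pooled.items()}


print(weight([3, -1, None, 2]))
print(composite_user([{"m1": 2, "m2": -1}, {"m1": 0}]))
```

Because a composite user has the same shape as an ordinary user's ratings, any user-to-user distance can also be computed between a user and a viewpoint.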
DESIGN FOR USER ROLES AND TASKS
One of the benefits of creating well-defined roles is to identify the goals of a user when s/he is in that role. Understanding those goals is key to creating a good system design. Here I examine each role and suggest specific system functions that are required, with an emphasis on the efficient use of the user's time.
Filterer: A filterer needs to view the raw message stream sequentially and to give each a pass/fail rating. It may be feasible to provide text analysis algorithms that could automatically detect inappropriate messages and present them to a filterer for fast confirmation. A filterer also checks message types, and makes corrections if needed, so support for private email to the poster who misclassified the message would be helpful in educating new users.
Rater: A rater needs to view message streams and give individual messages numerical ratings of usefulness and/or agreement. When the filter and rate roles are combined, the user operates on the raw stream; when they are separated, the rater operates on filtered streams. A more complex function would allow the rater to divide a message into sections and rate each piece separately. There are at least two ways to design this. One is to treat the separated sections as new messages, with a link to the original article for context. The other is to treat the separate ratings as annotations on the original article. This function starts to blur the rater role with the doer role.
Contributor: A contributor needs to write messages that are correctly typed, with message attributes (such as group, topic, discussion, In-regards-to) inferred by the system, with optional override. Creating annotations and links to other articles should be as simple as drag-and-drop.
Discussion Organizer: A doer needs to cut and paste parts of messages into a discussion summary document, inline an entire message by reference, or create a typed hyperlink to another message or document. The component messages should be displayed in a way that indicates they are quotes from other documents, and the system should allow a user to view the original owner's name and a link to the original message. The summary document may be monolithic, or may be a collection of linked documents. It should also be organized into nested sections, so that a user can automatically see an outline view of the document. The semantic links among the sections of the document should allow automatic generation of a graphical representation of the document, called a graphical view or node-and-link view, similar to gIBIS. The doer may edit any of the views, and the other views change automatically, i.e. there is only one document.
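Because there is only one document, both the outline view and the node-and-link view can be derived from it rather than stored separately. This is a minimal sketch under that assumption; the `Section` class and the (source, link type, target) triples are hypothetical representations, and the section titles and the "Conflicts" link type are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """A hypothetical nested section of a summary document."""
    title: str
    children: list = field(default_factory=list)


def outline(section, depth=0):
    """Derive the outline view: one indented line per section."""
    lines = ["  " * depth + section.title]
    for child in section.children:
        lines.extend(outline(child, depth + 1))
    return lines


def node_and_link(links):
    """Derive the node-and-link view as adjacency lists from
    (source title, link type, target title) triples."""
    graph = {}
    for source, link_type, target in links:
        graph.setdefault(source, []).append((link_type, target))
    return graph


doc = Section("Packaging costs", [
    Section("Q: How to value landfill use?", [Section("Price per tonne")]),
    Section("Count energy only"),
])
print("\n".join(outline(doc)))
print(node_and_link([("Count energy only", "Conflicts", "Price per tonne")]))
```

Editing either view would then mean editing the underlying sections or links and regenerating both views, which is one way to honor the "only one document" requirement.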
Topic Organizer and Site Organizer: These roles require the same functions as a discussion organizer, but generally do not cut and paste or inline documents, although they can add annotation and comment, and general orientation remarks. The topic organizer creates a topic summary document using references to discussions and adds comments and semantic links between the discussions. A topic organizer is free to include or exclude discussions, and may annotate discussions, but may not modify a discussion. The graphical view of the topic is automatically generated based on the discussions themselves and the links between them. A site organizer creates a site map document which is a summary of the entire cnewsgroup, using topic summaries and adding links and comments.
Indexer. An indexer creates annotations of a message or document by specifying a word or phrase that indexes its content. These annotations have message type index, and are sent to the index message substream, normally not viewed by readers. An index may refer to a section, or range of a message or document. An indexer should be able to quickly specify one or more index words for a message or document, optionally mark the section that it applies to, and have the system generate the message(s).
Index searcher: An index searcher posts messages of type index-search, consisting of a phrase to be searched for in the index. These messages are automatically answered by the system via a web interface or private email to the index searcher.
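The indexer and index searcher roles together suggest an inverted index built from index messages. In this sketch, which is an assumption rather than a specified algorithm, each index message is reduced to a (target id, phrase, section) triple, and a search returns every target whose phrase contains the query text, ignoring case. The ids, phrases and section label are hypothetical.

```python
from collections import defaultdict


def build_index(index_messages):
    """Build a search index from messages of type index, each given as a
    (target_id, phrase, section) triple; section is None for the whole item."""
    index = defaultdict(set)
    for target_id, phrase, section in index_messages:
        index[phrase.lower()].add((target_id, section))
    return index


def index_search(index, query):
    """Answer an index-search: every (target_id, section) whose index phrase
    contains the query, case-insensitively, in a stable sorted order."""
    q = query.lower()
    hits = set()
    for phrase, targets in index.items():
        if q in phrase:
            hits |= targets
    # Sort with None sections treated as "" so the tuples stay comparable.
    return sorted(hits, key=lambda t: (t[0], t[1] or ""))


idx = build_index([
    ("m1", "Life-cycle cost", None),
    ("d7", "life-cycle cost", "2.1"),
    ("m4", "recycling", None),
])
print(index_search(idx, "life-cycle"))
```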
Index Librarian: An Index Librarian performs the task of systematically adding indices to a message stream. Since index messages can be sent by anyone, the Index Librarian may also need to filter or rate index messages. Indices are used to build a cnewsgroup search index document, so there might be tools to assist Librarians to monitor the quality of the search index. There might be a domain specific standard set of indices that the Index Librarian maintains; these might be sorted by frequency of use to allow quick generation of indices with minimal typing (e.g. with keyboard accelerators). There might also be a glossary of terms that the Index Librarian maintains.
FAQ contributor: A frequently asked question is a message of type FAQ, consisting of a question, an answer or a link to an answer, and optional comments or other links of type see-also. Any contributor may post a FAQ, and these messages are sent to the FAQ message substream, normally not viewed by readers. The system stores FAQs for automatic searching in the collaborative FAQ document.
FAQ searcher: A FAQ searcher posts messages of type FAQ-search consisting of a question which is automatically compared against existing FAQs, and the best answers are returned to the user by private email or Web interface. Optionally a message of type FAQ-answer is generated, for use by FAQ Librarians. If the answer is unsatisfactory, a FAQ searcher may generate a message of type FAQ-request which is a request for human help to answer a question.
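The paper leaves open how a FAQ-search question is compared against existing FAQs. As one simple stand-in, the sketch below ranks FAQs by Jaccard word overlap with the question; the technique, the similarity threshold and `top_n` are all assumptions, and an empty result is the point at which a searcher might fall back to a FAQ-request.

```python
import re


def words(text):
    """Lower-case word set; hyphenated words are split into their parts."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def faq_search(question, faqs, top_n=3, min_similarity=0.2):
    """Rank stored FAQs by Jaccard word overlap with `question`.
    `faqs` is a list of (faq_question, answer) pairs. Returns up to `top_n`
    (similarity, faq_question, answer) tuples, best first."""
    q = words(question)
    scored = []
    for faq_question, answer in faqs:
        f = words(faq_question)
        union = q | f
        similarity = len(q & f) / len(union) if union else 0.0
        if similarity >= min_similarity:
            scored.append((similarity, faq_question, answer))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_n]


faqs = [
    ("How do I rate a message?", "Use the rating form."),
    ("What is a co-op?", "A named group of participants in mutual trust."),
]
for similarity, q, a in faq_search("How can I rate a message", faqs):
    print(f"{similarity:.2f}  {q}  ->  {a}")
```

Alternate question formulations added by FAQ Librarians could simply be stored as extra (question, answer) pairs pointing at the same answer.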
FAQ Librarian: A FAQ Librarian may filter or rate the FAQ message substream, and may monitor the FAQ-answer and/or FAQ-request substreams in order to improve the automatic FAQ responses. They can annotate FAQs by adding alternate question formulations for the same answer, add see-also links to alternative answers and add for-more-detail links to more detailed answers or discussions. The system should generate periodic statistics of FAQ usage, hits, requests, etc.
FAQ Organizer: A FAQ Organizer uses the same tools as discussion and topic organizers to generate a FAQ summary document, using similar structuring and linking. This is separate from the collaborative FAQ document itself, which is owned and maintained by the system: a FAQ summary document is maintained and owned by an individual. FAQ searches can be generated against one or all FAQ summary documents, in addition to the collaborative FAQ itself.
Reader: A reader can read any of the messages or documents, and post messages of type Index-search, FAQ-search or FAQ-request.
DESIGN FOR COLLABORATIVE FEATURES
Many of the design features described so far are straightforward for single-user document editing or in small-group settings, but it is less clear how they will work in a large-group, Usenet-like environment. For example, what does it mean to have multiple FAQ Librarians? Who decides which answer is generated for a question? This section outlines some of the features needed to deal with these issues.
Co-ops. We have already distinguished several classes of users: readers with read-only privileges, contributors with message-writing privileges, and participants who are registered and authenticated and can create persistent documents. Now we define a co-operating group, or co-op, as a named group of participants in a relationship of mutual trust. All members have equal rights within the co-op, and all messages and documents from the co-op are assumed to be written by consensus. Messages and documents are owned by the co-op, but all members have write privileges to them. A co-op may give itself a pseudonym, but membership information is publicly and easily available. Formation of a co-op and changes to its membership require authentication from all members. A co-op is dissolved when any member requests it. Membership in a co-op does not preclude or affect participation as an individual (except for scoring, see below).
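The co-op membership rules can be enforced mechanically. This hypothetical sketch treats authentication as a set of names that have approved an action; it requires every current member (and every newly added member) to approve a membership change, and lets any single member dissolve the co-op. The class and the member names are illustrative.

```python
class CoOp:
    """Hypothetical co-op record. 'Approvals' stands in for authentication:
    the set of members who have authenticated a given action."""

    def __init__(self, name, founders, approvals):
        # Formation requires authentication from every founding member.
        if not set(founders) <= set(approvals):
            raise PermissionError("formation needs approval from all founders")
        self.name = name
        self.members = set(founders)
        self.dissolved = False

    def change_membership(self, add=(), remove=(), approvals=()):
        """Add and/or remove members; every current member and every new
        member must approve."""
        if self.dissolved:
            raise RuntimeError("co-op has been dissolved")
        required = self.members | set(add)
        if not required <= set(approvals):
            raise PermissionError("change needs approval from all members")
        self.members = (self.members | set(add)) - set(remove)

    def request_dissolution(self, member):
        """Any single member may dissolve the co-op."""
        if member not in self.members:
            raise PermissionError("only a member can dissolve the co-op")
        self.dissolved = True


coop = CoOp("greens", ["ann", "bob"], approvals=["ann", "bob"])
coop.change_membership(add=["cy"], approvals=["ann", "bob", "cy"])
print(sorted(coop.members))
coop.request_dissolution("cy")
print(coop.dissolved)
```

The asymmetry is deliberate: unanimity to change membership, a single voice to dissolve, so that no member can be kept in a relationship of trust that has broken down.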
Ownership. Every message or document is owned by the user or co-op who created it. Only owners have write privileges for a message or document. Ownership of a document may be transferred to another participant.
Document Ownership. A discussion organizer structures a discussion into an intelligible form, clarifying and focusing the arguments and information. Even with the intention of neutrality, it is in general impossible to structure a controversial discussion in an unbiased way; even the question of what is relevant to a discussion is a form of bias. Therefore the ownership and control of the discussion is made explicit: a doer is encouraged to state his/her point of view as a standard attribute of the discussion. Any participant can create an alternative discussion on the same question. Since discussions are rated by participants, readers can filter which discussions they wish to view or participate in. These comments apply equally to Topic, Site, and FAQ Organizers.
Scoring. Scoring is a way of filtering and ranking the usefulness of information. Scoring does not directly measure agreement or closeness of values of participants, although usefulness ratings will be affected by agreement in many cases. The system tracks weights only for participants, not for contributors. A cnewsgroup might choose to disallow contributors, to give them a weight of zero, or to give them a weight of one and use that as a reference value when calculating participant weights.
Message scores and user weights depend on the scoring and weighting algorithms used, so there may be multiple scores and weights if multiple algorithms are available. At a minimum, the system always provides the default consensus algorithms, whose intent is to reflect the authority and contributions of participants over the life of the group, using some measure of "majority opinion" and the accumulated weights of participants. Another possible algorithm might calculate scores by giving participant X a weight of 1 and weighting other ratings by their owner's distance from X. This algorithm might be called scoring from X's viewpoint.
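Scoring from X's viewpoint needs some rule for turning distance into weight. The sketch below assumes the weight 1 / (1 + d), which gives X weight 1 (distance 0) and shrinks as an owner's distance d from X grows; the score is then the weighted mean of the usefulness ratings. Skipping users whose distance from X is unknown is also an assumption, and the users and numbers are made up.

```python
def viewpoint_score(usefulness, distances, x):
    """Score a ratable from participant x's viewpoint.
    usefulness: user -> usefulness rating of this ratable.
    distances: user -> that user's distance from x.
    x itself is given distance 0 (weight 1); unknown distances are skipped.
    Returns the weighted mean rating, or None if no rating can be weighted."""
    total = 0.0
    norm = 0.0
    for user, rating in usefulness.items():
        d = 0.0 if user == x else distances.get(user)
        if d is None:
            continue
        w = 1.0 / (1.0 + d)  # assumed rule: weight shrinks as distance grows
        total += w * rating
        norm += w
    return total / norm if norm else None


ratings = {"x": 2, "ann": 2, "bob": -1}
print(viewpoint_score(ratings, {"ann": 0.0, "bob": 2.0}, "x"))
```

Here bob, who is far from X, counts for only a third as much as X or ann, so the score (11/7, about 1.57) sits well above the unweighted mean of 1.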
There is another paradigm from which to design CUsenet: we can think of it as a wide-area, multi-user game. This reflects the centrality of rating articles. Users "score points" by posting articles that are positively rated, and their cumulative score is reflected in their "weight". Exposing this facet of collaborative work is likely to have interesting consequences. Can the game rules be designed both to encourage participation and to produce useful collaborative summaries of knowledge? There may be valuable insights from "Dungeons and Dragons"-style fantasy role-playing games. These games have cumulative scoring (possibly accumulating over many years), create many roles that users can take on, and are accessible to novices even while the rules become very complex for experienced players.
Viewpoints: The goal of viewpoints is to allow minority opinions to be found. There are three possible aspects to this. The first is to calculate scores based on an individual's ratings, as discussed above. The second is to identify clusters of users automatically. The third is to let users form self-identified clusters by claiming allegiance with a named viewpoint. The idea is to let users see representations of a site, topic, or discussion from any of these viewpoints. A user who finds him/herself agreeing with a viewpoint would add weight to it. In this way a minority or even an individual viewpoint could attract attention and support.
Voting: A question can be put to a vote by framing it as a yes/no question or a numeric range and specifying the closing date for voting. Any participant can vote, votes are authenticated, and a participant can vote only once. Results are displayed both weighted and unweighted, and are always public; that is, they cannot be withdrawn from public inspection. Votes are public by default, so that tallies of the participants' votes (and their weights) are publicly available. Votes may also be declared anonymous, so that an individual participant's vote is not public. The voting form has a comment field that a voter can use to annotate his/her vote.
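The voting rules above translate fairly directly into a data structure. This sketch assumes participants have already been authenticated (the identifier passed in is trusted), represents a yes/no question as 1/0, and records each voter's weight at the time of voting; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


@dataclass
class Ballot:
    value: int      # 1 = yes, 0 = no; a numeric range would work the same way
    weight: float   # the voter's weight when the ballot was cast
    comment: str = ""


@dataclass
class Vote:
    question: str
    closes: datetime
    anonymous: bool = False
    ballots: Dict[str, Ballot] = field(default_factory=dict)

    def cast(self, participant: str, value: int, weight: float,
             now: datetime, comment: str = "") -> None:
        """Record one ballot per authenticated participant before the closing date."""
        if now >= self.closes:
            raise ValueError("voting has closed")
        if participant in self.ballots:
            raise ValueError(f"{participant} has already voted")
        self.ballots[participant] = Ballot(value, weight, comment)

    def results(self) -> Dict[str, object]:
        """Always-public results, both unweighted and weighted."""
        total_weight = sum(b.weight for b in self.ballots.values())
        weighted_yes = sum(b.value * b.weight for b in self.ballots.values())
        result: Dict[str, object] = {
            "ballots_cast": len(self.ballots),
            "yes_unweighted": sum(b.value for b in self.ballots.values()),
            "yes_weighted_share": weighted_yes / total_weight if total_weight else 0.0,
        }
        if not self.anonymous:
            # Public vote: each participant's ballot and weight is part of the tally.
            result["by_participant"] = {
                p: (b.value, b.weight, b.comment) for p, b in self.ballots.items()
            }
        return result


vote = Vote("Adopt the revised FAQ?", closes=datetime(1998, 5, 1))
vote.cast("alice", 1, weight=3.0, now=datetime(1998, 4, 20))
vote.cast("bob", 0, weight=1.0, now=datetime(1998, 4, 21), comment="needs more sources")
print(vote.results()["yes_weighted_share"])  # 3.0 / 4.0 = 0.75
```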
MESSAGE AND LINK TYPES
The classification of messages is critical to the structuring of information. Standard message types should be very easy for users to understand and use to classify their own messages. These might include:
Question for Discussion: Pose a question that you would like to discuss.
FYI ("For your Information"): An (alleged) fact or piece of information.
Response: a response or comment on another specific message or thread. In-line annotations are possible using this message type.
Summary: an attempt to summarize a discussion, question, or issue.
Graph: Node and link graphs may be attached to a discussion, a topic, or the entire site.
In addition, we have already identified a number of specialized messages that are filtered out of the general message stream, such as topic-create, FAQ, FAQ-search, FAQ-answer, FAQ-request, index, index-search, rating, and vote.
Standard link types that users should understand easily include Re (in reference to) and See Also (additional information at). Other semantics that original authors or discussion organizers might use include Agrees, Conflicts, Qualifies, Summarizes, and Is an Answer to. Many other types are possible, and it will likely be important to allow groups and even individual users to define their own link types.
SECURITY
In a controversial cnewsgroup, where there may be fear of vote or algorithm tampering or other illegal behavior, the sysadmin must be perceived as neutral, and his/her actions should even be monitored and reviewable. This role might be shared by a named group of individuals whose decisions must be explicitly ratified by all members. We might call this a check group or checkpoint, and note that it assumes a relationship of mistrust, whereas a co-op assumes a relationship of trust.
It is important that each participant have only one rating and one vote, although participants obviously accrue different weights. Therefore participants may not assign their authentication to other users for any reason, so that a single human being cannot act as more than one participant. An agent is a non-human user with read-only rights; in the future, this restriction might be relaxed under suitable safeguards. It is not yet clear how to prevent participants from registering more than once, or agents from registering at all.
Given the requirement to register and authenticate ratings, it seems likely that scores and user weights can be protected from outside attack. Saboteurs who join a cnewsgroup to disrupt it, for example by introducing random noise into the ratings, are harder to defend against. A group of saboteurs might be called a cabal; a large enough cabal attacking the filtering or rating substream could be very disruptive. Tools will have to evolve to deal with these situations. A cluster analysis should reveal the cabal's members, and an accusation of sabotage might trigger an optional filter that users could enable to remove any messages owned by the cabal.
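One way such an analysis might begin is sketched below, under stated assumptions. A rater injecting random noise should have ratings that are nearly uncorrelated with the consensus over the messages both have scored, so the sketch flags raters whose Pearson correlation with consensus is close to zero in absolute value. Strongly anti-correlated raters are deliberately not flagged, since they may represent a coherent minority viewpoint rather than noise. The threshold and minimum-overlap parameters are arbitrary, and the filter applies only if a user enables it.

```python
import math
from typing import Dict, List, Set


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def suspected_noise_raters(user_ratings: Dict[str, Dict[str, float]],
                           consensus: Dict[str, float],
                           min_common: int = 5,
                           threshold: float = 0.1) -> Set[str]:
    """Flag raters whose ratings look like random noise relative to consensus."""
    flagged = set()
    for user, ratings in user_ratings.items():
        common = [m for m in ratings if m in consensus]
        if len(common) < min_common:
            continue  # too little evidence to judge
        r = pearson([ratings[m] for m in common], [consensus[m] for m in common])
        if abs(r) < threshold:
            flagged.add(user)
    return flagged


def cabal_filter(messages: List[Dict[str, str]], cabal: Set[str]) -> List[Dict[str, str]]:
    """Optional, user-enabled filter: drop messages owned by accused participants."""
    return [m for m in messages if m["owner"] not in cabal]


consensus = {"m1": 1.0, "m2": -1.0, "m3": 0.5, "m4": -0.5, "m5": 1.0, "m6": -1.0}
user_ratings = {
    "honest": dict(consensus),
    "contrarian": {m: -v for m, v in consensus.items()},
    "noisy": {"m1": 1.0, "m2": 1.0, "m3": -1.0, "m4": -1.0, "m5": 0.0, "m6": 0.0},
}
cabal = suspected_noise_raters(user_ratings, consensus)
print(cabal)  # {'noisy'}
```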
IMPLEMENTATION
Object Implementation
The likely choice for document encoding is the Extensible Markup Language (XML), since it includes all of the functionality of HTML and has the required extensibility. It seems likely that messages should also be XML documents.
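As an illustration of a message encoded as an XML document, the sketch below uses Python's standard `xml.etree.ElementTree` module. The element and attribute names (`message`, `author`, `subject`, `body`, `link`, `type`, `href`) are hypothetical; no schema is defined here.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def message_to_xml(msg_id: str, msg_type: str, author: str, subject: str,
                   body: str, links: List[Tuple[str, str]]) -> str:
    """Serialize one message; `links` holds (link type, target address) pairs."""
    root = ET.Element("message", {"id": msg_id, "type": msg_type})
    ET.SubElement(root, "author").text = author
    ET.SubElement(root, "subject").text = subject
    ET.SubElement(root, "body").text = body
    for link_type, target in links:
        ET.SubElement(root, "link", {"type": link_type, "href": target})
    return ET.tostring(root, encoding="unicode")


xml_text = message_to_xml("m42", "FYI", "alice", "Release date",
                          "Version 2 ships in May.", [("Re", "m17")])
print(xml_text)

# Parsing it back recovers the structured attributes.
parsed = ET.fromstring(xml_text)
print(parsed.get("type"), parsed.find("link").get("href"))  # FYI m17
```

New attribute tags can be added as further sub-elements; readers written to ignore elements they do not recognize keep working, which is the extensibility argument made above.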
Bi-directional typed links appear adequate to express the various relationships among messages [6], as well as to provide the hyperlink navigational anchor. A link then consists of two addresses and a type. At the system level, a link type is simply a short display phrase, an explanatory help paragraph, and a set of rules or a grammar specifying which types of messages the link can point to.
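A minimal sketch of such typed links follows. It assumes messages are addressed by string identifiers and reduces the "rules or grammar" to two sets of permitted source and target message types; a real grammar could be richer. Storing each link in both an outgoing and an incoming index is what makes it navigable from either end. All names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class LinkType:
    phrase: str                       # short display phrase, e.g. "Summarizes"
    help_text: str                    # explanatory help paragraph
    allowed_sources: FrozenSet[str]   # message types the link may start from
    allowed_targets: FrozenSet[str]   # message types the link may point to


@dataclass(frozen=True)
class Link:
    source: str        # address of one message
    target: str        # address of the other message
    link_type: LinkType


class LinkIndex:
    """Stores every link under both endpoints so it can be followed either way."""

    def __init__(self, message_types: Dict[str, str]) -> None:
        self.message_types = message_types          # address -> message type
        self.outgoing: Dict[str, List[Link]] = defaultdict(list)
        self.incoming: Dict[str, List[Link]] = defaultdict(list)

    def add(self, source: str, target: str, link_type: LinkType) -> Link:
        src_type = self.message_types[source]
        dst_type = self.message_types[target]
        if src_type not in link_type.allowed_sources:
            raise ValueError(f"'{link_type.phrase}' cannot start from a {src_type}")
        if dst_type not in link_type.allowed_targets:
            raise ValueError(f"'{link_type.phrase}' cannot point to a {dst_type}")
        link = Link(source, target, link_type)
        self.outgoing[source].append(link)
        self.incoming[target].append(link)
        return link


SUMMARIZES = LinkType(
    "Summarizes",
    "Marks a message as a summary of a discussion, question, or issue.",
    frozenset({"Summary"}),
    frozenset({"Question for Discussion", "Response", "FYI"}),
)
index = LinkIndex({"m1": "Question for Discussion", "m2": "Summary"})
index.add("m2", "m1", SUMMARIZES)
print([link.source for link in index.incoming["m1"]])  # ['m2']
```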
It should be possible for users themselves to define new attribute tags and link types, with the application automatically incorporating them. Where the semantics of a new type require the application to process or display it differently, new code will obviously be required. In an ideal world, user-contributed applets (properly authenticated, perhaps distributed only as source code) could add the new functionality.
Client Design
I assume that the look, feel, and functionality of Web browsers and news readers are the starting point for the user interface. Whether the client is a plug-in or add-on to an existing browser or a separate application is unclear, though I suspect that only a separate application can give the fully integrated functionality needed to make optimal use of a user's time. More broadly, subsets of the functionality can be made accessible from many platforms.
From the user's point of view, I theorize that the client manipulates only four types of objects: message, document, link, and graph. Messages and documents have structured attributes that represent their semantics. Typed links represent the relationships between messages and documents. A graph is a graphical representation of the messages and the links between them. Messages and documents may actually differ only in their persistence, and can be composed of any of the objects that a web browser can deal with: text, images, multimedia, etc.
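The four client-side object types could be modelled along the following lines. This sketch assumes, as the paragraph speculates, that a document is simply a message flagged as persistent, keeps content as plain text for brevity, and builds the graph from (source, link type, target) triples; none of these class names come from an existing system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Message:
    msg_id: str
    attributes: Dict[str, str]   # structured semantics, e.g. {"type": "FYI"}
    content: str                 # text here; could be any browser-renderable content
    persistent: bool = False     # a "document" is a persistent message (assumption)


def make_document(msg_id: str, attributes: Dict[str, str], content: str) -> Message:
    return Message(msg_id, attributes, content, persistent=True)


@dataclass
class Graph:
    """Messages and documents as nodes, typed links as edges."""
    nodes: Dict[str, Message] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (source, type, target)

    def link(self, source: Message, link_type: str, target: Message) -> None:
        self.nodes[source.msg_id] = source
        self.nodes[target.msg_id] = target
        self.edges.append((source.msg_id, link_type, target.msg_id))

    def neighbours(self, msg_id: str) -> List[Tuple[str, str]]:
        """(link type, other node) pairs touching msg_id, in either direction."""
        out = [(t, dst) for src, t, dst in self.edges if src == msg_id]
        inc = [(t, src) for src, t, dst in self.edges if dst == msg_id]
        return out + inc


question = Message("m1", {"type": "Question for Discussion"}, "Should the FAQ be split?")
answer = Message("m2", {"type": "Response"}, "Yes, by topic.")
faq = make_document("d1", {"type": "FAQ"}, "Collected answers to common questions.")
g = Graph()
g.link(answer, "Re", question)
g.link(faq, "Summarizes", question)
print(g.neighbours("m1"))  # [('Re', 'm2'), ('Summarizes', 'd1')]
```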
Server Design
The GroupLens project [4] estimates that, for its application, a single workstation can serve the needs of 10,000 users and 10-20 newsgroups. Usenet, on the other hand, has a much higher data volume, and so replicates data to thousands of servers. It is likely that a single web server can meet the needs of a single cnewsgroup, at least initially, which would greatly simplify the server design. I theorize that client pull or server push of new messages at a user-settable interval will be adequate; this is, after all, an asynchronous application. However, scaling to large numbers of users or to large data volumes may require a distributed or replicated server design. It will be important to study what limits scalability, and how users can be shielded from perceived latency.
Collaborative filtering implies that messages go to filterers before being distributed to others. The question is then how long the system should wait for filterers to respond before sending a message on to other users. A possible solution is to let each client set its own fetch policy; for example, the options might be: 1) immediate push (for filterers), 2) wait a fixed time and apply the collaborative filter, or 3) wait until the algorithm determines some probability of the message being good (likely a function of the number and agreement of filtering messages).
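The three fetch policies might be expressed as a single release decision, evaluated per user and per pending message. The delay, the probability threshold, and especially the estimate of "probability of the message being good" are assumptions: here it is a Laplace-smoothed fraction of positive filter ratings, which rises with both the number of filtering ratings and their agreement, in the spirit of option 3.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FetchPolicy(Enum):
    IMMEDIATE = "immediate push"       # for filterers
    FIXED_DELAY = "fixed delay"        # wait, then apply the collaborative filter
    CONFIDENCE = "confidence"          # wait until the filter is confident enough


@dataclass
class PendingMessage:
    msg_id: str
    age_seconds: float
    filter_ratings: List[float]        # ratings from filterers, assumed in [-1, 1]


def prob_good(ratings: List[float]) -> float:
    """Assumed estimate: (positive ratings + 1) / (all ratings + 2)."""
    positives = sum(1 for r in ratings if r > 0)
    return (positives + 1) / (len(ratings) + 2)


def should_deliver(msg: PendingMessage, policy: FetchPolicy,
                   delay: float = 3600.0, min_prob: float = 0.8) -> bool:
    if policy is FetchPolicy.IMMEDIATE:
        return True
    if policy is FetchPolicy.FIXED_DELAY:
        if msg.age_seconds < delay:
            return False
        # After the delay, deliver unless filterers rated it negative on balance.
        return sum(msg.filter_ratings) >= 0
    return prob_good(msg.filter_ratings) >= min_prob


fresh = PendingMessage("m9", age_seconds=60, filter_ratings=[])
vetted = PendingMessage("m10", age_seconds=600, filter_ratings=[1, 1, 1, 0.5])
print([should_deliver(fresh, p) for p in FetchPolicy])   # [True, False, False]
print([should_deliver(vetted, p) for p in FetchPolicy])  # [True, False, True]
```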
User Role Support
The filtering of messages based on message type is a simple and effective way of supporting role specialization. While each user role might have a default setting of message types to be filtered, the user should be allowed to override those defaults at the granularity of message type. This allows users to choose their own mixture of roles. The user might also want to separate the roles s/he plays at different times, and so the system should support configuring any number of roles, and allow easy switching between them.
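Role support by message-type filtering could look like the following sketch. The role names and their default message-type sets are hypothetical examples drawn from the message types listed earlier; a user's profile starts from a role's defaults, overrides them per message type, and any number of named profiles can be kept and switched between.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical defaults: which message types each role sees.
ROLE_DEFAULTS: Dict[str, Set[str]] = {
    "reader": {"Question for Discussion", "FYI", "Response", "Summary", "Graph"},
    "filterer": {"Question for Discussion", "FYI", "Response", "rating"},
    "faq-maintainer": {"FAQ", "FAQ-request", "FAQ-answer", "Summary"},
}


@dataclass
class RoleProfile:
    base_role: str
    show: Set[str] = field(default_factory=set)   # per-type overrides: add
    hide: Set[str] = field(default_factory=set)   # per-type overrides: remove

    def visible_types(self) -> Set[str]:
        return (ROLE_DEFAULTS[self.base_role] | self.show) - self.hide


@dataclass
class UserSettings:
    profiles: Dict[str, RoleProfile]
    active: str

    def switch(self, name: str) -> None:
        if name not in self.profiles:
            raise KeyError(f"no profile named {name!r}")
        self.active = name

    def filter(self, messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
        visible = self.profiles[self.active].visible_types()
        return [m for m in messages if m["type"] in visible]


settings = UserSettings(
    profiles={
        "evenings": RoleProfile("reader", hide={"Graph"}),
        "moderating": RoleProfile("filterer", show={"vote"}),
    },
    active="evenings",
)
stream = [{"id": "m1", "type": "FYI"}, {"id": "m2", "type": "rating"},
          {"id": "m3", "type": "vote"}, {"id": "m4", "type": "Graph"}]
print([m["id"] for m in settings.filter(stream)])  # ['m1']
settings.switch("moderating")
print([m["id"] for m in settings.filter(stream)])  # ['m1', 'm2', 'm3']
```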

Scoring
The user-visible rules of scoring and viewpoints should have the nature of the rules of a board game. The underlying algorithms might be genetic or otherwise mutable. The scoring system should reward cooperation, without suppressing dissent. Alternative scoring algorithms might be user-contributed, with safeguards like authenticated source code distribution only.
Synthetic datasets should be developed against which algorithms can be tested. One possible method is to define a synthetic set of messages as "true", then test algorithms using a stream of ratings with known noise. Another is to create a "true" set of user weights and viewpoint distances, then simulate those users sending messages to each other with varying degrees of rating sparsity, noise, accuracy, etc. An algorithm can then be characterized by its evolution over time towards "truth". It is possible that there are results from game theory that would be useful starting points.
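The first method (a known "true" set of messages rated through a noisy stream) can be sketched as a small simulation. The assumptions are explicit: each message is truly good (+1) or bad (-1); each simulated rater skips a message with some sparsity probability and otherwise reports the truth, flipped with some noise probability; and the algorithm under test is plain majority vote. Adding raters stands in for the passage of time, so the printed accuracies trace one algorithm's evolution towards "truth".

```python
import random
from typing import Callable, List


def make_truth(n_messages: int, rng: random.Random) -> List[int]:
    """Hidden "true" quality of each synthetic message: +1 good, -1 bad."""
    return [rng.choice([-1, 1]) for _ in range(n_messages)]


def noisy_ratings(truth: List[int], n_raters: int, flip_prob: float,
                  sparsity: float, rng: random.Random) -> List[List[int]]:
    """ratings[i] holds the ratings message i received from simulated raters."""
    ratings: List[List[int]] = [[] for _ in truth]
    for _ in range(n_raters):
        for i, t in enumerate(truth):
            if rng.random() < sparsity:
                continue                      # this rater skipped the message
            ratings[i].append(-t if rng.random() < flip_prob else t)
    return ratings


def majority(rs: List[int]) -> int:
    s = sum(rs)
    return (s > 0) - (s < 0)                  # +1, -1, or 0 for a tie / no ratings


def accuracy(truth: List[int], ratings: List[List[int]],
             scorer: Callable[[List[int]], int]) -> float:
    return sum(scorer(rs) == t for t, rs in zip(truth, ratings)) / len(truth)


rng = random.Random(1998)
truth = make_truth(500, rng)
for n_raters in (1, 5, 25, 125):
    ratings = noisy_ratings(truth, n_raters, flip_prob=0.3, sparsity=0.5, rng=rng)
    print(n_raters, round(accuracy(truth, ratings, majority), 3))
```

Swapping `majority` for a weighted or viewpoint-based scorer, or letting `flip_prob` vary by rater, moves this toward the second kind of experiment described above.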
CONCLUSIONS AND FUTURE WORK
I have presented motivation and design goals for a Web-based collaboration application that significantly extends the functionality of Usenet newsgroups to allow collaborative knowledge capture. The key assumptions are that only the human mind can create this knowledge, that a critical number of people in certain domains will be motivated to invest their time to do so, and that the Internet and supporting software such as proposed here are enabling technologies.
The first group to use such a tool might be one established to collaboratively design the tool and the protocols needed to implement it. That group would learn what works in a collaboration tool by using the tool themselves. By designing a framework from the start to allow rapid prototyping, software components, version control, and source code contributions (possibly from co-ops?), it is possible that the group could rapidly explore design alternatives. In any case, the group's real goal would be to develop communication protocols, message/document formats, and other standards that would enable such an application to be built. Implementations eventually would be done by other groups, possibly by commercial companies who want to include some or all of the functionality in their web or groupware applications.
To be successful, such a system must attract a critical mass of users. I theorize that the users of Usenet technology newsgroups should be the target users for a beta version of the application. These users are more focused, more highly motivated, and more tool-literate than average newsgroup readers, and would be tolerant of new and untested systems, sympathetic to efforts to improve the quality of newsgroups, and appreciative of the results of consensual and persistent knowledge bases. Any user will only be motivated to use an application whose value is higher than the cost of learning and using it. The application must be perceived as making efficient use of a user's time, give positive feedback to those willing to contribute, and produce a result that is valuable. The initial design of the system should focus on this group of users and their needs.
This paper has lumped together features intended to create a "better Usenet" with the more difficult requirements of "wicked problems", with their social conflicts and likely irreconcilable values. My intent has been to emphasize the similarities of these two types of problems, or at least to suggest the existence of a continuum between problems that admit of definition and solution and those that resist consensual formulation. As a design and implementation strategy, however, it might prove useful to delay the complex features needed for wide-area structured discourse until some experience is gathered in deploying a tool for capturing less controversial, possibly technical, knowledge that is undergoing rapid change.
The clearest research need is to develop scoring, weighting, and viewpoint algorithms. Synthetic datasets and ways to characterize algorithmic properties need to be developed. Efficient methods of visualizing information using multidimensional metrics are needed to support multiple viewpoints. Authenticating participants as unique and human is also needed, as is protection against deliberate sabotage, rating skewing, and other security threats. Architecturally, the biggest question is how to scale the system to large groups.
Hurwitz and Mallory [6] concluded their work on the Open Meeting collaboration tool by noting that "the World Wide Web offers unprecedented opportunities for wide-area collaboration at a time when nothing less seems likely to cope with endemic and emergent global problems. We have argued that collaboration systems can begin to manage the complexity by supporting the specialization and localization of knowledge, planning and evaluation."
The development of the Linux OS not only demonstrates that the Internet has enabled successful wide area collaboration, but also gives us a blueprint for how to create the software necessary to extend collaborative methods to previously intractable "wicked" problems. The successful development of these collaborative tools may significantly contribute both to the sum total and intelligibility of human knowledge.
REFERENCES
[1] Röscheisen, M., Winograd, T., & Paepcke, A. (1995). Content Ratings and Other Third-Party Value-Added Information: Defining an Enabling Platform. D-Lib Magazine, August 1995.
[2] Conklin, J. & Begeman, M. L. (1988). gIBIS: A Hypertext Tool for Exploratory Policy Discussion. ACM Transactions on Office Information Systems, 6(4), 303-331.
[3] Balabanovic, M. & Shoham, Y. (1997). Fab: Content-Based, Collaborative Recommendation. Communications of the ACM, 40(3).
[4] Konstan, J., Miller, B., Maltz, D., Herlocker, J., Gordon, L., & Riedl, J. (1997). GroupLens: Applying Collaborative Filtering to Usenet News. Communications of the ACM, 40(3), 77-87.
[5] Hawken, P. (1993). The Ecology of Commerce: A Declaration of Sustainability. HarperBusiness, New York, NY.
[6] Hurwitz, R. & Mallory, J. C. (1995). The Open Meeting: A Web-Based System for Conferencing and Collaboration. Proceedings of the Fourth International Conference on the World-Wide Web, Boston: MIT, December 12, 1995.
[7] Terveen, L., Hill, W., Amento, B., McDonald, D., & Creter, J. (1997). PHOAKS: A System for Sharing Recommendations. Communications of the ACM, 40(3).
[8] Rittel, H. W. J. & Webber, M. M. (1973). Dilemmas in a General Theory of Planning. Policy Sciences, 4, 155-169.
[9] Buckingham Shum, S. (1997). Representing Hard-to-Formalise, Contextualised, Multidisciplinary, Organisational Knowledge. AIKM'97: AAAI Spring Symposium on Artificial Intelligence in Knowledge Management (Mar. 24-26, 1997), Stanford University, Palo Alto, CA (AAAI Press).