February 11, 2004
book: computers and cognition
It seems we've all been getting phenomenological these days, so now is no time to stop. I just finished reading one of the better things to come out of the 1980's (my little brother and Metallica being two other notable exports) -- Winograd and Flores' monograph "Understanding Computers and Cognition".
The book is the retelling of an intellectual journey, philosophically examining the failure of Artificial Intelligence to achieve its lofty goals and directing the insights gained from this exploration towards a new approach to the design of computer systems. Or, more simply, how Heidegger and friends led an AI researcher to the study of human-computer interaction.
The authors begin by challenging what they call the "rationalistic" tradition (what today might be referred to as positivism?) stretching throughout most of Western thought. This tradition's problem solving approach consists of identifying relevant objects and their properties, and then finding general rules that act upon these. The rules can then be applied logically to the situation of interest to determine desired conclusions. Under this tradition, the question of achieving true artificial intelligence on computers, while daunting, holds the glimmer of possibility.
Winograd and Flores instead argue for a phenomenological account of being. The authors pull from a variety of sources to make their claims, but rest primarily on Heidegger's Being and Time and the work of biologist Humberto Maturana. One of the important implications is the notion of a horizon, background, or pre-understanding, making it impossible to completely escape our own prejudices or interpretations. Much of our existence is ready-to-hand, operating beneath the level of recognition and subject-object distinction, and this cannot, in its entirety, be brought into conscious apprehension (i.e. made present-at-hand). AI programs at the time, however, were largely representational. The program's "background" is merely the encoding of the programmer's apprehension and assumptions about the program's domain. While this approach can certainly create useful programs, they exhibit the decontextualized, desituated character commonly attributed to computer interaction and are a far cry from human intelligence.
The authors further delve into the issue of language, arguing that "...the essence of language as a human activity lies not in its ability to reflect the world, but in its characteristic of creating commitment. When we say a person understands something, we imply that he or she has entered into the commitment implied by that understanding." Thus, the authors argue that computers, by their very nature, are incapable of commitment and therefore prevented from entering into language on the same terms as humans.
The authors' conclusion? Move from AI to HCI. There is an error in assuming that success will follow the path of artificial intelligence. The key to design lies in understanding the readiness-to-hand of the tools being built, and in anticipating the breakdowns that will occur in their use. A system that provides a limited imitation of human facilities will intrude with apparently irregular and incomprehensible breakdowns. On the other hand, we can create tools that are designed to make the maximal use of human perception and understanding without projecting human capacities onto the computer.
Other thoughts and notes are in the extended entry.
The design section at the end of the book discusses the Coordinator system, which explicitly represents different speech acts as a way of attempting better coordination of organizational communication, in particular supporting the formation and evaluation of commitments. I'm not familiar with the literature on this system, but colleagues of mine have referred to it as a known failure of early CSCW (computer-supported cooperative work). The explicit encoding of otherwise "ready-to-hand" communication seems potentially dangerous and limiting of social nuance. For example, if a commitment is encoded formally, how much room for ambiguity (or delaying, or weaseling, or whatever) is left without making it present-at-hand? It is similar to one of the projects discussed in my friend Scott's thesis, in which by trying to leverage a theory of human behavior (in this case Goffman's notion of different fronts or faces), he encoded formally what people practice unconsciously with high degrees of nuance, thus creating a disconnect between actual human behavior and the well-intentioned mechanisms of the interface.
How would more recent AI developments be treated through the lens of this book? Modern statistical techniques can incorporate probabilistic logic and learning from example data, but they still revolve around the statistical model (e.g. specific graphical models) and training techniques (e.g. the EM algorithm) used. These are still representational (primarily in the choice of statistical model), but less strictly so. How far can we extrapolate this, loosening the representation?
Do we have any of our own 'hard-coded' models (e.g. Chomskian grammar)? Where do our own representational structures lie on the spectrum of nature (genetics, evolution) and nurture (socially learned and negotiated meaning)?
The question here is at the heart of modern cognitive neuroscience - at what representational level, if any, can we understand human functioning, cognition, and experience (at varying levels of consciousness)? Physics? Chemistry? Neuronal interaction? At what level should we look for the organization (or perhaps better stated, embodiment) of a structure-determined, autopoietic system that allows for experience, intelligence and a background to arise? In short, where and how do science and phenomenology dovetail?
In the meantime, it is argued that the design of computer programs should steer clear of these pretensions. The lesson from above teaches us that even as we understand mechanisms of thought, language, experience, etc, the way we naturally perceive and act in the world is not experienced or conceptualized in the terms of these mechanisms.
The big challenge left for us after reading this book: How do we determine the readiness-to-hand of the tools being built (or the desired 'invisibility' of ubiquitous computing environments)? How do we design for it, how do we measure it, evaluate it, and value it? Furthermore, how do we look beyond just 'tools'? How do we build things that appropriately shift between ready-to-hand and present-at-hand, and that are designed to evoke emotional as well as rational responses? (e.g. a nuclear missile launch control interface should be anything BUT ready-to-hand, requiring conscious deliberation). We've had almost 20 years of HCI research since this book was published, with numerous successes in various (often constrained) domains, but these are still the core theoretical and methodological motivations pushing us forward.
Ready-to-hand: the world in which we are always acting unreflectively. The ready-to-hand is taken as part of the background, taken for granted without explicit recognition or identification.
Present-at-hand: the world in which we are consciously reflective, identifying, labeling, and recognizing artifacts and ideas as such.
Breakdown: the event of the ready-to-hand becoming present-at-hand
Thrownness: the condition of understanding in which our actions find some resonance or effectiveness in the world
Properties of thrownness
The Biology of Cognition: Humberto Maturana
Autopoiesis. An autopoietic system is defined as: "...a network of processes of production (transformation and destruction) of components that produces the components that: (i) through their interactions and transformations continuously regenerate the network of processes (relations) that produced them; and (ii) constitute it (the machine) as a concrete unity in the space in which they (the components) exist by specifying the topological domain of its realization as such a network..." -Maturana and Varela, Autopoiesis and Cognition (1980), p.79
A plastic, structure-determined system (i.e., one whose structure can change over time while its identity remains) that is autopoietic will by necessity evolve in such a way that its activities are properly coupled to its medium.
Structural coupling is the basis not only for changes in an individual during its lifetime (learning) but also for changes carried through reproduction (evolution). In fact, all structural change can be viewed as ontogenetic (occurring in the life of an individual). A genetic mutation is a structural change to the parent which has no direct effect on its state of autopoiesis until it plays a role in the development of an offspring.
A cognitive explanation is one that deals with the relevance of action to the maintenance of autopoiesis. It operates in a phenomenal domain (domain of phenomena) that is distinct from the domain of mechanistic structure-determined behavior.
For Maturana the cognitive domain is not simply a different (mental) level for providing a mechanistic description of the functioning of an organism. It is a domain for characterizing effective action through time. It is essentially temporal and historical.
The sources of perturbation for an organism include other organisms of the same and different kinds. In the interaction between them, each organism undergoes a process of structural coupling due to the perturbations generated by the others. This mutual process can lead to interlocked patterns of behavior that form a consensual domain.
Five categories of illocutionary point:
The failures of AI
...the essence of language as a human activity lies not in its ability to reflect the world, but in its characteristic of creating commitment. When we say a person understands something, we imply that he or she has entered into the commitment implied by that understanding. But how can a computer enter into a commitment?
October 27, 2003
book: philosophy of punk
First things first, let's just be clear that when we use the word philosophy here, we're not talking Kant (thank god), and we're not talking Wittgenstein... but we are talking about a possibly fascinating look at a largely misunderstood sub-culture... one with often conflicting views from its own members. Unfortunately this book is not quite there.
In true punk spirit, however, O'Hara's book is a DIY (do-it-yourself) effort from the ground up. Fanzines (e.g., Maximum Rock n' Roll, Profane Existence, ...) and album liner notes form the primary sources of the book, which chronicles punk viewpoints on media misrepresentation, zines, anarchism, gender, sexuality, and environmentalism. For the most part little new is gleaned, though the book does a nice job of taking various snapshots of the (primarily early 90's) punk world. Skinheads (even the steadfastly anti-racist breed) and straight-edgers, however, are given particularly scathing treatment, as the author characteristically sways between a pseudo-objective tone and unrestrained vitriolic opinion. The same style, if you ask me, that so often characterizes punk.
I did appreciate the book's chapter on anarchism, as it was one of the few sections where I encountered some new perspectives, and set me on a path to discover some interesting readings such as this one. I also discovered that true punks, according to O'Hara's view, are utopians: "anarchy does not just mean no laws, it means no need for laws."
What really struck me, though, was how deeply the rhetoric of rebellion is woven into punk philosophy as presented. It seems that most punk causes can be formulated to always begin with the prefix "anti-". In so doing, punk runs the risk of only ever being a counter-culture, defined largely by resistance and therefore existing as a reactive movement, its identity dependent on the larger culture it lashes back against. As such, punk is limited, willingly or unwillingly, to merely modifying the culture it would like to see obliterated. This observation is an overgeneralization, of course: punk acts continue to promote more egalitarian financial models (e.g., the wonderful folks from Fugazi), and other trends in the sub-culture, particularly gender and environmental issues, tend to promote a more proactive outlook. If punk truly still exists in this day and age (it always seems to be pronounced dead or dying), it will be interesting to see how it further evolves.
In the end, I wouldn't recommend going out of your way to get ahold of this book. But if you're either interested in or completely ignorant of punk, and like me, find this book sitting on a friend's bookshelf, pick it up and give it a read. At the very least, it will get you thinking. Or, if you want to expose yourself to one of the more beautiful (and for my young teenage self, life-changing) expressions of punk philosophy, buy this album and learn all the lyrics by heart.
September 09, 2003
paper: interface metaphors
Interface Metaphors and User Interface Design
This paper examines the use of metaphor as a device for framing and understanding user interface designs. It reviews operational, structural, and pragmatic views on metaphor and proposes a metaphor design methodology. In short the operational approach concerns the measurable behavioral effects of applying metaphor; structural analyses attempt to define, formalize, and abstract metaphors; and the pragmatic approach views metaphors in context – including the goals motivating metaphor use and the affective effects of metaphor. The proposed design methodology consists of 4 phases: identifying candidate metaphors, elaborating source and target domain matches, identifying metaphorical mismatches, and finally designing fixes for these mismatches. Strangely, this paper makes absolutely no mention whatsoever of George Lakoff’s influential work on conceptual metaphor, which I’m almost certain had been published prior to this article. My outlined notes follow below.
September 08, 2003
paper: hci and disabilities
Human Computer Interfaces for People with Disabilities
Human computer interface engineers should seriously consider the problems posed by people with disabilities, as this will lead to a more widespread understanding of the true nature and scope of human computer interface engineering in general.
September 06, 2003
paper: contextual inquiry
Contextual Design: Contextual Inquiry (Chapter 3)
This article discusses in depth the contextual inquiry phase of the contextual design methodology. Contextual inquiry emphasizes interacting directly with workers at their place of work within the constructs of a master/apprentice relationship model in order for designers to gain a real insight into the needs and work practices of their users.
paper: contextual design
Contextual Design: Introduction (Chapter 1)
This book chapter introduces the difficulties of customer centered design in organizations, and proposes the methodology of Contextual Design as a set of processes for overcoming these difficulties and achieving successful designs that benefit both the customer and the business.
paper: rapid ethnography
Rapid Ethnography: Time-Deepening Strategies for HCI Field Research
HCI has come to highly regard ethnographic research as a useful and powerful methodology for understanding the needs and work practices of a user population. However, full ethnographies are also notoriously time and resource heavy, making it hard to fit into a deadline-driven development cycle. This paper presents techniques for rapid, targeted ethnographic work, in the hopes of accruing much of the benefit of field work while still fitting within acceptable time bounds.
The paper organizes its suggestions around three core themes:
paper: 2D fitts' law
Extending Fitts’ Law to Two-Dimensional Tasks
This paper extends the famous Fitts’ Law for predicting human movement times to work accurately in two-dimensional scenarios, in particular rectangular targets. The main finding is that two models, one which computes target width by projecting along the vector of approach and another which uses the minimum of the target’s width and height, achieved equal statistical fits and showed a significant benefit over models using (width+height), (width*height), and (width-as-horizontal-distance-only).
For those who don’t know, Fitts’ Law is an empirically validated law that describes the time it takes for a person to perform a physical movement, parameterized by the distance to the target and the size of the target. Its formula is one-dimensional: it only considers movement along a straight line between the start and the target. The preferred formulation of the law is the Shannon formulation, so named because it mimics the underlying theorem from Information Theory --
MT = a + b log_2 (A/W + 1)
where MT is the movement time, A is the target distance or amplitude, W is the target width, and a and b are constants empirically determined by linear regression. The log term is known as the Index of Difficulty (ID) of the task at hand and is in units of bits (note the typo in the paper).
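As a quick illustration, the formulation is trivial to compute; the constants below are made-up values for the sketch, not coefficients reported in the paper.

```python
import math

def fitts_mt(A, W, a=0.1, b=0.15):
    """Predicted movement time under the Shannon formulation.

    A: distance (amplitude) from the start point to the target center
    W: target width along the line of motion
    a, b: regression constants (hypothetical values here)
    """
    ID = math.log2(A / W + 1)  # Index of Difficulty, in bits
    return a + b * ID

# Doubling the distance at a fixed width adds at most one bit of
# difficulty, so predicted time grows logarithmically with distance.
print(fitts_mt(A=512, W=32))   # ID = log2(17), about 4.09 bits
print(fitts_mt(A=1024, W=32))  # ID = log2(33), about 5.04 bits
```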
The Shannon formulation is preferred for a number of reasons
This paper then considers two-dimensional cases. Clearly you can cast the movement along a one-dimensional line between the start and the center of the target, with the amplitude as the Euclidean distance between these points. But what to use as the width term? Historically, the horizontal width was just used, but this seems like an unintuitive choice in a number of situations, particularly when approaching the target from directly above or below. This paper studies five possibilities: using the minimum of the target's width and height (“smaller-of”), using the projected width along the angle of approach (“w-prime”), using the sum of the dimensions (“w+h”), using the product of the dimensions (“w*h”), and using the historical horizontal width (“status quo”).
The study varied amplitude and target dimensions crossed with 3 approach angles (0, 45, and 90 degrees). Twelve subjects were used, who performed 1170 trials each over four days of experiments. The results found the following ordering among the models in terms of model fit: smaller-of > w-prime > w+h > w*h > status quo. Notably, the smaller-of and w-prime cases were quite close – their differences were not statistically significant.
The w-prime case is theoretically attractive, as it cleanly retains the one-dimensionality of the model. The smaller-of model is attractive in practice, as it doesn’t depend on the angle of approach and so requires one fewer parameter than w-prime. The w-prime model, however, doesn’t require that the targets be rectangular, as the smaller-of model assumes. Finally, it should be noted that these results may be slightly inaccurate in the case of big targets, as the target point is assumed to be in the center of the target object. In many cases users may click near the edge, decreasing the amplitude.
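A small sketch of the two leading width terms, in my own encoding (assuming a rectangular target of width w and height h, approached at angle theta measured from the horizontal; this is not code from the paper):

```python
import math

def smaller_of(w, h):
    # "smaller-of": ignore the approach angle entirely and use the
    # lesser of the target's width and height.
    return min(w, h)

def w_prime(w, h, theta):
    # "w-prime": extent of the rectangle along the approach vector
    # through its center (theta in radians, 0 = horizontal approach).
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    candidates = []
    if c > 1e-12:
        candidates.append(w / c)  # chord limited by vertical edges
    if s > 1e-12:
        candidates.append(h / s)  # chord limited by horizontal edges
    return min(candidates)

def index_of_difficulty(A, W):
    return math.log2(A / W + 1)

# At 0 degrees w-prime reduces to the width; at 90 degrees, the height.
print(w_prime(40, 20, 0.0))          # 40.0
print(w_prime(40, 20, math.pi / 2))  # 20.0
print(smaller_of(40, 20))            # 20
```

Note how the sketch makes the trade-off concrete: smaller_of needs no theta, while w_prime needs the approach angle but generalizes beyond rectangles if the chord computation is swapped out.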
September 05, 2003
paper: other ways to program
Drawing on Napkins, Video-Game Animation, and Other Ways to Program Computers
This article describes a number of visual, interactive approaches to programming. The main thesis is that visual programming environments have failed to date because they are not radical enough. Programs exhibit dynamic behavior that static visuals do not always convey appropriately, and so dynamic visuals, or animation, should be applied. Furthermore, visual programming can avoid explicit abstraction (i.e. when visuals become just another stand-in for symbols in a formal system) without necessarily sacrificing power and expressiveness. Put more abstractly, a programming language can be designed to use one of many possible syntactical structures. It then becomes the goal of the visual programming developer to find the appropriate syntax that can be mapped to the desired language semantics. To map an existing computer language (e.g., C or LISP) into a visual form would require the use of a visual syntax isomorphic to the underlying language. Doing so in a useful, intuitive, and learnable manner proves quite difficult.
Kahn describes a number of previous end-user programming systems. This includes AlgoBlocks, which allow users to chain together physical blocks representing some stage of computation. The blocks support parameterizations on them, and afford collaborative programming. Another system is Pictorial Janus, which uses visual topological properties such as containment, touching, and connection to (quite abstractly, in my view) depict program performance.
He goes on to describe a (quite imaginative) virtual programming "world", ToonTalk, which can be used to construct rich, multi-threaded applications using a video-game interaction style. The ToonTalk world maps houses to threads or processes, and the robots that can inhabit houses are the equivalent of methods. Method bodies are determined by actually showing the robot what to do. Data such as numbers and text are represented as number pads, text pads, or pictures that can be moved about, put into boxes (arrays or tuples), or operated upon with "tools" such as mathematical functions. Communication is represented using the metaphor of birds -- hand a bird a box, and they will take it to their nest at the other house, making it available for the robots of that abode to work with the data.
Kahn argues that while such an environment may be slower to use for the adept programmer, it is faster to learn, and usable even by young children. It also may be more amenable to disabled individuals. Furthermore, its interactive animated nature (you can see your program "playing out" in the ToonTalk world) aids error discovery and debugging. In conclusion, Kahn suggests that these techniques and others (e.g. speech interfaces) could be integrated into the current programming paradigm to create a richer, multimodal experience that plays off different media for constructing the appropriate aspects of software.
Inspiring, yes, but quite difficult to achieve. My biggest question of the moment is: what happened to Ken Kahn? The article footer says he used to work at PARC until 92, and then focused on developing ToonTalk full-time. I'll have to look him up on the Internet to discover how much more progress he made. While I'm skeptical of these techniques being perfected and adopted in production-level software engineering in the near future, I won't be surprised if they experience a renaissance in ubiquitous computing environments, in which everyday users attempt to configure and instruct their "smart" environs. If nothing else, VCRs could learn a thing or two...
paper: prog'ing by example
Tonight I read a block of papers on end-user programming, aka Programming by Example (PBE), aka Programming by Demonstration (PBD). Very fun stuff, and definitely got me thinking about the kind of toys I would want any future children of mine to be playing with.
Eager: Programming Repetitive Tasks By Example
This paper introduces Cypher's Eager, a programming by example system designed for automating routine tasks in the HyperCard environment. It works by monitoring users' actions in HyperCard and searching for repetitive tasks. When one is discovered it presents an icon, and begins highlighting what it expects the user's next action to be - an interaction technique Cypher dubs "anticipation". This allows the user to interactively - and non-abstractly - understand the model the system is building of user action. When the user is confident that Eager understands the task being performed, the user can click on the Eager icon and let it automate the rest of the iteration. For example, it can recognize the action of copying and pasting each name in a rolodex application into a new file, and completely automate the task.
Eager was written in LISP, and communicated to HyperCard over interprocess communication. When a recognized pattern is executed, Eager actually constructed the corresponding HyperCard program (in the language HyperTalk) and passed it back to the HyperCard environment for execution.
There are a couple of crucial things that make Eager successful. One is that Eager tries only to perform simple repetitive tasks... there are no conditionals, no advanced control structures. This simplifies both the generalization problem and the presentation of system state to the user. Second, Eager uses higher-level domain knowledge. Instead of low-level mouse data, Eager gets semantically useful data from the HyperCard environment, and furthermore has domain knowledge about HyperCard, allowing it to better match usage patterns. Finally, Eager has the appropriate pattern matching routines programmed in, including numbering and iteration conventions, days of the week, as well as non-strict matching requirements for lower-level events, allowing it to recognize higher-level patterns (ones with multiple avenues of accomplishment) more robustly.
The downside, as I see it, however, is that for such a scheme to generalize across applications, either (a) the system must be reprogrammed for every application, or (b) designers must equip each program not only with the ability to report high-level events in a standardized fashion, but to communicate application semantics to the pattern matcher. Introducing more advanced applications with richer control structures muddies this further. That being said, such a feature could be invaluable in integrated, high-use applications such as Office or popular development environments. Integrating such a system into the help, tutorial, and mediation features already existing in such systems could be very useful indeed.
September 04, 2003
paper: charting ubicomp research
Charting Past, Present, and Future Research in Ubiquitous Computing
This paper reviews ubiquitous computing research and suggests future directions. The authors present four dimensions of scale for characterizing ubicomp systems: device (inch, foot, yard), space (distribution of computation in physical space), people (critical mass acceptance), and time (availability of interaction). Historical work is presented and categorized under three interaction themes: natural interfaces (e.g. speech, handwriting, tangible UIs, vision), context-aware applications (e.g. implicit input of location, identity, activity), and automated capture and access (e.g. video, event logging, annotation). The authors then suggest a fourth, encompassing research theme of everyday computing, characterized by diffuse computational support of informal, everyday activities. This theme suggests a number of new pressing problems for research: continuously present computer interfaces, information presentation at varying levels of the periphery of human attention, bridging events between physical and virtual worlds, and modifying traditional HCI methods for informal, peripheral, and opportunistic behavior. Additional issues include how to evaluate ubicomp systems (for which the authors suggest CSCW-inspired, real-world deployment and long-term observation of use) and how to cope with the various social implications, both due to privacy and security and to behavior adaptation.
In addition to the useful synopsis and categorization of past work, I thought the real contribution of this paper was the numerous suggestions for future research, many of which are quite important and inspiring. I was also very happy to see that many of the lessons of CSCW, which are particularly relevant to ubicomp, were influencing the perspective of the authors.
However, on the critical side a couple things struck me. One is that many of the research suggestions lack any notion of how "deep" the research problem runs. For example, the research problems in capture and access basically summarize both the meta-data and retrieval problems, long-standing fundamental issues in the multimedia community. However, this depth and extent of the research issue, or how we might skirt the fundamental issues through domain-specificity, is not mentioned. Another issue I had was that the everyday computing scenario could have used some fleshing out. I wanted the authors to provide me with the compelling scenario they say such research mandates. Examples were provided, so perhaps I am being overly critical, but I wanted a more concrete exposition, perhaps along the lines of Weiser's Sal scenario.
See the extended entry for a more thorough summary
September 03, 2003
paper: why and when five users aren't enough
Why and When Five Test Users aren’t Enough
This paper argues that Nielsen’s assertion that “Five Users are Enough” to determine 85% of usability problems does not always hold up. In the end, we walk away with the admonition that five users may or may not be enough. Richer statistical models are needed, as well as good frequency and severity data. What does this mean for evaluators? Certainly this shouldn’t dissuade the use of usability evaluations, but it does imply that one should avoid false confidence and keep an eye to user/evaluator variability.
The paper starts by attacking the formula
ProblemsFound(i) = N ( 1 – ( 1 – lambda ) ^ i ),
in particular, the straightforward use of parameter (lambda = .31). Generalizing the formula shows we should actually expect, for n participants, that
ProblemsFound(n) = sum(j=1…N) ( 1 – ( 1 – lambda_j) ^ n ),
where lambda_j is the probability of discovering usability problem j. Nielsen and Landauer’s formula assumes this probability is equal for all such problems (computed as lambda = the average of such empirically observed probabilities).
However, other studies, such as that by Spool and Schroeder, have found an average lambda of 0.081, showing that a study with ecologically valid tasks (in this case an unconstrained online shopping task with high N) can still miss many usability issues. Thus Nielsen’s claim that five is enough is only true under certain assumptions of problem discovery.
But other issues also abound. For instance, Nielsen’s model doesn’t take into account the variance between users, which can strongly affect the number of users needed. Further complications abound when considering severity ratings, as the authors found huge shifts in severity ratings based on different selections of five users. Other problems include which tasks are used for the evaluation (changes of task revealed undiscovered usability issues) and issues with usability issue extraction, determining the true value of N.
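To make the critique concrete, here is a sketch with made-up problem probabilities (not data from either study) showing how per-problem variation in lambda changes the expected yield of five users even when the average lambda stays at .31:

```python
def problems_found(lambdas, n):
    """Expected number of problems found by n users, allowing a
    separate discovery probability lambda_j for each problem."""
    return sum(1 - (1 - lam) ** n for lam in lambdas)

# With a uniform lambda of 0.31, five users find roughly 84% of
# 100 hypothetical problems...
uniform = [0.31] * 100
print(problems_found(uniform, 5))  # roughly 84.4

# ...but split the same *average* probability between easy-to-find
# and very hard-to-find problems, and five users find far fewer.
skewed = [0.60] * 50 + [0.02] * 50
print(problems_found(skewed, 5))   # roughly 54.3
```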
paper: heuristic evaluation
This paper describes the famous (in HCI circles) technique of Heuristic Evaluation, a discount usability method for evaluating user interface designs. HEs are conducted by having an evaluator walk through the interface, identifying and labeling usability problems with respect to a list of heuristics (listed below). It is usually recommended that multiple passes be made through the interface, so that evaluators can get a larger, contextual view of the interface, and then focus on the nitty-gritty details.
Revised Set of Usability Heuristics
The evaluators also go through a round of assigning severity ratings to all discovered usability problems, allowing designers to prioritize fixes. The severity is a mixture of frequency, impact, and persistence of an identified problem, and as presented forms a spectrum from 0-4, where 0 = Not a usability problem, 1 = Cosmetic problem only, 2 = Minor problem, 3 = Major problem, 4 = Usability catastrophe. Nielsen performs an analysis to show that inter-evaluator ratings have better-than-random agreement, and so ratings can be aggregated to get reliable estimates of severity.
Heuristic evaluation is cheap and can be done by user interface experts (i.e., they can be performed without bringing in outside users). Best results are experienced by evaluators that are familiar both with usability testing and the application domain of the evaluated interfaces. HE is faster and less costly than typical user studies, with which it can be used in conjunction (i.e. use HE first to filter out problems, then run a real user study to find remaining deeper-seated issues). Lacking real user input, however, HE can run the risk of missing, or misestimating, usability infractions.
Nielsen found over multiple studies that the typical evaluator found only 31 percent (lambda = .31) of known usability problems in an interface. Using the model that
ProblemsFound(i) = N ( 1 – ( 1 – lambda ) ^ i ),
where i is the number of evaluators and N is the total number of problems, we can arrive at the conclusion that 5 evaluators are enough to find 84% of usability problems. Nielsen also performs a cost-benefit analysis that finds 4 to be the optimal number. Read the summary of the Woolrych and Cockton paper for a dissenting opinion.
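The discovery curve is easy to sketch in a few lines (lambda = .31 and the 84% figure come from the paper; this computes only the curve, not the cost-benefit side):

```python
# Nielsen's problem-discovery model: each evaluator independently finds
# a fraction lambda of the N problems, so i evaluators are expected to
# find N * (1 - (1 - lambda)^i) of them.
def proportion_found(i, lam=0.31):
    """Expected proportion of usability problems found by i evaluators."""
    return 1 - (1 - lam) ** i

for i in range(1, 8):
    print(f"{i} evaluators -> {proportion_found(i):.0%} of problems found")

# With lambda = 0.31, five evaluators find roughly 84% of known problems.
```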
paper: your place or mine?
Finishing off my block of CSCW papers is Dourish, Bellotti, et al.'s article on the long-term use and design of media spaces.
Your Place or Mine? Learning from Long-Term Use of Audio-Video Communication
This article reviews over 3 years of experience using an open audio-video link between the authors' offices to explore media spaces and remote interaction. The paper details the evolution of new behaviors in response to the communication medium, at both the individual and social levels. For example, users initially learned to stare at the camera to initiate eye contact, but later learned to establish attention without it. Also, colleagues would come to an office to speak to the remote participant.
I saw some important take home lessons here:
My full outlined summary follows...
September 02, 2003
paper: computers, networks, and work
Computers, Networks, and Work
This article describes the early adoption of networked communication (e.g. e-mail) into the workplace. The often surprising social implications of networking began with the ARPANET, precursor of the modern internet. E-mail was originally considered a minor additional feature, but rapidly became the most popular feature of the network. We see immediately an important observation regarding social technologies: they are incredibly hard to predict.
In organizations that provided open access to e-mail (i.e., without managerial restrictions in place), some thought that electronic discussion would improve the decision-making process, as conversations would be “purely intellectual… less affected by people’s social skills and personal idiosyncrasies.” The actual results were more complicated. Text-only conversation carries fewer contextual cues (including appearance and manner) and weakens inhibitions. This has made decision making more difficult, fostering a more democratic style in which strong personalities and hierarchical relationships are eroded. While giving a larger voice to typically quieter individuals, the lowered social inhibitions of electronic conversation also invite more extreme opinions and anger venting (e.g., “flaming”). One study even shows that people who consider themselves unattractive report higher confidence and liveliness over networked communication.
Given these observations, the authors posit a hypothesis: when cues about social context are weak or absent, people ignore their social situation and cease to worry about how others evaluate them. In one study, people self-reported many more illegal or undesirable behaviors over e-mail than when given the same study on pen and paper. In the same vein, traditional surveys of drinking account for only half of known sales, yet results from an online survey matched the sales data more closely than face-to-face reports did. The impersonality of this electronic medium ironically seems to engender more personal responses.
Networked communication has also been known to affect the structure of the work place. A study found that a networked work group, compared to a non-networked group, created more subcommittees and had multiple committee roles for group members. These networked committees were also designed in a more complex, overlapping structure. Networked communication also presents new opportunities for the life of information. Questions or problems can be addressed by other experienced employees, often from geographically disparate locations, allowing faster response over greater distance. Furthermore, by creating a protocol for saving and categorizing such exchanges, networked media can remember this information, increasing the life of the information and making it available to others.
As the authors illustrate, networked communication showed much promise at an early age. However, its benefits don’t always come as expected or for free. The authors note the issue of incentive… shared communication must benefit all those who would be using it for adoption to be successful. Also, it may be the case that managers will end up managing people they have never met… hinting at the common ground problem described by the Olsons [Olson and Olson, HCI Journal, 2000]. Coming back to the authors’ hypothesis also raises one exciting fundamental question: as networked communication becomes richer, social context will begin to reappear, modifying the social impact of the technologies. As this richer design space emerges, how can we use it to achieve desired social phenomena in a realm that is so prone to unpredictability?
paper: distance matters
This paper examines and refutes the myth that remote cooperative technology will remove distance as a major factor affecting collaboration. While technologies such as videoconferencing and networking allow us to communicate and collaborate more effectively across great distances, the authors argue that distance will remain an important factor for the foreseeable future, regardless of how sophisticated the technology becomes.
This paper reviews results of studies concerning both collocated and distant collaborative work, and extracts four concepts through which to understand collaborative processes and the adoption of remote technologies: common ground, coupling, collaboration readiness, and technology readiness. The case is then made that because of these issues and their interactions, distance will continue to have a strong effect on collaborative work processes.
paper: groupware and social dynamics
Kicking off a batch of papers on Computer-Supported Cooperative Work (CSCW) is Grudin's list of challenges to the CSCW developer...
Groupware and Social Dynamics: Eight Challenges for Developers
Groupware is introduced as software that lies in the middle of the spectrum between single-user applications and large organizational information systems. Examples include e-mail, instant messaging, group calendaring and scheduling, and electronic meeting rooms. The developers of groupware today come predominantly from a single-user background, and hence many do not appreciate the social and political factors crucial to developing groupware. Grudin outlines eight major issues confronting groupware development and gives some proposed solutions.
The Disparity Between Who Does the Work and Who Gets the Benefit
Critical Mass and Prisoner’s Dilemma Problems
Social, Political, and Motivational Factors
Exception Handling in Workgroups
Designing for Infrequently Used Features
The Underestimated Difficulty of Evaluating Groupware
The Breakdown of Intuitive Decision-Making
Managing Acceptance: A New Challenge for Product Developers
Take home messages from the paper: groupware should strive to directly benefit all group members, build off of existing successful apps if possible, develop thoughtful adoption strategies, and be rooted in an understanding of the [physical|social|political] environment of use.
August 05, 2003
I just read this paper by my research group's principal scientist, Peter Pirolli, and former PARC employee Wai-Tat Fu. The paper, entitled "SNIF-ACT: A Model of Information Foraging on the World Wide Web", recently won the Best Theoretical Paper Award at the 9th International Conference on User Modeling.
The paper extends the existing ACT-R cognitive modeling infrastructure to computationally simulate users surfing the web, at a fairly fine-grained psychological level. The system models the user's goals, knowledge, memory, and abilities (in the form of production rules) and combines these with the findings of Information Foraging theory to create a successful model of web surfing and decision making. Information Foraging theory applies the metaphor of animals foraging for food to the task of humans seeking information. In previous work, Pirolli and Card have found that the equations governing the cost structures of the two are the same.
The SNIF-ACT model works by extracting the content and links of a web page and then using a technique known as spreading activation to propagate "activity" through an associative memory network of individual words. Activation proceeds from the modeled user goals through the terms in working memory and out to the currently observed web content. Link weightings between word associations are determined by using word occurrence and co-occurrence rates extracted from AltaVista. By finding the highest mutual activity between user goals and available links, the system can compute an estimate of the information scent (much like the scent tracked down by animals in the wild), and use this to construct a probability distribution of the likelihood of following different links. Drop-offs in scent measures are also used to predict when a user will leave the current web site to look elsewhere for a richer information patch (analogous to an animal moving on to greener pastures or hunting easier prey). The SNIF-ACT model is psychologically richer than previous foraging-influenced systems like Bloodhound, which primarily uses techniques from the information retrieval (IR) field and earlier flawed cognitive approaches.
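The core scent computation can be caricatured in a few lines. This is only a hypothetical sketch: the words, weights, and the softmax choice rule below are my own stand-ins, whereas SNIF-ACT derives its association strengths from AltaVista occurrence and co-occurrence statistics and uses the full ACT-R activation machinery.

```python
import math

# Hypothetical association strengths between goal terms and link terms;
# in the real model these come from word (co-)occurrence statistics.
weights = {
    ("medical", "health"): 2.1,
    ("medical", "hospital"): 1.8,
    ("treatment", "health"): 1.5,
    ("treatment", "sports"): 0.1,
}

def scent(goal_words, link_words):
    """Total activation flowing from the goal terms to a link's terms."""
    return sum(weights.get((g, l), 0.0)
               for g in goal_words for l in link_words)

def link_choice_probs(goal_words, links):
    """Softmax over scent scores -> probability of following each link."""
    scores = {name: scent(goal_words, words) for name, words in links.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {name: math.exp(s) / z for name, s in scores.items()}

goal = ["medical", "treatment"]
links = {"Health A-Z": ["health", "hospital"], "Sports News": ["sports"]}
probs = link_choice_probs(goal, links)   # strongly favors "Health A-Z"
```

A drop-off rule would then compare the best available scent against the expected scent of starting over at another site, deciding when the simulated user abandons the current patch.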
For more details about information foraging theory and its applications, check out this essay by pixelcharmer (it even cites my first research paper!), this copy of a talk by Pete Pirolli, and my research group's publication archives.
There are at least two interesting avenues for this work to follow. One is applications: successful user models can drive better automated usability metrics and could learn from individual behavior to create personalized search and surfing tools. The second is to move beyond the web, building user models for other content-rich domains (e.g., information visualization). Down the road, I think the integration of content-based and perception-based (e.g., computer vision and audition) analyses will be the next big research leap, creating richer, more realistic models of user behavior and furthering artificial intelligence research.
July 29, 2003
paper: designing for usability
My prelims are coming sooner than I'd like to admit, and so I need to get hopping on a bunch of papers. Fortunately, over the past semester our reading group read over half the assigned papers, but for those that are left I will be posting my summaries here for my own archival purposes (including some back-posts for previously read papers). Perhaps they will be of use to someone else as well, so I might as well make these public...
First up is "Designing for Usability: Key Principles and What Designers Think" by John D. Gould and Clayton Lewis. This paper was originally published in 1985 in the Communications of the ACM, and outlines the iterative design philosophy that is central to modern Human-Computer Interaction. The paper describes three central design principles (early focus on users, empirical measurement, and iterative design) and includes a survey of designers trying to ascertain how common and/or obvious these principles are. The paper also rebuts arguments against the use of these principles and presents a case study of these principles in action.
My biggest problem with this article is that the authors are too unsympathetic to the demands that a deadline-driven project can make. They seem to advocate iterating "as long as it takes", which while desirable is not particularly feasible. To be fair, they acknowledge these pressures and give some cogent arguments for why the costs of iteration are not as high as one might otherwise suspect. But what is missing in the methodology are strategies and techniques for optimizing the design as much as possible within bounded resources. Later work has attempted to address some of these issues, including discount usability methods (e.g., heuristic evaluation) and rapid ethnography techniques (e.g., David Millen's paper).
Designing for Usability: Key Principles and What Designers Think
July 24, 2003
talk: jan pedersen
Today Jan Pedersen, former PARC researcher and current Chief Scientist of AltaVista, spoke at the PARC Forum. His talk was entitled Internet Search: Past, Present, and Future. It seems particularly relevant given my recent exposure to personalized search start-up Kaltix. Jan primarily covered the developmental and economic history of search engines and spoke about current search technologies. Read on for my notes from the talk.
Notes: PARC Forum, July 24, 2003
Internet Search: Past, Present, and Future
July 10, 2003
paper: animation support in a UI toolkit
Here's a back-post for a prelims paper: "Animation Support in a User Interface Toolkit", by Hudson and Stasko. I thought this paper particularly relevant, as I'm currently working in interactive graph visualization, which includes a heavy animation component. This paper got me considering higher level primitives I might use in the graph viz toolkit we are developing.
Animation Support in a User Interface Toolkit: Flexible, Robust, and Reusable Abstractions
In this paper, the authors present extensions to the Artkit user interface toolkit to support animation. The toolkit offers basic support for simple motion blur, "squash and stretch", use of arcing trajectories, controlled timing of actions, anticipation + follow-through, and slow-in / slow-out transitions. It also supports a robust scheduling system that helps deal with unpredictable performance from the windowing system... very important since this was running on X-Windows.
The main abstraction used is the transition, which consists of a pointer to the UI component that is moving, the trajectory the component will take, and the time interval over which to animate. The UI component can be any interactor object implemented in Artkit. The trajectory consists of the curve traveled (parameterized from 0 to 1) and a pacing function to determine velocities over the curve (e.g. using a line with slope 1 for uniform animation and an arctan or sigmoidal function to create slow-in / slow-out transitions). The times in the time interval can be expressed as absolute times, as a delay from the present time, or parameterized by the starting or ending of other transitions.
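As a rough sketch of that abstraction (the class and function names here are mine, not Artkit's), a transition might look like:

```python
import math

def linear_pacing(t):
    """Uniform velocity: normalized time maps straight through."""
    return t

def slow_in_slow_out(t):
    """Sigmoid-style pacing (an arctan curve), easing in and out."""
    return math.atan(8 * (t - 0.5)) / (2 * math.atan(4)) + 0.5

class Transition:
    def __init__(self, component, curve, pacing, start, end):
        self.component = component  # any interactor with a position
        self.curve = curve          # maps s in [0, 1] -> screen position
        self.pacing = pacing        # warps normalized time before the curve
        self.start, self.end = start, end

    def step(self, now):
        """Place the component at its position for wall-clock time `now`."""
        t = (now - self.start) / (self.end - self.start)
        t = min(max(t, 0.0), 1.0)   # clamp outside the time interval
        self.component.pos = self.curve(self.pacing(t))
```

A straight-line move from (0, 0) to (100, 0) with slow-in / slow-out would then be `Transition(obj, lambda s: (100 * s, 0.0), slow_in_slow_out, start, end)`.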
Robust animation and event-relative transitions are achieved using an animation dispatch agent. All that is assumed is that the toolkit can ask what the current time is and that the window system will pass control back to the toolkit periodically. The agent constructs a scheduling queue of transitions and attempts to estimate when the next draw cycle will appear on the screen using a measure of past updates. Using this estimated redraw end time, the set of active transitions for the current cycle is selected. Each active transition is started or stopped as appropriate, and current parameter values are passed through its pacing function and mapped to screen positions using the trajectory.
This scheme will animate smoothly when the agent is given control at regular intervals, but it will also properly handle delays, correctly delivering animation steps at larger intervals.
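In condensed (and hypothetical) form, the dispatch agent's main idea might be sketched as below; the real Artkit details surely differ, and the transition objects are assumed to expose `start`, `end`, and a `step(time)` method:

```python
class DispatchAgent:
    """Steps active transitions toward an estimated on-screen time."""

    def __init__(self):
        self.transitions = []   # scheduling queue of pending transitions
        self.frame_costs = []   # durations of recent draw cycles

    def estimated_redraw_end(self, now):
        """Predict when the current redraw will actually hit the screen,
        using a running average of past cycle costs."""
        if not self.frame_costs:
            return now
        return now + sum(self.frame_costs) / len(self.frame_costs)

    def tick(self, now):
        """Called whenever the window system passes control back."""
        target = self.estimated_redraw_end(now)
        for tr in self.transitions:
            if tr.start <= target:   # active (or starting) by then
                tr.step(target)      # animate to the estimated time
        # Finished transitions drop out of the queue.
        self.transitions = [t for t in self.transitions if t.end > target]
```

Because each `tick` steps transitions to the *estimated* on-screen time rather than by a fixed increment, a late callback simply produces a larger animation step, which is exactly the delay-handling behavior described above.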
Criticism: The first thing that struck me is that no mention of scale is given. How many objects can I animate at once? What are the bottlenecks? Obviously rendering time is a major factor, but overhead also accrues through scheduling and through mapping each object through its own pacing and trajectory functions. In most cases I'd expect this to be a constant-time overhead per object, but this isn't really discussed. Also, cool animated effects like squash and stretch are mentioned multiple times, but their implementation is not discussed.
Today, 10 years later, we have vastly more powerful processors and graphics cards, enabling much richer animation possibilities. This paper was ahead of its time, and today's popular toolkits - Swing, MFC, etc. - are definitely behind the times. While toolkits like Java2D provide much of the rendering and geometric capability needed, animation managers like the kind presented here and in Xerox PARC's Information Visualizer have yet to become common. Hopefully, as graphics power continues to grow and the drive for more powerful interactive technologies gains momentum (e.g., more widespread use of information visualization), these more powerful tools and abstractions will become commonplace.