
Summary of Architectures of Intelligence: How Can the Human Mind Occur in the Physical Universe? John Anderson (2007)

Chapter 1: Cognitive Architecture


Newell introduced the term cognitive architecture into cognitive science through an analogy to computer architecture, which Fred Brooks (1962) had in turn introduced into computer science through an analogy to the architecture of buildings. Architecture is the art of specifying the structure of a building at a level of abstraction sufficient to assure that the builder will achieve the functions desired by the user. Computer architecture, however, is more focused on the product of the design than on the designing itself. Cognitive architecture is therefore defined as the fixed (or slowly varying) structure that forms the framework for the immediate processes of cognitive performance and learning. An architecture relates a structure to a function: here the structure is the agent itself and its function is to enable cognition. Previously, when scientists wanted to understand cognition, they had to focus either on the structure (the brain) or on human behaviour. To understand the mind, however, we need an abstraction that captures its essence, and there is still a lot of debate about what the best abstraction is. Taken all together: a cognitive architecture is a specification of the structure of the brain at a level of abstraction that explains how it achieves the function of the mind. The architectural program discussed here pays attention to three things: the brain, the mind (functional cognition), and the architectural abstractions that link them.

In the rest of this chapter, some shortcut theories (scientific approaches that ignore one of these parts) are discussed. The first is the classical information-processing theory of psychology, which ignores the brain. The problem with this behavioural approach is that the structure is left out completely. One of the more prominent lines of research was the Sternberg experiment, in which a set of digits was presented and the subject then had to decide whether a probe digit was among them. This approach was criticised because the proposed explanations were biologically implausible. For a long time these criticisms were ignored, until connectionism arose; the rise of neural imaging further showed the importance of also taking the brain into account.

The second shortcut is eliminative connectionism, which ignores the mind. Its proponents realise that the brain's structure generates human behaviour, and hope that describing that structure is enough to understand the mind, so that functionality comes for free. The goal is to come up with an abstract description of the computational properties of the brain and then use this description to explain various behavioural phenomena.

One of the biggest eliminative connectionist successes is the past-tense model of Rumelhart and McClelland (1986). Children first learn correct past-tense forms for both regular and irregular verbs; after some time they overgeneralise the regular past tense, so the irregular verbs are temporarily inflected incorrectly; and finally they get both right again. Until Rumelhart and McClelland there was no clear mechanistic explanation for this, but they built a working neural network model that reproduced the empirical data. In such a model, use of features strengthens the connections between them; this also means that when errors are made, false connections can be strengthened. Thus, they claim, they achieved the function of a rule without ever having to include rules in their explanation. This view, however, rests on a sleight of hand: the model maps activation patterns onto activation patterns without considering anything about human speech production, neither how the input patterns are produced nor what happens to the output patterns to yield coherent speech. That is, the model does not explain the functional behaviour of the entire system. This criticism is not aimed at connectionism per se, but at any model that disregards the overall architecture and its function.

The last shortcut, which ignores the architecture, is called rational analysis; ecological psychology (Gibson, 1966) is a related example. Rather than focusing on the architecture as the key abstraction, it focuses on adaptation to the environment. Bayesian statistical methodology is perhaps its most prominent application to understanding human cognition. The Bayesian approach makes the following claims (note that it does not claim that people make these calculations explicitly; rather, we simply do not have to worry about how people do it):
1. We have a set of prior constraints about the nature of the world we occupy.
2. Given various experiences, one can calculate the conditional probability of those experiences under various states of the world.
3. From the prior and conditional probabilities, one can calculate the posterior probabilities of the states of the world given the input.
4. Having made this calculation, one engages in Bayesian decision making and takes the action that maximises expected utility.

One of the most prominent examples of rational analysis concerns e-mail sending and replying. The probability that someone sends you an e-mail decreases with the time since their last message, following a power function, which parallels the memory retention function (i.e. how well memories are retained as a function of time since last use). Both functions reflect functionality: often-used memories (or frequent e-mail correspondents) have, at that moment, more utility than rarely used ones. Memory thus reflects the demands the world makes on it. Because memory capacity is limited, it is useful to throw away the useless bits and keep the useful information. The same holds for practice functions, associative priming functions, and so on. Thus, the argument goes, one does not need a description of how memory works (which is what an architecture gives); one only needs to focus on how memory solves the problems it encounters. This, however, still does not answer the question of how the human mind can occur in the physical universe. For many species such rational analyses may suffice, and we share a great deal with them, but there is still a great cognitive gulf between us and them. Therefore we do need to incorporate all three aspects: brain, mind, and architecture.
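To make the four Bayesian steps above concrete, here is a minimal sketch in Python. It is illustrative only: the two world states, their priors, the likelihoods, and the utility table are hypothetical numbers, not values from the book.

```python
# Minimal sketch of rational (Bayesian) analysis: prior -> likelihood -> posterior -> action.
# All numbers are invented for illustration.

priors = {"friend_active": 0.3, "friend_inactive": 0.7}        # step 1: prior constraints
likelihood = {                                                  # step 2: P(evidence | state)
    "friend_active": {"email_today": 0.5, "no_email": 0.5},
    "friend_inactive": {"email_today": 0.05, "no_email": 0.95},
}
utility = {                                                     # payoff of replying now vs. waiting
    ("reply_now", "friend_active"): 10, ("reply_now", "friend_inactive"): -1,
    ("wait", "friend_active"): 0,       ("wait", "friend_inactive"): 0,
}

def posterior(evidence):
    """Step 3: P(state | evidence) via Bayes' rule."""
    unnorm = {s: priors[s] * likelihood[s][evidence] for s in priors}
    total = sum(unnorm.values())
    return {s: p / total for s, p in unnorm.items()}

def decide(evidence):
    """Step 4: choose the action with the highest expected utility."""
    post = posterior(evidence)
    actions = {a for a, _ in utility}
    return max(actions, key=lambda a: sum(post[s] * utility[(a, s)] for s in post))

print(posterior("email_today"))   # roughly {'friend_active': 0.81, 'friend_inactive': 0.19}
print(decide("email_today"))      # 'reply_now'
```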

ACT-R
ACT-R is an architecture in which cognition arises from the interaction of a number of largely independent modules, each representing some aspect of the mind. There is a visual module, an imaginal module which holds the current mental representation, a control-state (goal) module, a declarative module holding declarative memory, and a manual module. Each of these modules is associated with a particular brain area, about which ACT-R has elaborate theories. As a sixth component there is a central procedural system, which recognises patterns of information in the modules' buffers and responds by sending requests to the modules; it is equivalent to procedural memory. Anderson (2005) describes a detailed model of children learning to solve simple linear equations such as 7x + 1 = 29, which they learned in multiple stages of increasing difficulty. Note that the model was not programmed to do the task. Instead, a starting production requests actions when the components of the task are presented. This is called an end-to-end model, which can be implemented in a full cognitive architecture. The model predicted children's performance very well: more difficult equations take longer than simpler ones, and whereas in the beginning separate computations are needed for intermediate (buffer) steps, after a few days of practice some procedures are combined, resulting in quicker responses (see figure 1.7 of Anderson). How exactly this works will be described in more detail later. What is important for now is that a person runs through a series of steps distributed over different buffers, which is essentially a great elaboration of the Sternberg stage model.
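As an illustration of how a procedural system drives buffer contents through such a task, here is a hypothetical sketch of a production cycle solving 7x + 1 = 29. It is not the Anderson (2005) model or real ACT-R code; the buffer names and productions are simplified stand-ins, and each firing is charged the nominal 50 ms that ACT-R assumes per production.

```python
# Hypothetical sketch of a production cycle solving 7x + 1 = 29.
# Buffers hold one chunk (here: a dict); productions match buffer patterns and issue requests.

FIRING_TIME = 0.050  # ACT-R's default cost per production firing, in seconds

buffers = {
    "goal":     {"state": "start"},
    "imaginal": {"a": 7, "b": 1, "c": 29},   # encodes a*x + b = c
    "manual":   None,
}

def unwind_constant(bufs):            # 7x + 1 = 29  ->  7x = 28
    if bufs["goal"]["state"] == "start":
        bufs["imaginal"]["c"] -= bufs["imaginal"]["b"]
        bufs["imaginal"]["b"] = 0
        bufs["goal"]["state"] = "divide"
        return True
    return False

def unwind_coefficient(bufs):         # 7x = 28  ->  x = 4
    if bufs["goal"]["state"] == "divide":
        bufs["imaginal"]["c"] //= bufs["imaginal"]["a"]
        bufs["imaginal"]["a"] = 1
        bufs["goal"]["state"] = "respond"
        return True
    return False

def respond(bufs):                    # type the answer
    if bufs["goal"]["state"] == "respond":
        bufs["manual"] = {"press": bufs["imaginal"]["c"]}
        bufs["goal"]["state"] = "done"
        return True
    return False

productions = [unwind_constant, unwind_coefficient, respond]

time = 0.0
while buffers["goal"]["state"] != "done":
    for production in productions:           # central bottleneck: one firing per cycle
        if production(buffers):
            time += FIRING_TIME
            break

print(buffers["manual"], f"after {time*1000:.0f} ms of production firings")
# {'press': 4} after 150 ms of production firings
```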

The complexity of the steps needed to go from instruction to task completion has seriously hindered efforts to develop cognitive architectures. Although the steps appear very complex, however, in real life they do not have to be. The only feedback researchers traditionally have on a model is how accurately it reproduces human performance, and such behavioural data alone do not justify all of this detail. Physiological data could solve this problem. The right kind of physiological data to obtain is data that trace out the states of the computation, because this provides something close to a one-to-one tracing of the implementation level. Using the Blood Oxygen Level Dependent (BOLD) response, one can see which brain areas are active: when activation of a particular area is higher, it needs more oxygen. However, as is typical for the BOLD response, activation rises and falls slowly, peaking usually 4-5 seconds after the actual execution of a subtask. Because of this sluggishness, activations that occur at short intervals cannot be distinguished. The table below lists the different modules with their corresponding brain regions; the rest of the book goes into more detail about the different areas.

ACT-R module and associated brain region:
- Goal module: Anterior Cingulate Cortex
- Declarative module (retrieval): Prefrontal Cortex
- Manual module: Motor Cortex (Gyrus Precentralis)
- Visual module: Occipital Cortex
- Imaginal module: Parietal Cortex
- Procedural module: Caudate Nucleus (Basal Ganglia)

Symbols vs. Connections in Cognitive Architecture
There is a great debate in cognitive science between symbolic and connectionist architectures. ACT-R has been placed on the symbolic side, although that is not entirely correct. Some fraction of the controversy is really about the language used to describe cognition rather than about scientific claims. It is unclear whether these symbols actually have meaning, i.e. designate other things, or whether they are just pointers. Four positions on symbols and connections can be distinguished:
1. + symbols, - connections: The classic symbol-manipulation position holds that the principles by which the mind operates involve transformations of the structural properties of symbolic representations (like pointers in LISP).
2. - symbols, + connections: This position is called eliminative connectionism, because it wants to eliminate the concept of symbols. These connectionists equate symbols with explicitly stated rules and regard them, at best, as good approximations and, at worst, as misleading.

3. + symbols, + connections: Implementational connectionism holds that connectionist computations are organised to achieve symbolic results, and that both connectionist and symbolic characterisations play an important explanatory role.
4. - symbols, - connections: In this view both symbols and connections are dismissed, and perhaps even the possibility of an explanation of the mind at all.

In ACT-R, the symbolic level is an abstract characterisation of how brain structures encode knowledge; the subsymbolic level is an abstract characterisation of the neural computations that make that knowledge available. Symbols provide access to distant physical structures that hold a particular piece of information: computation is local, but information must be brought in from other locations, so a symbol can be thought of as something like a neural tract. Subsymbolic quantities, in turn, hold the activation and connection strengths of particular pieces of information, and determine what information is brought in and how quickly. With respect to the declarative module, ACT-R encodes networks of knowledge in what are called chunks. One chunk can hold different pieces of information that were experienced together, and retrieval of one piece can result in recall of the entire chunk. These chunks have activations at the subsymbolic level: the most active chunk is retrieved, and its activation value determines how quickly this happens, which relates to the retention function. The procedural module consists of production rules, which can move symbolic references from one location to a distal location. In the example of linear algebra (say 3 + x = 8), the value of symbol1 (3) can be copied to another location to compute symbol2 - symbol1 (8 - 3). The content of the symbols thus does not matter: the same procedure can be used for any numbers. There are situations in which multiple productions could apply; in that case their utility is determined at the subsymbolic level, which is elaborated in chapter 4. Because these subsymbolic quantities behave like connectionist networks, ACT-R can be seen as having a connectionist implementation rather than a purely symbolic one. The symbols in the code are only symbols for the simulation program; they are not symbols for the ACT-R architecture or for the connectionist network. The level of abstraction is the only difference between ACT-R and a connectionist model, and the question is which level provides the best bridge between brain and mind: ACT-R lies closer to the mind, while the connectionist model lies closer to the brain.
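The symbolic/subsymbolic distinction can be illustrated with a small sketch: the slot structure of a chunk is the symbolic part, while its activation (which decides whether and how fast it is retrieved) is subsymbolic. The chunk contents and activation values below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    # Symbolic level: named slots holding pieces of information experienced together.
    slots: dict
    # Subsymbolic level: a single activation value summarising usefulness/recency.
    activation: float = 0.0

memory = [
    Chunk({"type": "addition-fact", "addend1": 3, "addend2": 4, "sum": 7}, activation=1.2),
    Chunk({"type": "addition-fact", "addend1": 3, "addend2": 5, "sum": 8}, activation=0.4),
]

def retrieve(request: dict, chunks: list[Chunk]) -> Chunk | None:
    """Return the matching chunk with the highest activation."""
    matches = [c for c in chunks
               if all(c.slots.get(k) == v for k, v in request.items())]
    return max(matches, key=lambda c: c.activation, default=None)

fact = retrieve({"type": "addition-fact", "addend1": 3, "addend2": 4}, memory)
print(fact.slots["sum"])   # 7 -- retrieving one piece gives access to the whole chunk
```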

Chapter 2: Modular Organization of the Mind


The human mind is what emerges from the actions of a number of largely independent cognitive modules integrated by a central control system. This chapter discusses their functions and structures. The world we live in is very complex and places many simultaneous demands on us; we have multiple resources to act on these demands, but those resources are very limited. First of all, there is perception: we detect important information through multiple sensory media. Second, we must take appropriate actions based on the information we have. Finally, our actions and thoughts must be coordinated to achieve our needs; success depends not only on which actions are taken, but also on when and in what order. With respect to perceptual and motor abilities we do not differ much from other primates, but our ability to control and organise combinations of behaviour is much better. Driving a car is perhaps the best example, because it requires all these modules simultaneously, all the time. These functional demands would not be so interesting if there were no limitations, but there are, and they constrain parallel processing in two basic ways. First, there is the matter of numbers: we have only a limited amount of space and cannot pack in infinite numbers of neurons. Second, there is the matter of communication: also due to the lack of space, some information has to travel a long way to reach its destination. Through evolution, the brain has organised itself in such a way that costs are minimal and benefits optimal. The visual system illustrates these limitations best. While the eyes continuously take in visual information, only a small part of it, namely everything falling on the fovea, can be analysed in great detail. You will still duck when you see something flying at you in the periphery, however, which indicates that the rest of the visual information is also analysed to some degree. This processing occurs in more than 30 different brain areas, organised so that object-recognition areas lie in the ventral stream and action-affordance/spatial areas in the dorsal stream; in addition, early visual areas are topographically organised. By laying related areas close together, they can interact more quickly and easily. Some claim that the brain is equipotential, i.e. that every area can perform every task, and lesion and MRI studies have provided some evidence for this assumption. It is not entirely true, however: whereas some functions can be performed (or rather, taken over) by adjacent areas, other tasks can only be replaced by completely different (and often less efficient) strategies, and thus different brain areas.

Although parts of the brain are capable of doing their own processing, they do have to coordinate at times to achieve a functioning system: whatever task you perform, the different sub-actions you take must move you towards the goal. To achieve this coordination, tracts of brain fibres connect multiple cortical regions via subcortical regions. The basal ganglia play a very prominent role in this control, because they are connected to many cortical and subcortical regions. One of their structures is the pallidum, whose projections are inhibitory; it inhibits the thalamus, which in turn projects to the frontal cortex to select actions. The pathway that is activated most strongly will therefore be selected, much as in ACT-R. Three key features of this characterisation of the basal ganglia stand out:
1. It allows information from disparate regions of the brain to converge in making a decision.
2. It requires a great compression of information relative to what is happening in these individual regions, because the number of receiving neurons is so much smaller.
3. Processing that involves this multi-synaptic loop is necessarily much slower than processing that can occur within a single brain region.
The existence of structures with these properties is almost a necessity, given the need to coordinate information and the limitations of the human nervous system.

Modular Architecture
The overall structure of ACT-R consists of eight modules: two perceptual modules (visual and aural), two response modules (manual and vocal), and the four central modules. Each module is capable of massively parallel computation to achieve its own objectives, but there are serial bottlenecks when modules must interact. In addition, each module's buffer can hold only one chunk at a time. Communication among the modules occurs via the procedural module, which can respond to information in the buffers of other modules and place information into those buffers. One significant constraint in ACT-R is that only one production can execute at a time, and each firing costs about 50 ms. Since communication among modules must go through the procedural module, it becomes the central bottleneck in information processing. The procedural module, although constantly referred to as a module, is not really like the others: it has no buffer and, unlike them, it is not associated with a cortical structure. It is interesting to consider what about this architecture is uniquely human. One candidate is the goal module and the cognitive control it supports, which enables us to perform means-ends problem solving; the key to means-ends problem solving is the ability to disengage from what one wants in order to focus on something else (the means). The ACT-R architecture can be viewed as a summary of an emerging consensus in the field, though significant controversies still exist.

Fodor proposed that a fragment of human cognition is achieved by what he called modules; he thought a modular structure best characterised certain input systems. He listed nine properties he associated with input modules, six of which are reviewed below:
1. Domain specificity: Every module (especially visual and auditory) can only process a restricted set of stimuli.
2. Mandatory operation: When the right input arrives, the modules have to act, and how they act cannot be modified; it does not depend on the system's beliefs.
3. Information encapsulation: The information the modules process is internal, and they do not need to make requests of other systems for information. This holds in ACT-R, although the modules are able to trade information with other modules.
4. Fast operation: As a consequence of information encapsulation, modular processes are the fastest cognitive processes. The modules do take some time to do their work, however, and the overall speed of cognition is largely determined by these module times.
5. Shallow outputs: The outputs of the modules are considered shallow. Although not all ACT-R modules report simple perceptual results, the restriction of buffer contents to single chunks makes the output of the modules very limited.
6. Fixed neural architecture: Particular neural structures are associated with these modules and do not change. ACT-R agrees that the modules themselves are not learnt, which is a strong nativist claim, but the contents of the modules are certainly influenced by experience. Although there is considerable evidence for neural plasticity, this is not yet implemented in ACT-R.
Each of Fodor's claims can be questioned. However, as stated above, they are not the main point of controversy, and they come close to summarising an emerging consensus. Significant controversy does exist, however, about further claims people have made about modules. We discuss three of them:
1. Language: Fodor proposed that a separate module exists for language, but his remarks about information encapsulation and the innate basis of syntax have generated considerable debate. ACT-R remains agnostic on this issue, though it admits it is still incomplete here.
2. Content-specialised modules: Others besides Fodor have proposed modules with rather specialised content, sometimes caricatured as the Swiss-Army-knife model of cognition. ACT-R again remains agnostic.

3. Central cognition: Fodor rejects the existence of a central processor and restricts his modules to input (and perhaps output) systems. It is here that ACT-R and Fodor part ways, because a considerable amount of evidence for central structures has already been found. Although Fodor is right that there is no brain centre for modus ponens, the basal ganglia do appear to implement something like a production system. Fodor's reason for doubting that cognition can be modelled computationally comes from his concern with the frame problem, which started in AI as a technical concern about how to update knowledge in logical systems. Fodor uses analogy-making as an example of something he takes to be beyond a computational system; ACT-R, however, has managed to model it. Fodor's worries do not seem to have been realised in any documented instance of human cognition.
The basic motivation for a modular structure is to get the best performance possible given the limitations of brain processing. The assumption is typically that parallel processes are not capacity limited, making them quicker, whereas serial processes are capacity limited. In ACT-R a module's buffer can hold only one chunk, so a module can work on only one task at a time. The hardest limitation, however, is communication between buffers: because only a little information can be held in a buffer, it is difficult to transfer much between them. Furthermore, the modules communicate through the production system, and since the production system can only execute a single rule at a time, it becomes a central bottleneck in overall processing. This idea of a central bottleneck does face some challenges, one of them presented by EPIC. EPIC is an architecture that was influenced by ACT and has in turn strongly influenced ACT-R; the perceptual and motor modules were taken directly from EPIC. The limitations of the perceptuo-motor components have substantial impact on many higher-level cognitive processes. One assumption ACT-R took over is the serial bottleneck within every peripheral module; similarly, the non-peripheral modules can only communicate through small buffers. The major point of disagreement with EPIC is whether the central production system also has a central bottleneck limitation. In EPIC this seriality does not exist and multiple productions can fire at once. EPIC models therefore come with special control rules, but these are task specific, making it necessary to write new control rules for every task; moreover, these rules cannot be learnt by the model itself. Production learning in ACT-R involves learning new productions from old ones, and this is easier if the learning mechanisms do not have to deal with simultaneous production firing. While such functional issues are critical, most of the attention in the field has been on empirical evidence for a central bottleneck. This involves dual-task experiments in which participants are asked to carry out two tasks in parallel.

Let us first discuss driving a car. It requires the driver to do multiple things at once; the most critical task of all is controlling the vehicle, i.e. steering and accelerating. Besides watching the car immediately in front, one must also keep track of the other cars that may surround one's own. Salvucci developed an ACT-R model that incorporates these tasks.
Control: The basic ideas for vehicle control can be found in many mathematical models of driving. The control loop takes as input a near point and a far point on the road and produces as output adjustments of lateral and longitudinal position. At minimum, this loop consists of three productions, costing 50 ms each. However, the model also needs to monitor the environment.
Monitoring: The monitoring component selects which lane to encode and whether to encode information in front or behind. Whenever it identifies a new vehicle, it notes its lane and position, which is held internally and used to help guide decisions such as whether to change lanes. One of Salvucci's contributions is a scheme for interleaving the two subtasks. After each iteration of the control cycle, the model determines how stable the driving situation is. If it is not stable, the control cycle repeats; otherwise the control loop times out for 500 ms and monitoring takes over, because after longer gaps without control updates serious control problems start to occur. Salvucci is able to use the distribution of eye movements to track this shift between the tasks, following his eye-movement theory EMMA, which assumes that the eyes follow attentional shifts in order to achieve higher resolution. Although eye movements do not follow attention perfectly, the results are quite accurate. More interesting is his analysis of the switches between control and monitoring: in a probability analysis, switching from monitoring back to control peaks after about 500 ms, while the probability of switching away from control is concentrated at short intervals. The length of time spent on control depends on road conditions and the stability of the vehicle. Interestingly, while driving we also interleave control with other activities such as talking or changing the radio station, and switches between control and such a secondary task follow the same pattern as switches to monitoring: after about 500 ms you need to return to control to stay in your lane. It turns out that dialling a phone number takes longer while driving and that lateral deviation increases slightly; that the interference is not larger can be ascribed to the fact that some processes of dialling can occur simultaneously with controlling the car. In conclusion, the modular structure of ACT-R suffices for modelling a complex task in which attention switches frequently. The next examples, however, focus on more detailed analyses of dual-task performance.
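The interleaving scheme described above can be sketched as a simple scheduler: run the control loop (three productions at 50 ms each), and only when the situation is judged stable hand the next 500 ms to monitoring (or another secondary task). This is a hypothetical toy version, not Salvucci's model; the stability test and noise values are made-up placeholders.

```python
import random

PRODUCTION_TIME = 0.050   # seconds per production firing
CONTROL_PRODUCTIONS = 3   # minimal control loop: encode near point, far point, adjust
MONITOR_WINDOW = 0.500    # how long control may be suspended before returning

def stable(lane_error: float) -> bool:
    """Placeholder stability test: small lateral error counts as stable."""
    return abs(lane_error) < 0.3

def drive(duration: float) -> None:
    t, lane_error = 0.0, 0.0
    while t < duration:
        # Control cycle: always runs, costs 3 productions.
        t += CONTROL_PRODUCTIONS * PRODUCTION_TIME
        lane_error += random.gauss(0.0, 0.1)          # drift from road noise
        lane_error *= 0.5                             # control corrects part of the error
        if stable(lane_error):
            # Situation stable: give monitoring (or dialling, talking, ...) 500 ms.
            print(f"{t:5.2f}s  stable, switching to monitoring for {MONITOR_WINDOW}s")
            t += MONITOR_WINDOW
            lane_error += random.gauss(0.0, 0.2)      # drift accumulates while not controlling
        else:
            print(f"{t:5.2f}s  unstable (error {lane_error:+.2f}), repeating control")

drive(3.0)
```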

Dual Tasking: Modular Parallelism and Seriality
The temporal organisation of processing is easier to understand in laboratory tasks. There are two kinds of parallelism and two kinds of seriality associated with all ACT-R models. There is within-module parallelism: e.g. processing a large visual field, searching memory over many candidate chunks, or activating multiple muscles. There is also within-module seriality: the need for communication and coordination imposes serial bottlenecks within each module. Next there is between-module parallelism: computations in one module proceed in parallel with computations in another. Finally, there is between-module seriality: in many cases one module must wait on another because it depends on information from that module. The ACT-R and EPIC conceptions of this situation are identical except for the central bottleneck. Because only one production can fire at a time (within-module seriality in the procedural module), communication among the other modules can be held up (between-module seriality). ACT-R's position is the more uniform one, in that it claims every module has such a bottleneck; EPIC, on the other hand, claims there is no central bottleneck. Much of the evidence for a central bottleneck comes from studies of the so-called psychological refractory period, in which one is asked to do two tasks. The second task is presented shortly after the first, and the reaction time for each is measured. The time to finish the second task is longer than when it is performed on its own, which is evidence for a central bottleneck. However, with enough practice it is possible to perform both tasks simultaneously, i.e. to show near-perfect timesharing, and this was taken as evidence for the EPIC theory. Anderson, Taatgen and Byrne (2005) examined Hazeltine's results extensively and built an ACT-R model that predicted the actual human data very well. The fact that people can show near-perfect timesharing is therefore not automatically evidence against a central bottleneck. Figure 2.8 illustrates the behaviour of Taatgen's model: in late trials procedures are combined and different modules can be active simultaneously, resulting in very few bottleneck delays. The learning process of combining production rules is described in chapter 4. Compare the four mixtures of parallelism and seriality with the model to see what is going on. Even when one of the tasks is made more complex or easier, near-perfect timesharing is still observed. With highly practised participants, Hazeltine et al. (2002) never found dual-task costs greater than about 10 ms. One might have expected a greater degree of overlap, but given variability in timing and the available slack time, it is not hard to see how one can get a stubborn small delay that does not change much.

Delays can become much more substantial in situations such as the beginning of the experiment, when more central processing is going on; tests of a central bottleneck are much more telling when the amount of central processing is substantial. However, it would be a mistake to end this discussion on the residual differences between the ACT-R and EPIC conceptions. In fact, the views are identical on most scores, and they take the same position against many other conceptions in the field. That even EPIC models enforce seriality through their control rules indicates that there is an increasing consensus.
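The psychological-refractory-period logic above can be illustrated with a small timing sketch: each task needs a perceptual stage, a central (production) stage, and a motor stage; perceptual and motor stages of the two tasks may overlap, but the central stages queue on a single bottleneck. The stage durations and stimulus-onset asynchronies are invented for illustration.

```python
# Toy PRP simulation: two tasks share a single central (procedural) bottleneck.
# Stage durations (seconds) are illustrative, not fitted values.
PERCEPTUAL, CENTRAL, MOTOR = 0.100, 0.150, 0.100

def reaction_times(soa: float) -> tuple[float, float]:
    """Return (RT1, RT2) when task 2 starts `soa` seconds after task 1."""
    # Task 1 proceeds unimpeded.
    t1_central_end = PERCEPTUAL + CENTRAL
    rt1 = t1_central_end + MOTOR

    # Task 2's central stage must wait until the bottleneck is free.
    t2_perceptual_end = soa + PERCEPTUAL
    t2_central_start = max(t2_perceptual_end, t1_central_end)   # queue on the bottleneck
    rt2 = t2_central_start + CENTRAL + MOTOR - soa               # RT measured from task 2 onset

    return rt1, rt2

for soa in (0.05, 0.15, 0.30, 0.60):
    rt1, rt2 = reaction_times(soa)
    print(f"SOA {soa*1000:3.0f} ms:  RT1 {rt1*1000:3.0f} ms  RT2 {rt2*1000:3.0f} ms")
# Short SOAs inflate RT2 (the PRP effect); at long SOAs RT2 returns to its single-task value.
```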

Mapping Modules Onto the Brain
As discussed above, modular organisation is the solution to structural and functional constraints: the mind needs to execute certain functions, and the brain must devote local areas to achieving those functions efficiently. Eight different modules have been distinguished, and functional brain-imaging studies have mapped those modules onto particular brain areas.
1. The visual module is best reflected by the fusiform area, which seems to track focused visual processing of attended information.
2. The aural module has been associated with secondary, but not primary, auditory cortex; as with the visual module, it maps onto relatively advanced processing of the information.
3. The manual module is reflected in the region around the central sulcus, including both sensory and motor cortex.
4. The vocal module is also found in sensory and motor cortex, but more towards the middle of the brain.
The four central modules map onto regions distributed widely throughout the brain.
5. We have associated the imaginal module with a posterior region of the parietal cortex (i.e. near the visual regions). This association is roughly consistent with research by others who have found that this area is involved in spatial processing; it seems to reflect the effort involved in transforming a mental representation.
6. Both retrieval from and storage into declarative memory have been associated with a region of the prefrontal cortex. This is consistent with a lot of memory research, though the exact memory function of this region still needs continuing study.
7. We have associated the goal module with the anterior cingulate cortex. There is consensus that this region plays a major role in control (and in handling conflicts/contradictions), but how it does so is still an open question.
8. The procedural module is placed in the basal ganglia and plays a role in pattern recognition and the selection of cognitive actions; the specific region is the head of the caudate.

As the citations above indicate, these mappings are not novel. What is novel is the association of these regions with parts of an integrated architecture and their use to trace out the components of that architecture. Note, however, that the brain tends to distribute similar but distinguishable processes over different regions, and that there is no necessary reason why a brain region should perform only a single architectural function. The next experiment engages all eight modules at once. It manipulates the input modality, whether or not the representation needs to be manipulated, and the output modality. Because retrieval and (imaginal) representation demands are naturally correlated, an artificial task is needed; table 2.1 shows how the experiment attempted to manipulate retrieval and representational demands orthogonally. The manipulations produced different behavioural responses but, more interestingly, also different BOLD responses. In general the results follow the predictions: input modality affects the perceptual regions, output modality the motor regions, transformation the parietal region, and substitution the prefrontal region. Furthermore, the latter two (cognitive) factors affect the ACC. The caudate (procedural) region, however, did not show any significant effects. In summary, the general pattern of results is largely consistent with the proposed associations, except for the caudate; this may be related in part to the relatively weak magnitude of the response in that region. ACT-R can not only predict which areas will be active, but also the BOLD response over time. The experiment showed large effects of the two cognitive factors on the time to do the task, which helped in separating representational and retrieval effects. Interestingly, neither input nor output modality affected the time to finish the task, despite their large effects on the perceptual and motor modules. The model also predicted the BOLD responses very well; see figure 2.12 for the actual responses of the brain areas in relation to the task. The general pattern of effects in the experiment and the ability to explain the exact shape of the BOLD responses provide strong support for the association of these regions with these modules.
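Predicting the BOLD response over time works by taking the periods during which a module is busy and convolving them with a slow hemodynamic response that peaks several seconds after the activity. The sketch below illustrates that idea with a gamma-shaped response function; the parameter values and the example busy periods are assumptions for illustration, not the fitted values from the book.

```python
import math

# Gamma-shaped hemodynamic response: h(t) proportional to (t/s)^a * exp(-t/s).
# a and s are illustrative; in ACT-R-style analyses they are estimated per region.
A_SHAPE, S_SCALE = 4.0, 1.2

def hrf(t: float) -> float:
    return (t / S_SCALE) ** A_SHAPE * math.exp(-t / S_SCALE) if t > 0 else 0.0

def predicted_bold(busy_periods, t_max=20.0, dt=0.1):
    """Convolve a module's busy periods (start, end in seconds) with the HRF."""
    times = [i * dt for i in range(int(t_max / dt))]
    demand = [1.0 if any(s <= t < e for s, e in busy_periods) else 0.0 for t in times]
    bold = []
    for i, t in enumerate(times):
        bold.append(sum(demand[j] * hrf(t - times[j]) * dt for j in range(i + 1)))
    return times, bold

# Hypothetical retrieval-module activity: two retrievals early in a trial.
times, bold = predicted_bold([(0.5, 1.0), (2.0, 2.6)])
peak_t = times[max(range(len(bold)), key=bold.__getitem__)]
print(f"Predicted BOLD peaks about {peak_t:.1f} s after trial onset")
# The peak lags the activity by several seconds, which is why closely spaced
# module events cannot be distinguished in the raw BOLD signal.
```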

Overall Conclusions
Read the book, pages 86 and 87.

Chapter 3: Human Associative Memory


Declarative memory is a high-capacity module by which we are able to perceive our past. However, it seems that more things are missing than we can remember; they might still be there, but we have difficulty retrieving them. Memory is very important; people even state that memory is critical to our sense of self-identity. HM is the classic example of a person who suffered anterograde amnesia after surgery. He functioned normally in the world, except that his sense of who he was never changed from who he was before the operation.

Varieties of Learning
Although HM cannot form new conscious memories, he can learn certain procedures and even faces/names, although he would claim not to recognise them. Four different types of learning can be distinguished; table 3.1 shows the taxonomy of learning, i.e. what type of memory can be acquired in what way.
1. Fact learning is the only kind of learning that results in conscious memories: you form new memories in declarative memory. A distinction can be made between episodic memory (what we experienced) and semantic memory (what we learnt). Some things may not reflect any declarative memory we have formed, yet we can recognise and reason about them through perceptual, categorical, and inferential abilities. (Symbolic)
2. Strengthening makes existing memories more available through exposure; even when you are probed unconsciously, this can strengthen the memory chunk. (Subsymbolic)
3. Skill acquisition falls under procedural memory, in which new procedures are learnt. Typing is an example: although you do not consciously know where all the keys are, you can still type very quickly. (Symbolic)
4. Conditioning is the passive learning of a response through experience. This type of learning is so widespread across the animal kingdom because it requires no declarative or procedural memory capacity. Conditioning can take much more refined forms in humans thanks to their acquired skills. (Subsymbolic)
The rest of this chapter focuses on declarative memory.

The Structure and Function of Declarative Memory
Like every aspect of a cognitive architecture, declarative memory arises from trying to achieve certain functions given certain constraints. The constraint on declarative memory is that memories have to pass through a particular set of structures located in the medial temporal lobe. The hippocampus is a subcortical structure within this region. It is bidirectionally coupled (receives input and sends output) with almost the entire cortex, making it ideally situated to record the current state of the cortices and/or the environment. Despite the importance of the hippocampus, other parts also contribute to particular aspects of memory. The general temporal-hippocampal region is a critical bottleneck in forming permanent declarative memories: if it is missing, immediate facts are forgotten as soon as attention turns away. Because of the small size and the metabolic costs of the hippocampus, only limited information can be stored in declarative memory. A further reason lies in the flexibility of declarative memory itself: because the conditions for retrieving a fact are not prespecified, and the number of potentially relevant memories is nearly unbounded, the cost of sorting through them can be considerable, so a lot of less relevant information is thrown away. Additional memories can also interfere with one another. Whereas humans can learn to distinguish interfering memories fairly quickly, early connectionist models had great difficulty with this and needed a long time to learn it. Attempts to fix this resulted in proposals with two learning systems: one a typical connectionist system that learned slowly and showed high generalisation and interference (associated with the cortex); the other quick, but with little generalisation and interference (associated with the hippocampus). That HM could still remember things from earlier in his life despite the loss of the hippocampus is evidence that other areas are involved. It appears that through rehearsal and repetition, memories can very slowly be transferred from the hippocampus to neocortical regions; it is even possible to store memories without involving the hippocampus, although it remains the main system for storing and retrieving declarative memories. While there are serious limits on the hippocampus, this does not limit all learning, because much of what we learn is not stored via the hippocampus but as general patterns in the neocortex; in addition we have procedural memory, to which we return later. Some people with hippocampal lesions have difficulty remembering day-to-day events but are perfectly able to attend school at a normal level. Eichenbaum (1997) suggested that this reflects a difference between the hippocampus, which supports episodic memory, and the parahippocampal region, which supports semantic memory. This distinction should be taken with caution, because people with hippocampal lesions can still remember some episodic events.

In principle, the information we store in declarative memory could be stored procedurally. Rather than searching declarative memory, representing the result, and extracting the answer, a production rule would just produce the answer. The consequence, though, would be that every bit of information needed its own production. Moreover, as memory currently works, several different probes can retrieve the same event; if everything were proceduralised, you would need a different rule for every probe, which is not feasible, and even if you had the capacity, you cannot know the probes in advance. In essence, flexibility of access is what makes a memory declarative. Declarative memory, faced with limited capacity, is in effect constantly discarding memories that have outlived their usefulness. The ACT-R research on this operates largely at the subsymbolic level. The odds that you will need a particular memory decrease over time following a power function, odds = A * t^(-d), in which A is a constant, t is the time since the last retrieval, and d is the decay rate. However, every time you retrieve an item, the odds that it will be needed again increase; the formula then becomes odds = A * n * t^(-d), in which n is the number of occurrences. Memory thus seems to make information available in proportion to how likely it is to be useful. The function above does not yet take contextual effects into account: when one event is occurring, the probability of an associated event occurring is higher than its baseline probability (e.g. the words AIDS and virus). Schooler (1993) found that the effect of a high associate is simply to raise the intercept of the (log) function, and the same holds for human memory.
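A small sketch of these two relations, under the reading above that need odds fall as a power of time since last retrieval and rise with the number of occurrences. The constants A and d below are illustrative choices, not the book's fitted values, and the ACT-R-style base-level term is included only to show the connection to activation.

```python
import math

A_CONST = 2.0   # scaling constant (illustrative)
DECAY = 0.5     # decay rate d (ACT-R's conventional default)

def need_odds(n_uses: int, t_since_last: float) -> float:
    """Odds that a memory is needed: rises with use count, falls as a power of time."""
    return A_CONST * n_uses * t_since_last ** (-DECAY)

def base_level_activation(use_times: list[float], now: float) -> float:
    """ACT-R-style base-level activation: log of summed decaying traces of each use."""
    return math.log(sum((now - t) ** (-DECAY) for t in use_times if t < now))

print(need_odds(1, 60), need_odds(1, 3600))       # the same item becomes less likely over time
print(need_odds(5, 3600), need_odds(1, 3600))     # more past uses -> higher odds
print(base_level_activation([0, 30, 120], 180))   # recency and frequency both raise activation
```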

Declarative Memory in a Cognitive Architecture
The human memory system operates within a fully functioning cognitive architecture that actually does tasks. Almost every task uses declarative memory as a means, in complex interaction with other modules, to achieve its goal. A serious constraint is that declarative memory must function correctly in such tasks as well as in pure memory tasks. The declarative module essentially allows the system to perceive its past, which consists of the chunks that have existed in the buffers of the other modules. The chunks in declarative memory have activation values that determine the speed and success of their retrieval. A chunk's activation combines its inherent strength (base-level activation) with associative strengths from the current context, giving A_i = B_i + sum_j W_j * S_ji, in which the sum runs over the elements j of the context C, i is the chunk, B_i is its base-level activation, W_j is the attentional weighting of context element j (related to individual differences in how easily retrievals are made), and S_ji is the strength of association, which by default depends on the size of the fan (i.e. the number of associated chunks). Given the activation of a chunk, the probability of retrieval can be derived through Bayesian considerations; it ultimately depends on time since last retrieval, current activation, and random noise (see table 3.2 for all the equations). As activation increases, the probability of retrieval increases and latency decreases. There is, however, a threshold (tau), the minimum activation for something to be retrieved; because there is noise during perception and retrieval, activation sometimes falls below and sometimes above this threshold. When multiple chunks compete for retrieval, we basically imagine a parallel process by which they are compared and the most active one selected.

One line of experiments has looked at associative strength. As mentioned above, when chunks are associated with a concept, they form its fan: the number of associated chunks, including the concept itself. The associative strength S_ji decreases as the fan becomes greater, because the concept's appearance becomes a poorer predictor of any particular fact. Retrieval time (which can be taken as a measure of activation) therefore increases when the fan is bigger. This is shown in fact-learning studies in which sentences have to be remembered. These practice and fan effects extend to memories outside the laboratory: when comparing real-life facts with newly learnt experimental facts, reaction times were much lower for the former, and the number of facts learned in the experiment affected both new and prior facts, reflecting the fan effect. fMRI analyses of BOLD responses indicate that the prefrontal cortex is divided into multiple parts: the left prefrontal cortex is more active in word recognition, while the right prefrontal cortex is more active in picture recognition. Also, remembered items gave greater BOLD responses than items that were later forgotten. Wagner and Brewer took this to mean that the participants remembered those items because of the larger activations and therefore the stronger response. However, in ACT-R terms, when activation is greater retrieval time is shorter, which should result in a smaller BOLD response. A fan experiment was used to investigate these observations: a higher fan should result in a stronger fMRI response in the prefrontal cortex, and this is what was found; the size of the effect was even related to the relative latency values.

Previous research has focused primarily on deliberate memory tasks. People, however, do not usually set out to remember what they do; memory is largely a by-product of doing. The same is true of ACT-R models, which do not overtly use declarative memory all the time. Nonetheless it is still used, because declarative memory carries the general knowledge needed to do the task. We use, for instance, instance-based decision making, in which we think of a previous experience and then decide whether or not to act the same way, and we often rely more on such instances than on basic learnt principles. The most common type of ACT-R model involves some sort of instance-based retrieval, and the behaviour of these models is strongly coloured by the activation processes that determine what gets retrieved. The success of these models is perhaps the strongest evidence for this account of declarative memory. Check the eight instances of instance-based models.
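The retrieval quantities just described can be put together in a small sketch: an activation built from a base level plus context association, a recall probability governed by the threshold tau and noise s, and a retrieval latency that shrinks as activation grows. The logistic recall-probability and exponential latency forms are the standard ACT-R ones; the parameter values and the fan numbers are illustrative.

```python
import math

TAU = 0.0        # retrieval threshold
S_NOISE = 0.4    # activation noise scale
F_LATENCY = 0.5  # latency scaling factor (seconds)
W_TOTAL = 1.0    # total attentional weighting, shared over context elements
S_MAX = 2.0      # maximum associative strength (illustrative)

def activation(base_level: float, fans: list[int]) -> float:
    """A_i = B_i + sum_j W_j * S_ji, with S_ji shrinking as the fan of j grows."""
    if not fans:
        return base_level
    w_j = W_TOTAL / len(fans)
    return base_level + sum(w_j * (S_MAX - math.log(fan)) for fan in fans)

def recall_probability(a: float) -> float:
    """Logistic function of the distance from the threshold."""
    return 1.0 / (1.0 + math.exp(-(a - TAU) / S_NOISE))

def retrieval_latency(a: float) -> float:
    """Latency falls exponentially with activation: T = F * exp(-A)."""
    return F_LATENCY * math.exp(-a)

for fan in (1, 2, 3):
    a = activation(base_level=0.0, fans=[fan, fan])   # two context cues with equal fans
    print(f"fan {fan}: activation {a:+.2f}, "
          f"P(recall) {recall_probability(a):.2f}, latency {retrieval_latency(a)*1000:.0f} ms")
# Higher fan -> lower associative strength -> lower activation,
# lower recall probability, and longer retrieval latency (the fan effect).
```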

The Role of Memory in Heuristic Judgments
All the examples of instance-based models show that people rely on retrievals in order to achieve their goals correctly (and quickly). According to Tversky and Kahneman, however, judgments based on memory instances (heuristics) are often erroneous; one example is the availability heuristic (judging whether more words start with k than have k in the third position). Goldstein and Gigerenzer (1999) argued instead that availability can improve the quality of judgments (the recognition of American versus German cities). Schooler and Hertwig (2005) analysed this in an ACT-R model and showed the relationship between activation levels and the probability that students would recognise the cities; this model was, again, very accurate. What is interesting is what happens when the decay rate is changed. The model performs best when this parameter is set to 0.34. When it gets closer to one, the rate of forgetting becomes very high. When it is set to zero, on the other hand, almost nothing is forgotten and probability estimates drop towards chance, because every city one has ever encountered is recognised and the recognition heuristic loses its effectiveness. Thus the forgetting process serves a critical role in making the recognition heuristic as accurate as it can be. When this heuristic is not useful, people can use other strategies to make proper judgments.
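The role of forgetting can be illustrated with a deterministic toy example: two cities with hypothetical exposure histories and a recognition threshold. With no decay both are recognised and the heuristic cannot discriminate; with very fast decay neither is recognised; at an intermediate decay only the frequently encountered city clears the threshold. The exposure histories and threshold are invented, and this is not Schooler and Hertwig's model; only the qualitative pattern matters.

```python
import math

THRESHOLD = 0.0

def activation(encounter_ages, decay):
    """Base-level activation from encounters that happened `age` days ago."""
    return math.log(sum(age ** (-decay) for age in encounter_ages))

# Hypothetical exposure histories (days ago) for a big and a small city.
big_city   = [2, 10, 30, 60, 120, 200, 300]   # mentioned often
small_city = [90, 250]                         # mentioned rarely

for decay in (0.0, 0.34, 0.9):
    rec_big   = activation(big_city, decay)   > THRESHOLD
    rec_small = activation(small_city, decay) > THRESHOLD
    discriminates = rec_big and not rec_small   # heuristic only helps if recognition differs
    print(f"decay {decay}: big recognised={rec_big}, small recognised={rec_small}, "
          f"heuristic usable={discriminates}")
```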

The Role of Declarative Memory in the Cognitive System: A Reprise
Read the book, pages 130 and 131.

Chapter 4: The Adaptive Control of Thought


A flexible declarative memory is critical to our ability to adapt to a changing world. However, we need to do something with that information, which requires complex, deliberative processes in which we make inferences and predictions on the basis of that knowledge. When quick actions have to be taken, this complexity becomes a fatal flaw. It is therefore necessary to develop effective cognitive procedures in which frequently used computations are identified so that they can be evoked directly by the situation. These cognitive reflexes need to be integrated with the more deliberative processes to yield the adaptive control of thought. This is where the procedural module comes into play.

Relationship Between Thought and Action
Behaviourists like Hull thought that there was no real difference between knowing and doing: knowledge was realised as stimulus-response (S-R) habit strengths, connections between a stimulus S and a response R, with additional room for drive and reinforcement in the theory. Variations included S-R-S and S-S expectancies. Tolman proposed that such expectancies are put together by inference to achieve various needs (the first cognitive view of the matter), and he claimed that repetition alone was enough to learn these relations, whereas reinforcement only served to strengthen the tendency to act on them. This is the phenomenon of latent learning. In one experiment rats had to learn a maze. When no reinforcement was given (law of exercise alone), many errors were still being made; when regular reinforcement was given (law of effect), performance improved gradually. The group of rats that only began receiving reinforcement after 11 days started off slowly but then improved rapidly and even outperformed the regularly reinforced group. Tolman saw this as proof that reinforcement was not necessary for learning, only for performance. There was, however, still no consensus. A different perspective came from artificial intelligence, which at the time was rather mechanistic. Newell and Simon built symbolic problem solvers, and planners in that tradition such as STRIPS (Stanford Research Institute Problem Solver) proved quite capable of solving tasks like the maze. Shortly afterwards, reactive architectures appeared that did not use symbolic representations of knowledge (as STRIPS did). R. A. Brooks built systems that could layer complex behaviours over simple ones, but they were not capable of reaching human-level intellect; the problem was that they could not adapt to new information about the world. In current AI, deliberative and reactive components are combined: to the degree that one can anticipate how knowledge will be used, it makes sense to prepackage the application of that knowledge in procedures that can be executed without planning; to the extent that this cannot be anticipated, one must keep the knowledge in a more flexible form that enables planning.
Parallel to these discussions in behaviourism and AI, cognitive scientists studied how the struggle between thought and action plays out within the individual. The Stroop task shows that people need careful cognitive control to fight the reflex of reading the word; this conflict is basically a battle between Hull's S-R bonds and Tolman's goal-directed processing. While the struggle is real, it is notable that the goal usually wins; someone who cannot do the task with few errors may be suffering from damage to the frontal cortex. Such conflicts are not immutable, however: in tasks where the reactive behaviour is less engrained, it has proved possible to reverse the direction of interference, and ACT-R can model this. Another task was designed to measure interference levels, but with numbers and lengthy arrays. On a pure practice view one might expect children to show less Stroop interference, but they show more, which can be attributed to the fact that they cannot control their behaviour as well as adults can; the children used different tactics, such as self-directed speech, to improve performance. In conclusion, it appears that both Hull and Tolman were partly correct.

Relevant Brain Structures
Three brain systems are particularly relevant to achieving the balance between reflection and reaction. First, the basal ganglia are responsible for the acquisition and application of procedures (Hull's reactions). Second, the hippocampal and prefrontal regions are responsible for the storage and retrieval of declarative knowledge (Tolman's expectancies). Third, the ACC exercises cognitive control in the selection of context-appropriate behaviour. The basal ganglia are thought to perform the function of learning S-R associations as well as more advanced cognitive procedures. In a sequential pointing task, monkeys had no problem early on when the ordering was reversed, but they did later on, indicating that the task had been proceduralised; moreover, inactivating the basal ganglia removed the advantage of the highly practised sequences. The basal ganglia seem to implement a variant of reinforcement learning (the law of effect). They contain spiny neurons that are involved in a process that learns to recognise favourable patterns in the cerebral cortex. Responding to unpredicted reward, and shifting that response back in time towards predictive cues, are exactly the properties of so-called temporal-difference learning, which has become such a popular reinforcement-learning algorithm in AI. Most of this research was done with non-human animals, but the same regions have been identified in humans.
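Temporal-difference learning, mentioned above, can be shown in a few lines: a value estimate for each state is nudged towards the reward plus the discounted value of the next state, so that reward information propagates backwards to earlier predictive states over repeated trials. This is a generic TD(0) sketch with made-up states and parameters, not a model of the basal ganglia circuitry.

```python
# TD(0) on a tiny three-step episode: cue -> action -> outcome (reward at the end).
states = ["cue", "action", "outcome"]
reward = {"cue": 0.0, "action": 0.0, "outcome": 1.0}
value = {s: 0.0 for s in states}

ALPHA, GAMMA = 0.2, 0.9   # learning rate and discount factor (illustrative)

for _ in range(60):
    for i, s in enumerate(states):
        next_value = value[states[i + 1]] if i + 1 < len(states) else 0.0
        td_error = reward[s] + GAMMA * next_value - value[s]   # prediction error
        value[s] += ALPHA * td_error

print({s: round(v, 2) for s, v in value.items()})
# After training, value has propagated back to the early 'cue' state,
# mirroring how reward-prediction responses shift to the earliest reliable predictor.
```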

The hippocampus is the structure where Tolman's cognitive maps are stored. Its cells fire when one is in a known environment, and the structure increases in size when spatial demands are higher. In humans, however, the hippocampus stores more than just spatial knowledge. It also appears to work according to Thorndike's law of exercise, in contrast to the reinforcement learning of the basal ganglia. One line of research that highlights this difference involves maze learning in a simple plus-shaped maze: rats start at one position and must find the reward in R1 (on the right). They build both a representation of the location in the hippocampus and a right-turning response tendency in the basal ganglia, as impairment studies show. The ACC plays a critical role in controlling behaviour. During evolution a new class of spindle-shaped cells has emerged there, with a much higher concentration in humans than in any other animal, and greater activation and size of the ACC correlate strongly with cognitive control. Another role ascribed to the ACC is that of an error-detection centre, supported by the error-related negativity in event-related potentials (ERPs); ultimately it appears to be a conflict-monitoring centre, with errors being just one kind of conflict. Other regions of the prefrontal cortex then respond to the conflict once it has been detected. ACT-R's goal-module interpretation of the ACC is relatively close to this proposal of Posner and Dehaene (1994): the goal module maintains the abstract control states that allow cognition to progress along the correct path independently of the external situation. Sohn et al. (2004) provided evidence for this, and for the ACT-R model, using an and/or versus nand/nor task in which participants either were or were not warned about which operator they would get. A warning also increased BOLD responses in the ACC. This study illustrates that the ACC responds to manipulations of task complexity that correspond to the need to make extra distinctions in the control of the task. At an abstract level there is an emerging consensus, while in the details there are still controversies.

Architecture

To fully understand the procedural system we must also look at the declarative module, because the two go hand in hand in most tasks. The multicolumn subtraction model is interesting to consider because it is one of the early rule-based approaches to solving problems. Many of the production rules in table 4.4 can be recognized by humans themselves. This example illustrates all of the modules that are critical to the execution of cognitive procedures. Besides including mental stimuli and responses, these productions illustrate a number of computational features that had been considered problematic in the history of S-R theories. Production rules respond only to patterns of elements, both within a module and across all the modules taken together.

The basic idea behind these production rules is that the system comes to new situations for which it does not already have rules. This requires retrieval of declarative information and deliberation about what has been retrieved. Then (1) prior instances can be retrieved, (2) one can reason from principles, or (3) explicit instructions can be retrieved and followed. Early on, these instructions are stored in declarative memory rather than immediately as productions. Eventually the instructions can be collapsed into a production rule; this collapsing is called learning by contiguity (close relationship). Production compilation collapses two productions that follow each other into a single production. Some combinations of buffers require special treatment, most interestingly when a retrieval requested by the first production provides the content for an action in the second. Each time a new production of this kind is created, another little piece of deliberation is dropped out in the interest of efficient execution. Guthrie argued that such new rules would be learned in a single trial. However, it happens much more slowly, because procedural memory is acquired gradually. In ACT-R models these rules are gradually strengthened until they start to be used, through the increasing utility of such productions. When multiple rules can apply in a specific situation, the organism chooses the rule with the highest utility. Due to noise, a rule with lower utility will sometimes be chosen. This has advantages, because otherwise it would not be possible to acquire new productions. The probability that production i will be used can be calculated with the following equation:

P(i) = e^{U_i/s} / Σ_j e^{U_j/s},

in which U_i is the utility of the rule, U_j the utility of each applicable production j, and s again represents the noise. The utilities of productions are set according to the rewards they receive and are updated according to a simple integrator model:

U_i(n) = U_i(n-1) + α [R_i(n) - U_i(n-1)],

in which α is the learning rate and R_i(n) the reward. What is interesting is that rewards occur at various times, not exactly associated with any production rule. In ACT-R, all the productions that fired going back to the last significant event are rewarded, but by a decreasing amount the further back they fired; this is like temporal discounting. Besides, when a new production rule is first created it has an initial utility of 0 and is therefore very unlikely to be fired. However, each time it is recreated its utility increases according to the difference-learning equation. To understand how this works exactly, consider multicolumn subtraction and the four rules that might be used. One of the rules is wrong but will produce correct answers some of the time, so it gains utility and is used relatively often. When you find out that it also produces wrong answers, you turn to the other rules, which gradually increase in utility. Therefore, the instruction-following and the erroneous rules come to be used less and less often. See figure 4.10 for a graphical representation of these events. There are some features of this treatment of utility that are worth noting: 1. Gradual introduction of a rule. 2. Ordering of rules by utility. 3. Sensitivity to solution time. 4. Sensitivity to change: the system is not locked into one way of evaluation; if payoffs or the environment change, it can adjust.
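To make the conflict-resolution and utility-learning equations above concrete, here is a minimal Python sketch. It illustrates the general mechanism only and is not the ACT-R implementation; the noise parameter s, the learning rate alpha, the rule names, and the reward scheme are invented values for illustration.

```python
import math
import random

def choose_production(utilities, s=1.0):
    # Softmax conflict resolution: higher-utility rules are chosen more often,
    # but the noise parameter s leaves some chance for lower-utility rules to fire.
    names = list(utilities)
    weights = [math.exp(utilities[n] / s) for n in names]
    r = random.uniform(0, sum(weights))
    for name, w in zip(names, weights):
        r -= w
        if r <= 0:
            return name
    return names[-1]

def update_utility(u, reward, alpha=0.2):
    # Difference-learning (simple integrator) update: U <- U + alpha * (R - U).
    return u + alpha * (reward - u)

# Four hypothetical subtraction rules; the newly compiled rule starts at utility 0.
utilities = {"instruction-following": 2.0, "erroneous": 2.0,
             "correct-compiled": 0.0, "other": 0.0}
for trial in range(500):
    rule = choose_production(utilities, s=1.0)
    # Invented reward scheme: the correct compiled rule is always rewarded,
    # the other rules only some of the time.
    reward = 5.0 if rule == "correct-compiled" else random.choice([5.0, 0.0])
    utilities[rule] = update_utility(utilities[rule], reward)
print(utilities)  # the consistently rewarded rule ends up with the highest utility
```

Run a few times, the newly created rule gradually overtakes the initially dominant ones, mirroring the gradual shift described above.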

Evidence

We now review three lines of evidence as support for this account of procedural memory. First we discuss research that supports the utility computations, using a probability-learning task. In this task, people had to predict whether the left or the right light would turn on. The probability of the right light increased by 10% every four blocks, alternating with four-block intervals in which the probability was 50%. Subjects performed very much in accordance with the true probability of the lights, and the model fitted the results well. Friedman et al. (1964) also looked at sequential choices within a block. The results show that people did not exhibit the gambler's fallacy of guessing the opposite after a long run; rather, participants were most influenced by the most recent events. This recency effect falls naturally out of the nature of the utility updating.

Production compilation collapses pairs of rules into one and frequently eliminates the need to retrieve from declarative memory, achieving greater efficiency by eliminating unnecessary reflection. However, the speed increase alone is not evidence for this. The greater efficiency should also mean less metabolic cost and hence lower BOLD responses in fMRI scans. Moreover, specific regions reflecting specific components should decrease in activation, which can indicate which processes are really being skipped. Contrary to what many believe, there is no shift of activation to other regions. In the fMRI studies, the results show areas that are significantly different in activation. Note, however, that when significance decreases (i.e., apparent collapsing has occurred), this might also be due to a higher noise level. Moreover, one needs to look at regions that are selected because they have known functional significance, not because they pass an arbitrary threshold of significance.

Production compilation predicts which regions should show reduced activation and which should not. Because there are fewer retrievals and fewer production-rule firings, the prefrontal region and the caudate should show lower BOLD responses. On the other hand, there should not be reduced activation in the ACC or in the perceptual and motor cortices. In four experiments the actual effects in all regions were analysed. The ACC did show some variation but no consistent pattern, except that its activation increases with task complexity. For the posterior parietal region there were no strong predictions; overall, there was a marginally significant decrease. The prefrontal cortex showed a significant decrease in all experiments, indicating that indeed fewer retrievals are made. The caudate showed two significant decreases and two differences at least in the right direction, so here too we can reasonably conclude that the number of production-rule firings decreases. In the motor region nothing special was found, as predicted.

All previous examples were laboratory studies with a maximum of about 10 hours of learning. In real life, learning can take thousands of hours or more; language acquisition may be the most pronounced case, taking tens of thousands of hours. One common method to research this is to look at one small component of the overall competence, in this case learning the past-tense inflection in English. While there is a clear set of inflectional rules for the English past tense, there are a great many exceptions. The acquisition has been characterised as displaying U-shaped learning, in which performance decreases before it increases, due to overgeneralisation of the regular rule to the exceptions. The appearance of overgeneralisation had originally been taken as evidence that children were acquiring abstract rules. However, even a simple connectionist model could reproduce this without an explicit set of rules. These models were criticised for several reasons, and more recent models have not taken the criticisms away completely. One class of issues relates to linguistic details; another class concerns whether solutions to the learning problem have in some way been engineered into the way the problem is presented to the learning system. Taatgen and Anderson's (2002) past-tense model avoided these problems and could model children's behaviour without setting any such limitations. Since this model demonstrated that this conception of procedural memory would really work, there have been a number of additional demonstrations with respect to learning from instruction. Figure 4.16 illustrates the four ways that the model posits children can generate past tenses: 1. Do nothing and use the present tense. Because this will almost always lead to failure, it has a small utility. 2. Children attempt an instance-based strategy, retrieving a stored past tense. This is most likely to work for words occurring with high frequency (mostly irregular verbs).

3. The child may be unable to retrieve the past tense of the verb itself, but instead retrieves the past tense of some other verb, either regular or irregular, and then tries to adapt that form to the current case, presumably by some analogy process. 4. Production compilation can apply to either method 2 or method 3 to produce productions that just do it. For method 2 this results in only a small improvement. For method 3, however, it produces a rule that simply adds -ed, resulting in a major gain in efficiency. In the limit, production compilation produces a dual-route model in which the past tense can be produced by retrieving a special-case rule or by using the general rule. Retrieval mostly happens for irregular verbs, because they are more frequent. The reason these words are irregular in the first place is that, over the many years that language evolved, it proved more efficient to use the predominantly shorter irregular forms for frequent words. Note that the Taatgen model does not make artificial assumptions about frequency of exposure but instead learns from a presentation schedule of words like that actually encountered by children. Symbolically, it is learning general production rules and declarative representations of exceptions; subsymbolically, it is learning the utilities of these production rules and the activation strengths of the declarative chunks. Beyond reproducing the U-shaped curve, the model explains why exceptions should be high-frequency words. First, only high-frequency words develop enough base-level activation to be retrieved. Moreover, the shorter irregular form is phonologically more efficient, and this promotes its adoption according to the utility calculations. Note also that the model receives no feedback on the past tenses it generates, unlike most models, but in apparent correspondence with the facts about child language learning. The amount of overgeneralisation displayed by the model is sensitive to the ratio of input it receives from the environment to its own past-tense generations. The sole shortcoming of this model is that it does not progress beyond the past tense to deal with full language generation. The use of such artificial boundaries is a strategy for dealing with the complexity of language learning, but eventually the field needs to go beyond such artificially constrained demonstrations. Language in general nicely illustrates the issue of deliberation versus action that opened this chapter.
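As an illustration of the dual-route outcome described above, here is a small Python sketch of the generation strategies. It is only a caricature of the Taatgen and Anderson model: the mini-lexicon, activation values, utilities, and retrieval threshold are invented, and real retrieval depends on base-level learning rather than a fixed cutoff.

```python
# Hypothetical mini-lexicon: irregular past tenses with activation reflecting frequency.
irregulars = {"go": ("went", 2.0), "see": ("saw", 1.6), "sing": ("sang", 0.4)}
RETRIEVAL_THRESHOLD = 1.0

def past_tense(verb, utilities):
    strategy = max(utilities, key=utilities.get)           # pick the highest-utility route
    if strategy == "do-nothing":
        return verb                                         # 1. just use the present tense
    if strategy == "retrieve-instance":
        form, activation = irregulars.get(verb, (None, 0.0))
        if activation >= RETRIEVAL_THRESHOLD:               # 2. retrieve a stored past tense
            return form
    # 3./4. fallback: the analogy route compiled into a general rule that adds -ed
    # (overgeneralisation occurs when this fires for an irregular verb whose
    # stored form is not active enough to be retrieved).
    return verb + "ed"

utilities = {"do-nothing": 0.1, "retrieve-instance": 0.8, "add-ed-rule": 0.6}
print(past_tense("go", utilities))    # "went": high-frequency irregular is retrieved
print(past_tense("sing", utilities))  # "singed": low activation -> overgeneralised
print(past_tense("walk", utilities))  # "walked": regular verb handled by the general rule
```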

Conclusion Read the book. Page 181.

Chapter 5: What Does It Take to Be Human?


Lessons From High School Algebra

The last few chapters described how the (primate) mind can occur in the physical universe, but not the human mind per se. All the regions discussed have close relatives in other animal brains, so there is some missing link in the account so far of human intellectual function. One task that is uniquely human is algebra, but why is that?

Algebra as the Drosophila for the Study of Human Cognition

In contrast to language, algebraic proficiency was not anticipated in human evolutionary history, so the differences found cannot rest on specific evolutionary adaptations. Algebra is also a better choice methodologically for research: there is a clear and formal characterisation of the target competence, the required prior knowledge can be restricted to middle-school mathematics, and the learners are of an age at which they make cooperative participants. It is critical, though, that the algebraic task be solvable by all humans and only by humans. In America it appeared that some students could not learn it, but when a program was set up to mandate algebra for all, everyone was able to master it to some degree. Anderson designed an intelligent tutoring system for algebra that was relatively successful and robust. However, there has still not been a theoretical analysis of how algebra is learned and why these tutors lead to more success.

Mathematical Competence From a Comparative Perspective

In order to know what is uniquely human about algebra, we must first consider what we have in common with other primates. Here is a list of shared abilities: 1. Both human infants and primates have the ability to represent exact magnitudes up to 3 or 4. 2. Both human infants and primates appear to have an analog numerical system that can represent larger quantities at least approximately. 3. Both humans and other primates are capable of spontaneously tracking the addition and subtraction of at least small numbers. None of these abilities requires training. 4. Both humans and other primates can be trained to compare quantities or to order them in increasing sequence. They also both show distance effects (further apart = lower RT).

5. Both humans and other primates can be trained to assign symbols to different quantities. It does appear, however, that training non-humans is far more difficult. The study of Nieder et al. (2002) is particularly compelling in its evidence that non-human primates represent specific quantities such as 3 or 4. The trained monkeys were presented with a certain quantity of dots and had to compare it to a test stimulus. In the prefrontal cortex there appeared to be neurons that responded differently to different numbers. These mathematical skills were maximally trained, however, and still only matched those of children before they enter school. This leaves primates far short of algebraic competence, which requires far more computation. Algebra has the advantage of being so far removed from primate mathematics that there can be no doubt that it is uniquely human. It also relies less on the memorisation of additions and multiplications, and can be taught using calculators.

Learning a Socially Transmitted Competence

Besides learning by following verbal directions or worked-out examples, it is also possible to learn by discovery and invention, which is how algebra was originally discovered. This is, however, a very inefficient way of learning. Even when one learns from directions and examples, students still make many mini-discoveries. The distinction between learning from verbal directions and learning from examples is somewhat artificial, and usually the two are combined. Verbal transmission is restricted to linguistically enabled agents (humans), but this alone does not explain superior human learning. Learning from worked-out examples often appears to be more effective, for the following reasons: 1. Verbal directions require understanding referring expressions. 2. Instruction typically describes tasks that involve multiple steps; therefore, understanding the instruction also requires appropriately anticipating or imagining the intermediate state produced by each step, to know what comes next. On the other hand, learning from worked-out examples requires knowing to what degree one can generalize an example and recognizing the appropriate situations. This might be why a combination of both is usually given. Because learning from a worked-out example does not require language, it makes comparisons across species possible. One example that has received considerable attention is learning through imitation. This was long considered a uniquely human trait and a sign of intelligence, and imitation found in primates was attributed to lesser factors. The quest for pure imitation has led to a rather dumbed-down definition of imitation that requires exact repetition of the actions without any inventive recombination.

This was illustrated in the Plexiglas box task, where primates did show imitation in one component (push vs. twist) but not in the other (pull vs. turn). It appeared that, where children almost slavishly imitate, the primates adopt their own approach. In any case, imitation is inadequate to explain how humans learn from worked-out examples in algebra. Consider the example x - 3 = 8. From this worked-out example the child learns to add 3 to both sides to solve it. When another equation is given, however, a child imitating in the verbatim-repetition sense would again add 3, coming no closer to the answer. Children are quite capable of not doing so and are able to use the correct analogy (a small sketch of this contrast appears at the end of this section). This analogy goes beyond a simple mapping of elements; it also maps the relations between those elements. Of course, typical instruction does not depend entirely on the child's ability to identify the relations; it also includes verbal directions pointing to the correct generalisations. However, whether by verbal direction or by example, successful learning of algebra depends on formulating and using relatively complex generalisations, which comes at a later age. In the following, three ACT-R models will be discussed that try to explain algebra learning and, in turn, what the uniquely human traits are: first, the potential for abstract control; second, the capacity for advanced pattern matching; and third, the metacognitive ability to reason about one's own cognitive states (which hasn't had a lot of attention).
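To illustrate the difference between verbatim imitation and relational analogy in the equation example above, here is a hypothetical Python sketch. The representation of an equation as a (term, operator, constant, result) tuple and both function names are invented purely for illustration.

```python
def imitate_verbatim(equation):
    # Verbatim repetition: always redo the literal action from the worked example
    # ("add 3 to both sides"), regardless of the new equation's structure.
    x_term, op, constant, result = equation
    return result + 3

def solve_by_analogy(equation):
    # Relational mapping: the learner maps the *role* of the constant, not its value,
    # and inverts whatever operation is applied to x.
    x_term, op, constant, result = equation
    if op == "-":
        return result + constant   # x - c = r  ->  x = r + c
    if op == "+":
        return result - constant   # x + c = r  ->  x = r - c

print(imitate_verbatim(("x", "-", 3, 8)))   # 11: correct, but only for the worked example
print(imitate_verbatim(("x", "-", 5, 12)))  # 15: verbatim copying fails on a new problem
print(solve_by_analogy(("x", "-", 5, 12)))  # 17: relational analogy generalises correctly
```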

A Laboratory Study of Algebra Learning: Abstract Control

The first task comes from the study of Qin et al. (2004). To review, young children who had not yet mastered linear equation solving were given a one-hour introduction and five days of practice. Most regions responded to both task complexity and practice, except for the ACC, which showed no practice effect: the need for abstract control does not change over the course of the experiment. The instruction-following model starts from an internal declarative representation of the instructions given to the children in this task; it could not itself parse the real instructions into declarative chunks. Once the instructions are represented, ACT-R has interpretive productions for converting them into behaviour. The instruction is decomposed into actions that the participant already knows how to perform: reading numbers, performing addition, keying a number, and so on. Processing the instructions engages a pattern of activity involving the declarative module, the procedural module, and some peripheral modules. Eventually productions are combined, collapsing multiple steps into single rules, which is the essence of learning in this task.
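A rough Python sketch of the production-compilation idea described above: two steps, one retrieving an instruction from declarative memory and one executing it, collapse into a single specialised rule that skips the retrieval. The instruction format and function names are invented for illustration and are not ACT-R syntax.

```python
# Declarative memory holding a verbally encoded instruction (hypothetical format).
instructions = {"isolate-x": ("add", 3)}   # e.g. "add 3 to both sides"

def rule_retrieve(goal):
    # Production 1: request the relevant instruction from declarative memory.
    return instructions[goal]

def rule_execute(state, instruction):
    # Production 2: carry out whatever instruction was retrieved.
    op, value = instruction
    return state + value if op == "add" else state - value

def compile_productions(goal):
    # Production compilation: build a new rule with the retrieved content baked in,
    # so future trials skip the declarative retrieval entirely.
    op, value = instructions[goal]
    def compiled_rule(state):
        return state + value if op == "add" else state - value
    return compiled_rule

print(rule_execute(8, rule_retrieve("isolate-x")))  # interpretive path: retrieve, then act
fast_rule = compile_productions("isolate-x")
print(fast_rule(8))                                 # compiled path: one specialised rule
```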

Comparison With Primate Sequential Symbol Behaviour

The few abstract bits of control information in this task had to be communicated compactly, which was done using the goal module. This enables the participant to carry forward the correct line of thought without any external support. The need for these control states can be shown by comparison with a task that primates can perform as well. Sequential processing is something primates are rather good at: in a picture-sorting task primates scored almost as well as humans would in such circumstances, and they can even order pairs of pictures they had never seen together before. Such list manipulations bear certain superficial similarities to a child manipulating the symbols of an equation. While this performance is in many ways remarkable, there are still major differences between manipulating lists and transforming algebraic equations. One of them is the child's ability to generalise, which is more than just showing different behaviour: children were able to bring these mental structures to bear on equation solving. The challenge remains, however, to identify what in the architecture is associated with these differences between children and primates. Being able to hold onto an internal representation, detached from either stimulus or action, is critical to the model's algebraic competence. It is tempting to point to the parietal (imaginal) region for this, for there is no exact homologue in the monkey brain. Yet the ability to hold such intermediate results is not totally discontinuous with the abilities of the monkey, which becomes apparent when one tries to develop an ACT-R model for the monkey task. That model assumes that the monkey retrieves the relative locations of the items and, by creating an image, picks the one that comes first. When comparing the two models, the monkey-task model does not require any state tests against the goal buffer. Because of the iterative nature of the algebra algorithm, however, it is not possible to find unique states of the non-goal modules for each production in the algebra model. Thanks to the ACC, humans can maintain abstract control states that allow them to take different courses of action even when all the other buffers are in identical states.

Learning to Solve Linear Equations: Dynamic Pattern Matching

In the second task, children were taught algebra in the standard linear form, while adults worked with a data-flow isomorph of algebra that was developed for this purpose. With a combination of clicking and typing the participants could give their answers; the first step of the task was to indicate which part of the equation was being transformed. Though the verbal instructions were parsed into an internal declarative representation, the parsed representations were rather ad hoc and not very plausible in vivo.

Indeed, actual general directions proved impossible to parse, exposing a limitation in ACT-R. In general, children made more errors than adults did and took much longer as well. They had no deep misunderstanding of the task; rather, they were just prone to slips. When looking at the normalised results, the children and adults performed almost identically. The model predicts this because it responds to the abstract structure of the problems, which is common to the linear and the data-flow representations. The model's behaviour is largely a result of its subsymbolic processes: the declarative representations of the instructions are learned gradually, resulting in the long solution times at the beginning, and the model reaches its maximum speed when it has compiled the most compact rules possible. The most important outcome, however, was that the model could learn from this instruction at all, which required a significant extension to ACT-R's pattern-matching capabilities. The new feature that had to be added is called dynamic pattern matching (DPM). With dynamic pattern matching, a rule can refer to an element of an expression whose position is only determined at run time, and then use that element in further productions. It is like static matching, except that which slot the content is taken from or placed into is determined dynamically. The difference is thus that you can map terms flexibly, which is useful when, for example, the terms appear on the other side of the operator than usual. Without it, there is no tractable way to maintain the correspondence between the terms. It is interesting because, in contrast to static pattern matching, it cannot be implemented by fixed, simple neural pathways. Instead, the regions from or to which the information is being moved are determined dynamically: with respect to the brain, one does not know ahead of time which brain region will be needed as the source or destination of the information. Once the particular role of a term has been determined, a gateway opens that routes the information to the appropriate region. This capacity for gating is theorised to involve prefrontal structures. In summary, to follow instruction, humans need general rules that can process the relational structure appearing in those instructions. Dynamic pattern matching therefore joins the ability to exercise abstract cognitive control as another architectural feature distinguishing humans from other primates.
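A minimal Python sketch of the contrast between static and dynamic pattern matching. The chunk layout, slot names, and example equation are invented for illustration; in ACT-R the mechanism operates over buffer slots, not Python dictionaries.

```python
# A hypothetical equation chunk: slot names -> contents.
equation = {"arg1": "x", "operator": "-", "arg2": "3", "result": "8"}

def static_rule(chunk):
    # Static matching: the rule names the slots it tests and reads ("arg2", "result"),
    # so it only works when the terms sit in exactly those slots.
    if chunk["operator"] == "-":
        return ("add", chunk["arg2"], "to", chunk["result"])

def dynamic_rule(chunk, term):
    # Dynamic matching: the slot holding the term of interest is itself a variable,
    # bound at run time, so the same rule applies wherever the term appears.
    slot = next(name for name, value in chunk.items() if value == term)
    return ("term", term, "found in slot", slot)

print(static_rule(equation))
print(dynamic_rule(equation, "3"))
# The same dynamic rule still works when the terms sit on the other side of the operator.
print(dynamic_rule({"arg1": "3", "operator": "-", "arg2": "x", "result": "8"}, "3"))
```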

Mastery of an Algebraic Concept: Does Metacognition Require Special Architectural Features?

In unpublished work it appeared that students use metacognition (knowing about knowing, learning about learning, cognition of cognition) in learning algebraic skills. While Soar is advertised as being specially designed for metacognition, it appears that ACT-R has the capability to do so as well, using both abstract control and dynamic pattern matching.

This investigation took place in an invented algebra domain called pyramid problems. In these problems you are given a base value x and a height y, written x$y. The height indicates how many further values have to be added, each being the previous number minus 1; e.g. 5$2 = 5 + 4 + 3 = 12 (a small arithmetic sketch appears below). This resembles a summation series. The pyramid problems were designed as an analogue to powers and exponents because prior knowledge there is hard to control. The initial problems were evaluation problems; later problems required using the knowledge in different ways. Two ACT-R models were built: an old one using standard linear-equation-solving skills, and a new one that could parse verbal instructions. The new model was better due to its set of metacognitive abilities. One problem in particular is interesting, namely 1000$2000. With a height of 2000 it seems impossible to solve in the traditional way, just as many students reported. Some students tried a factorial approach, which was wrong; some reasoned through a simpler problem like 2$4; and others used abstract reasoning, noticing that the positive and negative terms cancel each other out. The new ACT-R model tried the factorial approach first and then abstract reasoning, before confirming the answer by solving 2$4. The first challenge raised by this problem is recognizing the difficulty and interrupting the normal process. You (and the model) start with normal processing of the base before reaching the height. A rule then simply notes that the height is too large, so that processing can divert to another strategy. This requires that we represent states of intention in the same terms in which we represent external (possibly conflicting) stimuli (the monkey-tiger analogy); this is the abstract control that is achieved by the goal module. The too-much comparison, however, also involves DPM: it takes the identity of the iterative bound and looks in that location to find the value of the iterative bound. The second architectural challenge raised by this problem is maintaining multiple states of mind. At a particular point in the task there are three substates: 1. The top goal is waiting on the check with the simpler problem; it holds a pointer to an image of the hypothesised answer from state 2. 2. It has already processed the subgoal of abstract addition and has an image of that result. 3. The model is currently focussed on the goal of solving 2$4 = 0 and can now check with the parent. Each of these states is maintained in declarative memory, and successful performance at this point depends on being able to retrieve the parent goal to perform the check.
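A small Python sketch of the pyramid operator as described above, assuming x$y means adding y successive decrements to the base x. The closed-form shortcut is an added observation for checking the arithmetic, not something attributed to the students or the model.

```python
def pyramid(base, height):
    # x$y: start at the base and add 'height' further terms, each one smaller.
    return sum(base - k for k in range(height + 1))

def pyramid_closed_form(base, height):
    # Equivalent shortcut: (height + 1) copies of the base minus a triangular number.
    return (height + 1) * base - height * (height + 1) // 2

print(pyramid(5, 2))        # 12  (5 + 4 + 3)
print(pyramid(2, 4))        # 0   (2 + 1 + 0 - 1 - 2): the simpler check problem
print(pyramid(1000, 2000))  # 0   the positive and negative terms cancel out
```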

Essentially, the capacity to represent these states depends on the fact that chunks can contain pointers to other chunks. This allows hierarchical structures, and other more complex ones, to be represented.
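A minimal sketch of how chunks containing pointers to other chunks can represent the nested goal states listed above. The dictionary representation and all slot names are invented for illustration and are not ACT-R chunk syntax.

```python
# Hypothetical declarative chunks; a slot may hold a reference to another chunk.
abstract_result = {"type": "image", "content": "terms cancel, answer is 0"}
subgoal = {"type": "goal", "task": "solve 2$4", "expected": 0, "parent": None}
top_goal = {"type": "goal", "task": "solve 1000$2000",
            "hypothesis": abstract_result,   # pointer to the imagined answer
            "waiting-on": subgoal}           # pointer to the simpler check problem
subgoal["parent"] = top_goal                 # the child goal can retrieve its parent

# Successful performance depends on retrieving the parent goal once the check is done.
print(subgoal["parent"]["hypothesis"]["content"])
```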

The previously mentioned Soar model solved this problem in basically the same way, which is interesting because Soar was specially built for metacognition. At each step Soar has the capacity to deliberate on what to do next, whereas ACT-R just fires the next production. Nonetheless, in ACT-R a production can respond to the current state and redirect cognition, and the architectural primitive for storing buffer contents in declarative memory gives it the information it needs to relate different states of mind. So while metacognition does not seem to require any additional architectural primitives, it does depend on the two architectural features of abstract control through the goal module and dynamic pattern matching in the procedural module.

Final Reflections Read the book. Page 232 and 233.

How Can the Human Mind Occur?


The Answer So Far

Each of the preceding chapters offers a piece of the answer: 1. The answer takes the form of a cognitive architecture, i.e., the specification of the structure of the brain at a level of abstraction that explains how it achieves the function of the mind. 2. For reasons of efficiency of neural computation, the human cognitive architecture takes the form of a set of largely independent modules associated with different brain regions. 3. Human identity is achieved through a declarative memory module that attempts to give each person the most appropriate possible window into his or her past. 4. The various modules are coordinated by a central production system that strives to develop a set of productions that will give the most adaptive response to any state of the modules. 5. The human mind evolved out of the primate mind by achieving the ability to exercise abstract control over cognition and the ability to process complex relational patterns. This is, however, just a summary, whereas Newell stated that the answer must include the details. There is an unfortunate tendency to view the progress summarized in this book as a growth in the success of ACT-R rather than as a growth in knowledge about the human mind; brand names tend to make the analysis and comparison of these mechanisms, and the exchange of knowledge between research groups, more difficult.

The Newell Criteria

Anderson and Lebiere (2003) introduced the Newell test as a way of both identifying progress toward a satisfactory knowledge of the human mind and seeing where more work was needed. It consists of 12 criteria based on two lists published by Newell himself (1980, 1990). 1. Behave as an (almost) arbitrary function of the environment: Being able to behave with great flexibility, requiring dynamic pattern matching and the ability to guide our cognition independently of our environment. 2. Operate in real time: Besides actually working, the architecture must operate on the same time scale on which humans live and act.

3. Exhibit rational, that is, effective adaptive behaviour: Humans do not just perform intellectual functions, they perform them to serve their needs. This involves the subsymbolic computations and how humans reflect on the adaptiveness of their actions. 4. Use vast amounts of knowledge about the environment: One key to human adaptivity is the vast amount of knowledge that can be called upon. How this database is maintained is still a large question and has not been answered by any architectural model. 5. Behave robustly in the face of error, the unexpected, and the unknown: The world can change in unexpected ways, and how we act upon it can have unexpected effects. The ability to deal with a dynamic and unpredictable environment is a precondition for survival for all organisms. 6. Integrate diverse knowledge: The ability to bring distal knowledge together to come up with novel conclusions. Fodor (chapter 2) thinks this is not possible, but it has been implemented in ACT-R and Soar. 7. Use (natural) language: This is a topic this book neglects, but not one that ACT-R research neglects. One big issue is the degree to which there is special neural support for language processing. General-purpose processing mechanisms are often proposed, which Anderson characterises as subsymbolic. 8. Exhibit self-awareness and a sense of self: Newell connected this most with the topic of consciousness. 9. Learn from the environment: One striking feature of the brain is that almost all of its tissue is capable of changing with experience. Declarative and procedural memory combine to produce complex learning. It is an interesting question whether there are useful higher-level characterisations of perceptual and motor learning. 10. Acquire capabilities through development: While there are many learning models in different architectures that address specific events in development, we lack a real grip on the changes that occur with development. 11. Arise through evolution: If the scale of development remains overwhelming, this is even more true of evolution, compounded by a real lack of data. 12. Be realisable within the brain: Perhaps the biggest contribution to the understanding of the neural basis of cognition is showing how to relate detailed cognitive models to detailed brain-imaging data.

The Question of Consciousness

In venturing into the question of consciousness it is not possible, despite Newell's claim, to stay out of the philosophers' domain. Consciousness refers to our sense of awareness of our own cognitive workings. Consciousness has an obvious mapping to the buffers associated with the modules: the contents of consciousness are the contents of these buffers, and conscious activity corresponds to the manipulation of the contents of these buffers by production rules. ACT-R is not conscious the way most humans are, probably because ACT-R gives a rather incomplete picture of the buffers that are available; typical ACT-R models also give rather incomplete pictures of the operations on the buffers. Still, identifying ACT-R buffers with consciousness is a significant conclusion, because consciousness is very much caught up in the popular conception of the mind. This is not a particularly novel interpretation of consciousness; it is basically the ACT-R realisation of the global workspace theory of consciousness. From this point of view of ACT-R, our knowledge of the external world is not fundamentally different from our knowledge of our internal workings.

Many will find the identification of consciousness with the contents of the buffers in ACT-R problematic: every computational system holding information would then be conscious. Moreover, with information that flits in and out of the buffers, one can wonder whether it is perceived consciously. Just because something is in a buffer does not mean it will be reported, only that it could be reported if the right productions are called upon. These contents are products of powerful modules that have been shaped by evolution to produce functionally adaptive results that can work with each other. Probably more bothersome than the claim that everything in the buffers is conscious is the claim that there is nothing more to consciousness. Dennett argues that conscious thoughts are, at best, momentarily noted. He states that we have become so ingrained to think in terms of something like a Cartesian theatre (where things come to be presented to the mind) that we find it hard to think in terms of something like this buffer theory. There has been some criticism of this view. However, our phenomenal conscious experience is just the exercise of our ability to access and reflect on the contents of our buffers. If we resist the temptation to believe in a hard problem of consciousness, we can appreciate how consciousness is the solution to the fundamental problem of achieving the mind in the brain. Thus, consciousness is the manifestation of the solution to the need for global coordination among the modules. That being said, chapters 1-5 develop this architecture with only oblique references to consciousness. It is still not clear how invoking the concept of consciousness adds to the understanding of the human mind, but taking a coherent reading of the term consciousness, Anderson is willing to declare ACT-R conscious.
