Fourteen Lessons from the Physics Education Reform Effort
ABSTRACT
Several years ago, I reported a survey of pretest and posttest data for 62 introductory physics courses attended by a total of 6542 students. The present article provides a summary of that survey and offers 14 lessons from the physics education reform effort that may assist in the general upgrading of education and science literacy.
KEY WORDS: cognitive science, education reform, education research, interactive engagement, physics education, science literacy.
Published: January 14, 2002
INTRODUCTION
For more than three decades, physics education researchers have repeatedly shown that traditional introductory physics courses with passive student lectures, recipe labs, and algorithmic problem exams are of limited value in enhancing students' conceptual understanding of the subject (McDermott and Redish 1999). Unfortunately, this work was largely ignored by the physics and education communities until Halloun and Hestenes (1985a, 1985b) devised the Mechanics Diagnostic (MD) test of conceptual understanding of Newtonian mechanics. Among many other virtues, the MD and the subsequent Force Concept Inventory (FCI) (Hestenes et al. 1992, Halloun et al. 1995) tests have two major advantages: (a) the multiple-choice format facilitates relatively easy administration of the tests to thousands of students; (b) the questions probe for a conceptual understanding of the basic concepts of Newtonian mechanics in a way that is understandable to the novice who has never taken a physics course (and thus can be given as an introductory course pretest), yet at the same time are rigorous enough for the initiate.
Here is a sample question, similar to those found in a Halloun and Hestenes (HH) test (to preserve confidentiality, actual HH questions are not used):
A student in a lab holds a brick of weight W in her outstretched horizontal palm and lifts the brick vertically upward at a constant speed. While the brick is moving vertically upward at a constant speed, the magnitude of the force on the brick by the student's hand is:
A. constant in time and zero.
B. constant in time, greater than zero, but less than W.
C. constant in time and W.
D. constant in time and greater than W.
E. decreasing in time but always greater than W.
 
Note that the responses include as distractors not only "D," the common Aristotelian misconception that "motion requires a net force," but also other, less common student misconceptions, "A" and "E," that might not be known to traditional teachers. Unfortunately, too few teachers "shut up and listen to their students" to find out what they are thinking (Arons 1981). The distractors are based on my years of listening to students as they worked through the experiments in Socratic Dialogue Inducing Lab #1 "Newton's First and Third Laws" (Hake 2001a). For actual HH questions, the distractors were usually gleaned through careful qualitative research involving interviews with students and the analysis of their oral and written responses to mechanics questions.
Using the MD test, Halloun and Hestenes (1985a, 1985b) published a careful study using massive pre- and postcourse testing of students at Arizona State University in introductory physics courses that both were and were not based on calculus. They concluded that: (1) " ... the student's initial qualitative, common sense beliefs about motion and ... (its) ... causes have a large effect on performance in physics, but conventional instruction induces only a small change in those beliefs ... " and (2) " ... considering the wide differences in the teaching styles of the four professors ... (involved in the study) ... the basic knowledge gain under conventional instruction is essentially independent of the professor."
These outcomes were consistent with work done prior to the HH study as recently reviewed by McDermott and Redish (1999).
The HH results stimulated a flurry of research and development aimed at improving introductory mechanics courses. Most of the courses so generated sought to promote conceptual understanding through the use of pedagogical methods published by physics education researchers (see, e.g., Physical Science Resource Center 2001, Galileo Project 2001, and UMd-PERG 2001a). These methods are usually based on the insights of cognitive science (Gardner 1985, Mestre and Touger 1989, Redish 1994, Bruer 1994, 1997, Bransford et al. 1999, Donovan et al. 1999) and/or outstanding classroom teachers (e.g., Karplus 1977, 2001, Minstrell 1989, Arons 1990, McDermott 1991, 1993, Fuller 1993, Reif 1995, Wells et al. 1995, Zollman 1996, Laws 1997). Although the methods differ in detail, they all attempt to guide students to construct their understanding by heads-on (always) and hands-on (usually) activities that yield immediate feedback through discussion with peers and/or instructors [Interactive Engagement (IE)], so as to finally arrive at the viewpoint of the professional physicist.
The survey summarized below documents some of the successes and failures of courses using IE methods. Hopefully, these findings will lead to much-needed further improvement in introductory mechanics instruction in the light of practical experience, and serve as a model for promoting educational reform in other disciplines. As the summary omits some important aspects, serious education scholars are urged to consult the original sources (Hake 1998a, 1998b, 1998c). I then present 14 somewhat subjective lessons from my own interpretation of the physics education reform effort with the hope that they may assist the general upgrading of education and science literacy.
SURVEY SUMMARY
Starting in 1992, I requested that pre- and post-FCI data and posttest Mechanics Baseline (a problem-solving test due to Hestenes and Wells (1992)) data be sent to me. Because instructors are more likely to report higher-gain courses, the detector is biased in favor of those courses. However, it can still answer a crucial research question: can IE methods increase the effectiveness of introductory mechanics courses well beyond that obtained by traditional methods?
The data
Figure 1 shows data from a survey (Hake 1998a, 1998b, 1998c) of 62 introductory physics courses, enrolling a total of 6542 students. The data are derived from pretest and posttest scores of the MD and FCI tests indicated above and acknowledged to be of high validity and consistent reliability (for the technical meanings of these terms, see, e.g., Light et al. 1990, Slavin 1992, Beichner 1994). Average pretest and posttest scores, standard deviations, instructional methods, materials used, institutions, and instructors for each of the survey courses are tabulated and referenced in Hake (1998b). The latter paper also gives case histories for the seven IE courses whose effectiveness, as gauged by pre- to posttest gains, was close to that of traditional courses, advice for implementing IE methods, and suggestions for further research. Various criticisms of the survey and of physics education research generally are countered by Hake (1998c).
Fig. 1. The %<Gain> vs. %<Pretest> score for 62 courses, enrolling a total of 6542 students. Here, %<Gain> = %<posttest> – %<pretest>, where the angle brackets "<....>" indicate an "average" over all students in the course. Points for high school (HS), college (COLL), and university (UNIV) courses are shown in green for Interactive Engagement (IE) and in red for Traditional (T) courses. The straight negative-slope lines are lines of constant "average normalized gain" <g>. The two dashed purple lines show that most IE courses achieved <g>'s between 0.34 and 0.69. The definition of <g>, and its justification as an index of course effectiveness, are discussed in the text. The average of <g>'s for the 48 IE courses is <<g>>48IE = 0.48 ± 0.14 (standard deviation), while the average of <g>'s for the 14 T courses is <<g>>14T = 0.23 ± 0.04 (SD). Here, the double angle brackets "<<....>>" indicate an "average of averages." (Same data points and scales as in Fig. 1 of Hake 1998a.)
For survey classification and analysis purposes, I operationally defined:
- "IE methods" as those designed, at least in part, to promote conceptual understanding through interactive engagement of students in heads-on (always) and hands-on (usually) activities that yield immediate feedback through discussion with peers and/or instructors, all as judged by their literature descriptions;
 
- "IE courses" as those reported by instructors to make substantial use of IE methods; and
	 
- "Traditional (T) courses" as those reported by instructors to make little or no use of IE methods, relying primarily on passive-student lectures, recipe labs, and algorithmic-problem exams.
Average normalized gain
In the survey (Hake 1998a, 1998b, 1998c), it was useful to discuss the data in terms of a quantity that I called the "average normalized gain" <g>, defined as the actual gain, %<Gain>, divided by the maximum possible actual gain, %<Gain> max:
| <g> = %<Gain> / %<Gain>max | (1a) | 
| <g> = (%<posttest>–%<pretest>) / (100 – %<pretest>) | (1b) | 
where %<posttest> and %<pretest> are the final (posttest) and initial (pretest) class percentage averages.
For example, suppose that for a given class the test average before instruction was %<pretest> = 44%, and the test average after instruction was %<posttest> = 63%. Then the percentage average actual gain is
%<Gain> = 63% – 44% = 19%.
The maximum possible actual gain for this class would have been
%<Gain>max = (100% – 44%) = 56%.
Thus, for this example, the average normalized gain is
<g> = %<Gain> / %<Gain> max = 19%/56% = 0.34,
that is, the class made an average gain of 34% of the maximum possible average gain.
To understand the graphical interpretation of the "average normalized gain" <g>, consider the same example as above. Data for that class would be plotted in Fig. 1 as the point [%<pretest> = 44%, %<Gain> = 19%] at the tip of the white arrowhead. Measured from the lower right vertex of the graph [%<pretest> = 100%, %<Gain> = 0%], this point lies a horizontal distance (100% - 44%) = 56% to the left and a vertical distance 19% above. The absolute value of the slope "s" of the purple dashed line connecting this point to the lower right vertex is therefore |s| = (vertical distance)/(horizontal distance) = 19%/56% = 0.34. Thus, this absolute slope
|s| = %<Gain>/(100% - %<pretest>) = %<Gain>/ (maximum possible %<Gain>) = %<Gain>/ %<Gain> max
is, of course, just the "average normalized gain" <g>. That <g> can be taken to be an index of that course's effectiveness is justified below in Conclusions of the Survey. Thus, all courses with points close to the lower purple dashed line are judged to be of about equal average effectiveness, regardless of their average pretest scores. A similar calculation for the point [%<pretest> = 32%, %<Gain> = 47%] at the tip of the blue arrowhead yields <g> = 0.69. The maximum value of <g> occurs when %<Gain> is equal to %<Gain> max and is therefore 1.00, as shown in Fig. 1.
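As a quick check of Eq. (1b), the two points discussed above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `normalized_gain` is hypothetical and is not part of the survey's own analysis.

```python
def normalized_gain(pre_pct, post_pct):
    """Average normalized gain <g> from class-average pretest and
    posttest percentages, following Eq. (1b)."""
    if not 0 <= pre_pct < 100:
        raise ValueError("pretest average must lie in [0, 100)")
    return (post_pct - pre_pct) / (100.0 - pre_pct)

# Worked example from the text: pretest 44%, posttest 63%.
g_example = normalized_gain(44, 63)

# Second point in Fig. 1: pretest 32%, gain 47%, hence posttest 79%.
g_upper = normalized_gain(32, 32 + 47)

print(round(g_example, 2), round(g_upper, 2))
```

Both values depend only on the class averages, which is why courses with very different pretest scores can be placed on the same line of constant <g>.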
Popular interactive engagement methods
For the 48 IE courses in Fig. 1 and Fig. 2, the ranking in terms of number of IE courses using each of the more popular methods is as follows:
- Collaborative Peer Instruction (Johnson et al. 1991, Heller and Hollabaugh 1992, Heller et al. 1992, Slavin 1995, Johnson et al. 2000): 48 (all courses) {CA} - for the meaning of "CA" and similar abbreviations below enclosed in braces "{....}", see the paragraph following this list.
 
- Microcomputer-based Labs (Thornton and Sokoloff 1990, 1998): 35 courses {DT}.
 
- Concept Tests (Mazur 1997, Crouch and Mazur 2001): 20 courses  {DT}. Tests of this type for physics, biology, and chemistry are available on the Web, along with a description of the Peer Instruction method at the Galileo Project (2001).
 
- Modeling (Halloun and Hestenes 1987, Hestenes 1987, 1992, Wells et al. 1995): 19 courses {DT + CA}; these courses are described on the Web at http://modeling.la.asu.edu/.
 
- Active Learning Problem Sets or overview case studies (Van Heuvelen 1991a, 1991b, 1995): 17 courses {CA}. Information on these materials is online at http://www.physics.ohio-state.edu/~physedu/.
 
- Physics education research based text (Hake 1998b, Table II) or no text: 13 courses.
 
- Socratic Dialogue Inducing Labs (Hake 1987, 1991, 1992, 2001a, Tobias and Hake 1988): 9 courses {DT + CA}. A description and lab manuals are on the Web at the Galileo Project (2001) and http://www.physics.indiana.edu/~sdi.
The notations within the braces {....} follow Heller (1999) in loosely associating the methods with "learning theories" from cognitive science. Here "DT" stands for "Developmental Theory," originating with Piaget (Inhelder and Piaget 1958, Gardner 1985, Inhelder et al. 1987, Phillips and Soltis 1998), and "CA" stands for "Cognitive Apprenticeship" (Collins et al. 1989, Brown et al. 1989). All these methods, except the sixth (physics education research based text or no text), recognize the important role of social interactions in learning (Vygotsky 1978, Lave and Wenger 1991, Dewey 1997, Phillips and Soltis 1998). It should be emphasized that the above rankings are by popularity within the survey, and have no necessary connection with the effectiveness of the methods relative to one another. In fact, it is quite possible that some of the less popular methods used in some survey courses, as listed by Hake (1998b), could be more effective in terms of promoting student understanding than any of the above popular strategies.
Fig. 2. Histogram of the average normalized gain <g>: red bars show the fraction of 14 Traditional (T) courses (2084 students) and green bars show the fraction of 48 Interactive Engagement (IE) courses (4458 students), both within bins of width 0.04 in <g>, centered on the <g> values shown. (Same as Fig. 2 of Hake 1998a.)
CONCLUSIONS OF THE SURVEY
The conclusions of the survey (Hake 1998a, 1998b, 1998c, 1999a) may be summarized as follows.
- The average normalized gain <g> affords a consistent analysis of pretest and posttest data on conceptual understanding over diverse student populations in high schools, colleges, and universities. For the 62 courses of the survey (Hake 1998a, 1998b, 1998c), the correlation of
 
| <g> with (%<pretest>) is + 0.02. | (2) |  
 
This constitutes an experimental justification for the use of <g> as a comparative measure of course effectiveness over diverse student populations with widely varying average pretest scores, and is a reflection of the usually relatively small correlations of single student <g>'s with their pretest scores within a given class (Hake 1998a, 2001b, Cummings et al. 1999, Meltzer 2001).
 
The average posttest score (%<posttest>) and the average actual gain (%<Gain>) are less suitable for comparing course effectiveness over diverse groups because their correlations with (%<pretest>) are significant. The correlation of
 
 
| (%<posttest>) with (%<pretest>) is + 0.55, | (3) |  
  and the correlation of
 
 
| (%<Gain>) with %<pretest> is – 0.49. | (4) |  
 
Both of these correlations would be anticipated. Note that, in the absence of instruction, a high positive correlation of (%<posttest>) with (%<pretest>) would be expected. The successful use of the normalized gain for the analysis of pretest and posttest data in this and other physics education research (see Can Educational Research Be Scientific Research?) calls into question the commonly dour appraisals of pretest and posttest designs (Lord 1956, 1958, Cronbach and Furby 1970, Cook and Campbell 1979). For a review of the pretest and posttest literature (both for and against), see Wittmann (1997).
 
 
- Fourteen T courses (2084 students) surveyed yielded
 
| <<g>>14T = 0.23 ± 0.04sd | (5) |  
 
Considering the elemental nature of the MD/FCI questions (many physics teachers regard them as too easy to be used on examinations) and the relatively low <g> = 0.23 (i.e., only 23% of the possible gain was achieved), it appears that T courses fail to convey much basic conceptual understanding of Newtonian mechanics to the average student.
 
  
- The 48 IE courses (4458 students) surveyed yielded
 
| <<g>>48IE = 0.48 ± 0.14sd | (6) |  
 The <<g>>48IE is more than twice that of <<g>>14T. The difference (<<g>>48IE - <<g>>14T ) is more than 6 standard deviations (SD's) of <<g>>14T and almost 2 SD's of <<g>>48IE, reminiscent of differences seen when comparing instruction delivered to students in large groups with one-on-one instruction (Bloom 1984). This suggests that IE courses can be much more effective than T courses in enhancing conceptual understanding of Newtonian mechanics. Although it was not possible in this survey to assign students randomly from a single large homogeneous population to the T and IE courses, contrasting T and IE data are all drawn from the same institutions and the same generic introductory program regimes (Hake 1998b). Thus, it seems very unlikely that the nearly 2-SD difference between <<g>>'s for the IE and T courses could be accounted for by differences in the student populations.
 
An alert critic of an early draft (and more recently Becker 2001a, see below) pointed out that the <<g>> difference might be due in part to the apparent smaller average enrollment for IE courses (4458/48 = 93) than for T courses (2084/14 = 149). However, such calculation of average class size is invalid, because, in several cases (Hake 1998b, Table I), classes of fairly homogeneous instruction and student population were combined into one "course" whose <g> was calculated as a number-of-student-weighted average of the <g>'s of the individual classes. A correct calculation yields an average class enrollment of 4458/63 = 71 for IE classes and 2084/34 = 61 for T classes, so the average class sizes are quite similar.
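The two ways of computing average enrollment can be compared side by side. The sketch below simply redoes the arithmetic with the student, course, and class counts quoted in this paragraph; the helper name `average_enrollment` is an illustrative assumption.

```python
def average_enrollment(students, units):
    """Mean number of students per unit (a 'course' or a single class)."""
    return students / units

# Per combined "course" (the calculation the text calls invalid).
per_course = {
    "IE": average_enrollment(4458, 48),
    "T": average_enrollment(2084, 14),
}

# Per individual class (the calculation the text calls correct).
per_class = {
    "IE": average_enrollment(4458, 63),
    "T": average_enrollment(2084, 34),
}

print({k: round(v) for k, v in per_course.items()})
print({k: round(v) for k, v in per_class.items()})
```

The apparent two-to-one gap in enrollment appears only when combined courses are counted as single units.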
 
 
- A detailed analysis of random and systematic errors has been carried out (Hake 1998a) but will not be repeated here. Possible systematic errors considered were question ambiguities, isolated false positives (right answers for the wrong reasons), and uncontrolled variables in the testing conditions, such as teaching to the test and test-question leaks, the fraction of course time spent on mechanics, posttest and pretest motivation of students, and the Hawthorne/John Henry effects. It was concluded that it is extremely unlikely that random or systematic error plays a significant role in the almost 2-SD difference in the <<g>>'s of T and IE courses.
 
- Conclusions 1-3 above are bolstered by an analysis (Hake 1999a) of the survey data in terms of Cohen's (1988) "effect size" deviation. The effect size is commonly used in meta-analyses (e.g., Light et al. 1990, Hunt 1997, Glass 2000), and strongly recommended by many psychologists (Thompson 1996, 1998, 2000) and biologists (Johnson 1999, Anderson et al. 2000, Thompson 2001) as a preferred alternative (or at least addition) to the usually inappropriate (Rozeboom 1960, Carver 1993, Cohen 1994, Kirk 1996) t-tests and p values associated with null-hypothesis testing.
Carver (1993) subjected the Michelson and Morley (1887) data to a simple analysis of variance (ANOVA) and found statistical significance associated with the direction the light was traveling (p < 0.001 )! He writes, "It is interesting to speculate how the course of history might have changed if Michelson and Morley had been trained to use this corrupt form of the scientific method, that is, testing the null hypothesis first. They might have concluded that there was evidence of significant differences in the speed of light associated with its direction and that therefore there was evidence for the luminiferous ether ... Fortunately Michelson and Morley ... [first] ... interpreted their data with respect to their research hypothesis." (My italics.) Consistent with the scientific methodology of physical scientists such as Michelson and Morley (see Can Educational Research Be Scientific Research?), Rozeboom (1960) wrote that " ... the primary aim of a scientific experiment is not to precipitate decisions, but to make an appropriate adjustment in the degree to which one accepts, or believes, the hypothesis or hypotheses being tested." (See also Anderson 1998.)
 
The effect size deviation (d) is defined by Cohen (1988:20, 44) as
 
 
| $d = \lvert m_A - m_B \rvert \,/\, [(SD_A^2 + SD_B^2)/2]^{0.5}$ | (7) |

where $m_A$ and $m_B$ are population means expressed in the raw (original measurement) unit, and where the denominator is the root mean square of the standard deviations $SD_A$ and $SD_B$ for the A and B groups, sometimes called the "pooled standard deviation." For the present survey, Eq. (7) becomes
 
 
| $d = [\langle\langle g \rangle\rangle_{48IE} - \langle\langle g \rangle\rangle_{14T}] \,/\, [(SD_{48IE}^2 + SD_{14T}^2)/2]^{0.5}$ | (8) |

Insertion of the measured values <<g>>14T = 0.23 ± 0.04SD and <<g>>48IE = 0.48 ± 0.14SD into Eq. (8) yields:

| $d = [0.48 - 0.23] \,/\, [(0.14^2 + 0.04^2)/2]^{0.5} = 2.43$ | (9) |
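The arithmetic of Eq. (9) can be checked with a short script. This is a minimal sketch of Eq. (7) as written here (root mean square of the two standard deviations in the denominator); the function name `effect_size` is hypothetical.

```python
import math

def effect_size(mean_a, mean_b, sd_a, sd_b):
    """Effect size d: absolute mean difference divided by the
    root mean square of the two standard deviations, as in Eq. (7)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return abs(mean_a - mean_b) / pooled_sd

# Survey values: <<g>>48IE = 0.48 +/- 0.14, <<g>>14T = 0.23 +/- 0.04.
d = effect_size(0.48, 0.23, 0.14, 0.04)
print(round(d, 2))
```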
 
The above "d" can be compared with:
 
- (a) Cohen's (1988:24) rule of thumb, which is based on typical results in social science research, i.e., that d = 0.2, 0.5, 0.8 implies respectively "small," "medium," and "large" effects. However, Cohen cautions that the adjectives " ... are relative, not only to each other, but to the area of behavioral science or even more particularly to the specific content and research method being employed in any given investigation."
- (b) The course-enrollment N-weighted <d> = 0.57 obtained for 31 test/control group studies (2559 students) of achievement by Springer et al. (1999:Table 2) in a meta-analysis of the effect of small-group learning among undergraduates in science, math, and engineering.
 
However, the effect size deviation of the present study is much larger than might be expected on the basis of Cohen's rule of thumb, or on the basis of the results of Springer et al. This difference may be related to the facts that, in this survey, unlike most education research meta-analyses (e.g., those of Springer et al. 1999, Slavin 1995, and Johnson et al. 2000):
 
- (1) all the courses covered nearly the same material (here introductory Newtonian mechanics);
- (2) the material is conceptually difficult and counterintuitive;
- (3) the same test (either MD or FCI) was administered to both IE and T classes;
- (4) the tests used are widely recognized for their validity and consistent reliability, have been carefully designed to measure understanding of the key concepts of the material, and are far superior to the plug-in-regurgitation types of test so commonly used as measures of "achievement";
- (5) the measurement unit gauges the normalized learning gain from start to finish of a course, not the "achievement" at the end of a course;
- (6) the measurement unit <g> is not significantly correlated with the student's initial knowledge of the material being tested; and
- (7) the "treatments" are all patterned after those published by education researchers in the discipline being tested.

I think that the Springer et al. meta-analysis probably understates the potential of small-group learning for advancing conceptual understanding and problem-solving ability.
 
For what it is worth, the conventional t-test (Slavin 1992:157-162, Snedecor and Cochran 1989:83-102), which assumes normal distributions unlike those presently observed (Fig. 2), yields t = 11 and p < 0.001. Paraphrasing Thompson (1996), p is the two-tailed probability (0 to 1.0) of the results (means difference and the SD), given the sample sizes and assuming that the samples were derived from a population in which the null hypothesis H0 is exactly true, i.e., the probability of the results assuming that H0 is true. For discussions of the common misinterpretation of p as the converse (the probability that H0 is true given the results), see Cohen (1994) and Kirk (1996). Because the variances of the <g> distributions for the IE and T courses are markedly dissimilar ($F = SD_{IE}^2/SD_T^2 = 12$), I have used an approximation to the standard t-test due to Satterthwaite (1946), as discussed by Snedecor and Cochran (1989:96-98) and by Rosenthal et al. (2000:33-35).
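The quoted t value and variance ratio can be approximately reproduced from the summary statistics alone. The sketch below assumes that the courses (48 IE, 14 T) are the sampling units, and it uses the unequal-variance form of the t statistic with Satterthwaite's approximate degrees of freedom. The function name `welch_t` is hypothetical, and no p value is computed.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Unequal-variance t statistic and Satterthwaite's approximate
    degrees of freedom, computed from summary statistics only."""
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df

# Survey values, treating each course as one observation (an assumption).
t, df = welch_t(0.48, 0.14, 48, 0.23, 0.04, 14)

# Ratio of variances of the <g> distributions for IE and T courses.
variance_ratio = 0.14 ** 2 / 0.04 ** 2

print(round(t, 1), round(df), round(variance_ratio, 2))
```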
 
An effect size deviation can also be calculated directly for each pretest and posttest study in accord with Eq. (10):
 
 
| $d = (\%\langle post \rangle - \%\langle pre \rangle) \,/\, [(SD_{pre}^2 + SD_{post}^2)/2]^{0.5}$ | (10) |
 
However, it should be noted that for pretest and posttest comparisons of course effectiveness over diverse student populations with widely varying average pretest scores, "d" is a relatively poor metric because, unlike "<g>":
 
- (a) "d" depends on the actual bare (unrenormalized) average %<Gain>, whose magnitude, as indicated in Eq. (4), tends to be negatively correlated with the %<pretest>;
- (b) %<Gain>'s are confounded with SD's: given two classes both with identical statistically significant %<Gain>'s, the more homogeneous class with the smaller ($SD_{pre}^2 + SD_{post}^2$) will be awarded the higher "d."
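Point (b) can be illustrated numerically. In the sketch below the two classes and all of their scores are invented for illustration only (they are not survey data), and the function name `prepost_effect_size` is hypothetical. It applies Eq. (10) to two classes with the same averages but different spreads.

```python
import math

def prepost_effect_size(pre_mean, post_mean, sd_pre, sd_post):
    """Pre/post effect size d in the form of Eq. (10)."""
    return (post_mean - pre_mean) / math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2.0)

# Hypothetical classes: identical averages (40% -> 60%), different SDs.
d_homogeneous = prepost_effect_size(40, 60, 10, 10)
d_heterogeneous = prepost_effect_size(40, 60, 20, 20)

# The normalized gain is the same for both classes.
g_both = (60 - 40) / (100 - 40)

print(d_homogeneous, d_heterogeneous, round(g_both, 2))
```

Under these assumed numbers the more homogeneous class receives twice the "d" of the other, although both made the same average gain.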
 
Ignoring these problems, for the 33 courses of the survey for which SD's are available (Hake 1998b), I obtain average effect sizes:
<d>9T = 0.88 ± 0.32SD for 9 T courses (1620 students) with <<g>>9T = 0.24 ± 0.03SD; and
<d>24IE = 2.16 ± 0.75SD for 24 IE courses (1843 students) with <<g>>24IE = 0.50 ± 0.12SD.
 
Weighting the d and g averages, as in the Springer et al. (1999) study, in accord with class enrollments N (on the grounds that d and <g> from larger classes more closely approximate actual effects in very large student populations) yields:
 
<d>9T-Nweighted = 1.05 for 9 T courses (1620 students) with <<g>>9T-Nweighted = 0.26, and
<d>24IE-Nweighted = 2.16 for 24 IE courses (1843 students) with <<g>>24IE-Nweighted = 0.52,
 
not greatly different from the non-weighted averages.
 
The present <d>24IE = 2.16 can be compared with: (1) the similar d1IE = 1.91 (with <g>1IE = 0.48) reported by Zeilik et al. (1998) for a single IE introductory astronomy course (331 students) given in Spring 1995 at the University of New Mexico, and (2) the much smaller course-enrollment N-weighted average <d> = 0.30 obtained for six pre/post studies (764 students) of achievement by Springer et al. (1999, Table 2).
 
As noted in Conclusion 3, above, a critic of an early draft pointed out that the <d> difference might be due in part to the apparent smaller average enrollment for IE courses (1843/24 = 77) than for T courses (1620/9 = 180). However, such calculation of average class size is invalid for the reason given previously. A correct calculation of average class size indicates an average class enrollment of 1843/39 = 47 for IE classes and 1620/31 = 52 for T classes, so the average class enrollments are quite similar.
 
 
- Considering the elemental nature of the MD/FCI tests, current IE methods and their implementation need to be improved, as none of the IE courses achieves <g> greater than 0.69. In fact, as can be seen in Fig. 1 and Fig. 2, seven of the IE courses (717 students) achieved <g>'s close to those of the T courses. Case histories of the seven low-<g> courses (Hake 1998b) suggest that implementation problems occurred that might be mitigated by:
- (1) apprenticeship education of instructors new to IE methods;
- (2) emphasis on the nature of science and learning throughout the course;
- (3) careful attention to motivational factors and the provision of grade incentives for taking IE-activities seriously;
- (4) recognition of and positive intervention for potential low-gain students;
- (5) administration of exams in which a substantial number of the questions probe the degree of conceptual understanding induced by the IE methods; and
- (6) use of IE methods in all components of a course and tight integration of those components.
 
Personal experience with the Indiana IE courses and communications with most of the IE instructors in the survey suggest that similar implementation difficulties probably occurred to a greater or lesser extent in all the IE courses and are probably partially responsible for the wide spread in the <g>'s, apparent for IE courses in Fig. 1 and Fig. 2.
 
 
- I have plotted (Hake 1998a) average post-course scores on the problem-solving Mechanics Baseline test (Hestenes and Wells 1992) [available for 30 (3259 students) of the 62 courses of the survey] vs. those on the conceptual FCI. There is a very strong positive correlation r = + 0.91 of the MB and FCI scores. This correlation and the comparison of IE and T courses at the same institution (Hake 1998a) imply that IE methods enhance problem-solving ability.
CRITICISMS OF THE SURVEY
Early criticisms of the survey have been countered in Hake (1998c). William Becker (2001a), an authority on economics contributions to the assessment of educational research (Becker and Baumol 1995), and executive editor of the Journal of Economic Education http://www.indiana.edu:80/~econed, in his generally censorious review of recent educational research "through the lens of theoretical statistics," raised other objections to Hake (1998a):
- The amount of variability around a mean test score for a class of 20 students versus a mean of 200 students cannot be expected to be the same. Estimation of a standard error for sample of 62. . . (courses) . . ., where each of the 62 receives an equal weight ignores this heterogeneity.
But only the SD's of the <<g>>'s for the 48 IE and 14 T courses were given (not the standard errors). For example, I give <<g>>48IE = 0.48 ± 0.14 (SD). The spread (SD) in <g> values for the IE courses is large, 29% of <<g>>48IE. In the error analysis (Hake 1998a), I adduce evidence that the large spread in the <g>IE distribution is due to random errors plus other factors, e.g., course-to-course variations in the systematic errors and in the effectiveness of the pedagogy and/or implementation. In my opinion, attempts to take into account the heterogeneity due to course enrollment would add little to the analysis.
 
The crucial point, seemingly ignored by Becker, is that the difference (<<g>>48IE - <<g>>14T) is large in comparison to the spread in the data; more technically, the "effect size," d = 2.43 [see above, Eq. (9)] is relatively large. Hence, it is extremely unlikely that details of the <g> averaging procedure would affect the conclusion that IE courses can be much more effective than T courses in enhancing conceptual understanding of Newtonian mechanics. This view is borne out by the fact that the course-enrollment N-weighted <<g>>48IE-Nweighted = 0.49 and <<g>>14T-Nweighted = 0.24 are very close to the non-N-weighted averages <<g>>48IE = 0.48, <<g>>14T = 0.23.
 
 
- Unfortunately, the gap closing outcome measure "g" is algebraically related to the starting position of the student as reflected in the pretest: g falls as the pretest score rises, for maximum score > posttest score > pretest score. Any attempt to regress a posttest minus pretest change score, or its standardized gap closing measure g, on a pretest score yields a biased estimate of the pretest effect. Hake (1998) makes no reference to this bias when he discusses his regressions and correlation of average normalized gain, average gain score and posttest score on the average pretest score. These regressions suffer from the classic errors in variables problem and regression to the mean problem associated with the use of the pretest as an explanatory index variable for unknown ability. Even those schooled in statistics continue to overlook this regression fallacy, as called to economists' attention . . . (by) . . . Nobel laureate Milton Friedman (1992).
The first sentence is based on Becker's partial differentiation $\partial g/\partial y = (x - Q)/(Q - y)^2$, in agreement with Hake (1998a, footnote 45). Here Q is the number of questions on the exam, y and x are the number of correct responses on the pretest and posttest, respectively, and g = (x - y)/(Q - y). But Becker's differentiation, while mathematically correct, has little physical significance because, for a single student, x is not, in general, independent of y; and for a single class, the average <x> is not independent of the average <y>. In fact, as indicated above, for the 62 courses of the survey, correlation of the average posttest score <x> with the average pretest score <y> is +0.55, while the correlation of average normalized gain <g> with <y> is a very low +0.02.
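For reference, the partial derivative quoted above follows from the quotient rule applied to g = (x - y)/(Q - y) with x held fixed. Holding x fixed is exactly the assumption that the preceding paragraph questions, so the result is a formal statement about the algebra rather than about observed classes:

```latex
\[
\frac{\partial g}{\partial y}
  = \frac{\partial}{\partial y}\left(\frac{x - y}{Q - y}\right)
  = \frac{-(Q - y) + (x - y)}{(Q - y)^{2}}
  = \frac{x - Q}{(Q - y)^{2}}
  \le 0 \quad \text{for } x \le Q .
\]
```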
 
Regarding the rest of Becker's criticism #2 above: (a) Although Becker, and Chizmar and Ostrosky (1998) take the pretest score as a measure of students' "ability/aptitudes" in the course, the MD/FCI pretest scores reflect the students' "initial knowledge states," which, in my experience, have little if any connection with their "ability/aptitudes." (b) In my opinion, "classic errors in variables problem and regression to the mean problem" do not invalidate my calculation of correlation coefficients in Eqs. (2-4), and are not responsible for a significant fraction of the nearly 2-SD difference in the average normalized gains of IE and T courses found in Hake (1998a, 1998b, 1998c).
 
[As an aside, Becker relates g to the "Tobit model" (named after economics Nobel laureate James Tobin) and implies that economist Frank Ghery (1972) was the first to propose use of g, evidently unaware (as was I) that g was earlier used by psychologists Hovland et al. (1949).]
 
 
- When studies ignore the class size . . . (see my counters to this above in Conclusions of the Survey and under Becker's criticism 1) . . . and sample selection issues, readers should question the study's findings regardless of the sample size or diversity in explanatory variables:
a. Hake does not give us any indication of beginning versus ending enrollments, which is critical information if one wants to address the consequence of attrition.
 
Becker is probably unaware of the submitted (unpublished—physics education research lacks an archival Physical Review—but available on the web) companion paper (Hake 1998b - called "ref. 17a" in Hake 1998a). The data Table Ia,b,c of Hake 1998b clearly indicates which courses were and were not analyzed with "matched" data, i.e., data in which only posttest scores of students who had also taken the pretest were included in the average posttest score. In a majority of courses, matched data were used. Table Ia,b,c shows no obvious dependence of <g> on whether or not the data were matched. In footnote "c" of that table, I estimate, from my experience with the pre/post testing of 1263 students at Indiana University, "that the error in the normalized gain is probably less than 5% for classes with 20–50 students and less than 2% for classes with more than 50 students." Saul (1998, p. 117) states ". . . . I found that the matched and unmatched results . . . from his extensive pre/post FCI studies . . . . are not significantly different." Consistent with the view that the use of some unmatched data had little influence on the survey results, an analysis of all the matched data of the survey (four T courses enrolling 292 students, and 34 IE courses enrolling 3511 students) yields:
 
 
<<g>>4T-matched = 0.24 ± 0.05SD, <<g>>34IE-matched = 0.50 ± 0.13SD . . . . . (11)

<<g>>4T-matched&Nweighted = 0.23, <<g>>34IE-matched&Nweighted = 0.51 . . . . . (12)
 
The matched data <<g>>'s of Eqs. (11 & 12) are very close to those for the complete data set of 14 T courses enrolling 2084 students and 48 IE courses enrolling 4458 students:
 
<<g>>14T = 0.23 ± 0.04SD, <<g>>48IE = 0.48 ± 0.14SD . . . . . (5,6)
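The two kinds of average reported in Eqs. (11) and (12), an unweighted mean of course <g>'s with its standard deviation and an enrollment-weighted mean, can be sketched as follows. This is an illustration only: the function name and the three course records are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def summarize_course_gains(courses):
    """Summarize a list of (enrollment N, course-average gain <g>) pairs.

    Returns (unweighted mean, sample SD, N-weighted mean) of the <g>'s.
    """
    gains = [g for _, g in courses]
    total_n = sum(n for n, _ in courses)
    n_weighted = sum(n * g for n, g in courses) / total_n
    return mean(gains), stdev(gains), n_weighted


# Hypothetical courses, NOT survey data: (enrollment, <g>).
example = [(100, 0.20), (50, 0.30), (50, 0.25)]
avg, sd, weighted = summarize_course_gains(example)
print(f"unweighted {avg:.3f} +/- {sd:.3f} SD, N-weighted {weighted:.4f}")
```

When the weighted and unweighted averages nearly coincide, as in Eqs. (11) and (12), class size is not driving the result.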
 
b. . . . if test administration is voluntary, teachers who observe that their average class score is low on the pretest may not administer the posttest. This is a problem for multi-institution studies, such as that described in Hake (1998) where instructors elected to participate, administer tests and transmit data.
 
Becker may have overlooked Hake's (1998a, Sec. II, Survey Method and Objective) statement: "This mode of data solicitation . . . (voluntary submission of data by teachers). . . tends to pre-select results which are biased in favor of outstanding courses which show relatively high gains on the FCI . . . As in any scientific investigation, bias in the detector can be put to good advantage if appropriate research objectives are established. We do not attempt to assess the average effectiveness of introductory mechanics courses. Instead we seek to answer a question of considerable practical interest to physics teachers: CAN the classroom use of IE methods increase the effectiveness of introductory mechanics courses well beyond that attained by traditional methods?"
 
c. . . . Hake (1998a) . . . extol(s) the power in testing associated with large national samples . . . (but fails to). . . . fully appreciate or attempt to adjust for the many sample selection problems in generating pre- and posttest scores.
 
I have addressed Becker's purported "sample selection problems" above. In my opinion, Becker fails to appreciate the fact that use of the normalized gain <g> obviates the need to adjust for the pretest score. Furthermore, the observed nearly 2-SD difference in the <<g>>'s of IE and T courses appears to overwhelm the smaller effects of "hidden variables" as discussed in Are There Important "Hidden Variables" (Peat 1997)? below.
 
 
- . . . . there is relatively strong inferential evidence . . . [evidently from Almer et al. (1998) and Chizmar and Ostrosky (1998)] . . . supporting the hypothesis that periodic use of variants of the one-minute paper (wherein an instructor stops class and asks each student to write down what he or she thought was the key point and what still needed clarification at the end of a class period) increases student learning. SIMILAR SUPPORT COULD NOT BE FOUND FOR OTHER METHODS. This does not say, however, that alternative teaching techniques do not work. It simply says that THERE IS NO COMPELLING STATISTICAL EVIDENCE SAYING THAT THEY DO. (My EMPHASIS.)
Becker omits mention of the difficulties (Snedecor and Cochran 1989, Chapter 17, "Multiple Linear Regression") in standard regression analyses (such as those of Chizmar and Ostrosky (1998) and Almer et al. (1998)) with one dependent variable Y and more than one independent variable "X", e.g., possible (1) intercorrelation of hypothesized X's, (2) non-linear relationships of X's with Y, (3) omission of important X's, and (4) measurement errors in X's.
 
In any case, Becker evidently thinks that there is no compelling statistical evidence that
 
- (a)
- IE courses can be much more effective than T courses in enhancing conceptual understanding of Newtonian mechanics, and
- (b)
- alternative teaching techniques (save minute papers) "work." If economics faculty share Becker's belief "b", then it is little wonder that their teaching methods are "still dominated by chalk and talk classroom presentations" (Becker and Watts 2000, 2001). And given that minute papers are the only alternative technique shown to "work," why should economics instructors accept the Becker/Watts claim (evidently regarded as unsubstantiated by Becker himself) that "their ratings. . . (will). . . go up, along with what students learn, when students are actively involved in the classroom learning experience"? (My italics.)
 
Becker's criteria for "compelling statistical evidence" are contained in his "11-point set of criteria that all inferential studies can be expected to address in varying degrees of detail." Becker writes: "Outside of economics education I could find no examples of education researchers checking alternative regression model specifications . . . (as). . . can be seen in Chizmar and Ostrosky (1998) . . .(and) . . . Becker and Powers (2001)." Thus it would appear that all inferential studies by non-economists fail to fully satisfy point #6 of Becker's criteria: "Multivariate analyses, which includes diverse controls for things other than exposure to the treatment that may influence outcomes (e.g., instructor differences, student aptitude . . . [called "hidden variables" below]), but that cannot be dismissed by randomization (which typically is not possible in education settings)."
 
But if the effects of such variables are small in comparison to that induced by the treatment, and the treatment is given to many heterogeneous control and test groups as in Hake (1998a, 1998b, 1998c), then, in my opinion, multivariate analyses, with all their uncertainties, are of dubious value in demonstrating the efficacy of the treatment as argued in Are There Important "Hidden Variables" (Peat 1997)? below.
 
Furthermore, Becker, in his econocentric concentration on inferential statistics, omits criteria for "compelling evidence" that most physical scientists regard as crucial, but which are largely ignored by economics education researchers (EER's): (a) careful consideration of possible systematic errors, (b) the presentation of raw unmassaged data such that experiments can be repeated and checked by other investigators, and (c) the related extent to which the research conclusions are independently verified by other investigators under other circumstances so as to contribute to a community map (see Can educational research be scientific research? below).
 
As an aside, my own experience with minute papers (Hake 1998a, ref. 40; Hake 1998b, ref. 58 and Table IIc) is that they can constitute a significant but relatively minor segment of effective interactive engagement, consistent with the results of Chizmar and Ostrosky (1998), who find a rather minuscule minute-paper effect: an approximately 7% increase in the posttest TUCE (Saunders 1991) score relative to the pretest TUCE score (one-tailed p = 0.025). (Typical of EER's, Chizmar and Ostrosky (1998) do not specify an effect size.) Becker continues the usual literature misattribution of minute papers to Wilson (1986) (and indirectly to CAT champions Angelo and Cross [1993]), rather than to Berkeley physicist Charles Schwartz (1983); see also Hake (2001c).
 
 
- In a later communication, Becker (2001b) wrote: your study does not adequately address the sample selection issues associated with the way in which subjects entered your study . . . (see my response in "3b" above) . . ., the attrition from the pretest to posttest . . . (see my response in "3a" above) . . . and the implications for class averages. Jim Heckman . . . .(Heckman et al. 1998, Heckman 2000, Heckman et al. 2001). . . earned a Nobel Prize in economics for his work on selection issues and it is unfortunate that you appear unaware of it and the subsequent developments in econometrics and statistics. There are many applications of Heckman's work in education research including my own recent contribution . . . (Becker and Powers 2001).
I have reviewed Heckman (2000) and Heckman et al. (1998), and agree with Becker that Heckman's work is potentially relevant to education research. However, the study by Heckman et al. (1998) concerns the danger that members of a control group may fare differently than the members of a test group had they (the control group members) been in the test group. I fail to understand how this aspect of Heckman's work, or any other aspects cited by the Nobel committee in Heckman (2000), are relevant to the control (T) and test (IE) group participants in Hake (1998a, 1998b, 1998c), as the participants in the latter study were all drawn from the same generic introductory physics courses, and I have addressed above the various other selection issues with which Heckman deals. 
ARE THERE IMPORTANT "HIDDEN VARIABLES" (PEAT 1997)?
As indicated in Conclusions of the Survey, Eq. (2), for the survey of Hake (1998a, 1998b, 1998c) the correlation of the normalized gain <g> with %<pretest> is a very low +0.02. However, open research questions remain as to whether or not (a) any "hidden variables" (HV's) (the averages over a class of, e.g., math proficiency, spatial visualization ability, scientific reasoning skills, physics aptitude, gender, personality type, motivation, socio-economic level, ethnicity, IQ, SAT, GPA) are significantly correlated with <g>, and (b) the extent to which any such correlations represent causation (Cook and Campbell 1979, Light et al. 1990, Slavin 1992).
For one course (IU94S of Hake 1998b, Table Ic, N = 166, <g> = 0.65), Hake et al. (1994) found significant average mathematics pretest-score differences between high- and low-normalized-gain students. For a later course (IU95S of Hake 1998b, Table Ic, N = 209, <g> = 0.60), correlation coefficients between single student g's and pretest scores on mathematics and spatial visualization of, respectively, +0.32 and +0.23 were measured (Hake 1995). More recently, a student-enrollment weighted correlation coefficient of +0.32 for four courses with total enrollment N = 219 between single student g's and math-skills pretest scores was reported (Meltzer 2001).
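A correlation coefficient between single-student g's and a pretest score, and an enrollment-weighted combination over several courses, can be sketched as follows. This is only one plausible reading of "student-enrollment weighted": the weighting scheme (each course's Pearson r weighted by its number of students), the function names, and the data are assumptions, not the procedure of the cited studies.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


def enrollment_weighted_r(courses):
    """Average the per-course r's, weighting each by its student count.

    `courses` is a list of (pretest_scores, single_student_gains) pairs.
    """
    total = sum(len(pre) for pre, _ in courses)
    return sum(len(pre) * pearson_r(pre, g) for pre, g in courses) / total


# Hypothetical data for two small courses (math pretest score, student g).
course_a = ([10, 14, 18, 22], [0.30, 0.45, 0.40, 0.60])
course_b = ([8, 12, 20], [0.50, 0.35, 0.55])
print(round(enrollment_weighted_r([course_a, course_b]), 2))
```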
Preliminary work by Clement (2001) suggests a positive correlation of single student g's with a pretest (Lawson 1995) of scientific reasoning. The measurements of Halloun and Hestenes (1998) and Halloun (1997) suggest the existence of a positive correlation between single student g's and pretest scores on their Views About Sciences Survey (VASS). Hake (1995), Henderson et al. (1999), McCullough (2000), Galileo Project (2001), and Meltzer (2001) have reported gender differences (<g>males > <g>females) in <g>'s for some classes. (Hake calculated a gender-difference effect size 0.58 for IU95S [see above]. Meltzer calculated gender-difference effect sizes of 0.44 and 0.59 for two classes [N = 59, 78] at Iowa State University, but observed no significant gender difference in two other classes [N = 45, 37] at Southeastern Louisiana University.)
Nevertheless, the <g> dependence on the above HV's is small relative to the very strong dependence of <g> on the degree of interactive engagement (effect size 2.43, Eq. 9), and would tend to average out in multi-course comparisons over widely diverse populations in both the test and control groups. Thus, I think that it is extremely unlikely that HV effects could account for an appreciable fraction of the nearly 2 SD difference in the average of <g>'s for the 48 IE and 14 T courses. However, such effects could, of course, be significant in the comparison of only a few non-randomly assigned courses, as emphasized by Meltzer (2001).
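For readers who wish to check the arithmetic, the following sketch computes an effect size from the averages and standard deviations of Eqs. (5, 6). Eq. (9) is not reproduced in this section, so the formula is an assumption: a Cohen-style d that divides the difference in means by the root-mean-square of the two standard deviations. With the rounded values of Eqs. (5, 6) it reproduces the quoted 2.43.

```python
from math import sqrt


def effect_size_d(mean_test, sd_test, mean_control, sd_control):
    """Cohen-style d: difference in means over the RMS of the two SDs.

    Assumed form; the article's own Eq. (9) is not shown in this section.
    """
    rms_sd = sqrt((sd_test ** 2 + sd_control ** 2) / 2)
    return (mean_test - mean_control) / rms_sd


# Values from Eqs. (5, 6): 48 IE courses vs. 14 T courses.
d = effect_size_d(0.48, 0.14, 0.23, 0.04)
print(round(d, 2))  # rounds to 2.43, the effect size quoted in the text

# The same gap in units of the IE standard deviation ("nearly 2 SD").
print(round((0.48 - 0.23) / 0.14, 2))
```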
CAN EDUCATIONAL RESEARCH BE SCIENTIFIC RESEARCH?
There has been a long-standing debate over whether education research can or should be "scientific" (e.g., pro: Dewey 1929, 1966, Anderson et al. 1998, Bunge 2000, Redish 1999, Mayer 2000, 2001, Phillips and Burbules 2000, Phillips 2000; con: Lincoln and Guba 1985, Schön 1995, Eisner 1997, Lagemann 2000). In my opinion, substantive education research must be "scientific" in the sense indicated below. My biased prediction (Hake 2000a) is that, for physics education research, and possibly even education research generally: (a) the bloody "paradigm wars" (Gage 1989) of education research will have ceased by the year 2009, with, in Gage's words, a "productive rapprochement of the paradigms," (b) some will follow paths of pragmatism or Popper's "piecemeal social engineering" to this paradigm peace, as suggested by Gage, but (c) most will enter onto this "sunlit plain" from the path marked "scientific method" as practiced by most research scientists:
- "EMPIRICAL: Systematic investigation . . . (by quantitative, qualitative, or any other means) . . . of nature to find reproducible patterns in the structure of things and the ways they change (processes).
 
- THEORETICAL: Construction and analysis of models representing patterns of nature." (Hestenes 1999).
 
- "Continual interaction, exchange, evaluation, and criticism so as to build a . . . . community map." (Redish 1999).
For the presently discussed research, the latter feature is demonstrated by the fact that FCI normalized gain results for IE and T courses that are consistent with those of Hake (1998a, 1998b, 1998c) have now been obtained by physics education research (PER) groups at the Univ. of Maryland (Redish et al. 1997, Saul 1998, Redish and Steinberg 1999, Redish 1999), the University of Montana (Francis et al. 1998), Rensselaer and Tufts Universities (Cummings et al. 1999), North Carolina State University (Beichner et al. 1999), Hogskolan Dalarna - Sweden (Bernhard 2001), Carnegie Mellon University (Johnson 2001), and City College of New York (Steinberg and Donnelly 2002). In addition, PER groups have now gone beyond the original survey in showing, for example, that (a) there may be significant differences in the effectiveness of various IE methods (Saul 1998, Redish 1999); and (b) FCI data can be analyzed so as to show the distribution of incorrect answers in a class and thus indicate common incorrect student models (Bao and Redish 2001). Thus in physics education research, just as in traditional physics research, it is possible to perform quantitative experiments that can be reproduced (or refuted) and extended by other investigators, and thus contribute to the construction of a continually more refined and extensive "community map."
FOURTEEN LESSONS FROM THE PHYSICS EDUCATION REFORM EFFORT
The lessons below are derived from my own interpretation of the physics education reform movement and are, therefore, somewhat subjective and incomplete. They are meant to stimulate discussion rather than present any definitive final analysis. The first six lessons deal with interactive engagement, the final eight with implementation.
Six lessons on interactive engagement
Lesson 1: The use of IE strategies can increase the effectiveness of conceptually difficult courses well beyond that obtained by traditional methods.
Education research in biology (Hake 1999b, 1999c), chemistry (Herron and Nurrenbern 1999), and engineering (Felder et al. 2000a, 2000b), and introductory science education generally (Stockstad 2001), although neither as extensive nor as systematic as that in physics (McDermott and Redish 1999, Redish 1999), is consistent with the latter in suggesting that, in conceptually difficult areas, IE methods are more effective than T (passive-student) methods in enhancing students' understanding. Furthermore, there is some preliminary evidence that learning in IE physics courses is substantially retained 1 to 3 years after the courses have ended (Chabay 1997, Francis et al. 1998, Bernhard 2001). I see no reason to doubt that enhanced understanding and retention would result from greater use of IE methods in other science, and even non-science, areas, but substantive research on this issue is sorely needed (see Lessons 3 and 4 below).
Lesson 2: The use of IE and/or high-tech methods, by themselves, does not ensure superior student learning.
As previously indicated, the data in Fig. 1 show that seven of the IE courses (717 students) achieved <g>'s close to those of the T courses. Five of those made extensive use of high-tech microcomputer-based labs (Thornton and Sokoloff 1990, 1998). Case histories of the seven low-<g> courses (Hake 1998b) suggest that implementation problems occurred.
Another example of the apparent failure of IE/high-tech methods has been described by Cummings et al. (1999). They considered a standard physics studio course at Rensselaer in which group work and computer use had been introduced as components of in-class instruction. The classrooms appeared to be interactive and students seemed to be engaged in their own learning. Their measurement of <g>'s using the FCI and the Force Motion Concept Evaluation (Thornton and Sokoloff 1998) yielded values close to those characteristic of T courses (Hake 1998a, 1998b, 1998c). Cummings et al. (1999) suggest that the low <g> of the standard Rensselaer studio course may have been due to the fact that "the activities used in the studio classroom are predominately "traditional" activities adapted to fit the studio environment and incorporate the use of computers." Thus, the apparent "interactivity" was a product of T methods (supported by high technology), not published IE methods developed by physics education researchers and based on the insights of cognitive scientists and/or outstanding classroom teachers, as for the survey courses. This explanation is consistent with the fact that Cummings et al. (1999) measured <g>'s in the 0.35–0.45 range for revised Rensselaer studio courses using physics education research methods: (a) Interactive Lecture Demonstrations (Thornton and Sokoloff 1998), and (b) Cooperative Group Problem Solving (Heller and Hollabaugh 1992, Heller et al. 1992).
It should be emphasized that, although high technology is, by itself, no panacea, it can be very advantageous when it promotes interactive engagement, as in, e.g.:
- (a)
- computerized classroom communication systems (see, e.g., Dufresne et al. 1996, Mazur 1997, Abrahamson 1999, Burnstein and Lederman 2001, Better Education Inc. 2001);
- (b)
- properly implemented microcomputer-based labs (Thornton and Sokoloff 1990);
- (c)
- interactive computer animations for use after hands- and minds-on experiments and Socratic dialogue (Hake 2001a);
- (d)
- computer-implemented tutorials (Reif and Scott 1999);
- (e)
- Just-in-time teaching (Novak and Patterson 1998, Novak et al. 1999, Gavrin 2001).
Lesson 3: High-quality standardized tests of the cognitive and affective impact of courses are essential to gauge the relative effectiveness of non-traditional educational methods.
As indicated in the Introduction, so great is the inertia of the educational establishment (see Lesson 13) that three decades of physics education research demonstrating the futility of the passive-student lecture in introductory courses was ignored until high-quality standardized tests, that could be easily administered to thousands of students, became available. These tests are yielding increasingly convincing evidence that IE methods enhance conceptual understanding and problem-solving abilities far more than do T methods. Such tests may also indicate implementation problems in IE courses (Hake 1998b). As far as I know, disciplines other than physics, astronomy (Adams et al. 2000, Zeilik et al. 1997, 1998, 1999), and possibly economics (Saunders 1991, Kennedy and Siegfried 1997, Chizmar and Ostrosky 1998, Allgood and Walstad 1999) have yet to develop any such tests and, therefore, cannot effectively gauge either the need for or the efficacy of their reform efforts. In my opinion, all disciplines should consider constructing high-quality standardized tests of essential introductory course concepts.
The lengthy and arduous process of constructing valid and reliable multiple choice tests has been discussed by Aubrecht (1991), Halloun and Hestenes (1985a), Hestenes et al. (1992), Beichner (1994), McKeachie (1999), and Maloney et al. (2001). In my opinion, such hard-won Diagnostic Tests that cover important parts of common introductory courses are national assets whose confidentiality should be as well protected as the MCAT (Medical College Admission Test). Otherwise the test questions may migrate to student files and thereby undermine education research that relies upon the validity of such tests. Suggestions for both administering Diagnostic Tests and reporting their results so as to preserve confidentiality and enhance assessment value have been given by Hake (2001b).
Regarding tests of affective impact:
- (a)
- administration of the Maryland Physics EXpectations (MPEX) survey to 1500 students in introductory calculus-based physics courses in six colleges and universities showed "a large gap between the expectations of experts and novices and . . . . a tendency for student expectations to deteriorate rather than improve as a result of introductory calculus-based physics" (Redish et al. 1998). Here the term "expectations" is used to mean a combination of students' epistemological beliefs about learning and understanding physics and students' expectations about their physics course (Elby 1999). Elby (2001) has recently conducted classes designed to help students develop more sophisticated beliefs about knowledge and learning as measured by MPEX.
- (b)
- The Arizona State University "Views About Sciences Survey" (VASS) (Halloun and Hestenes 1998, Halloun 1997) (available for physics, chemistry, biology, and mathematics at http://modeling.la.asu.edu/R&E/Research.html) indicates that students have views about physics that (i) often diverge from physicists' views; (ii) can be grouped into four distinct profiles: expert, high transitional, low transitional, and folk; (iii) are similar in college and high school; and (iv) correlate significantly with normalized gain g on the FCI. It may well be that students' attitudes and understanding of science and education are irreversibly imprinted in the early years (but see Elby [2001]). If so, corrective measures await a badly needed shift of K–12 education away from rote memorization and drill (often encouraged by state-mandated standardized tests) to the enhancement of understanding and critical thinking (Hake 2000b, 2000c, Mahajan and Hake 2000, Benezet 1935/36) (see Lesson 10).
Lesson 4: Education Research and Development (R&D) by disciplinary experts (DE's), and of the same quality and nature as traditional science/engineering R&D, is needed to develop potentially effective educational methods within each discipline. But the DE's should take advantage of the insights of (a) DE's doing education R&D in other disciplines, (b) cognitive scientists, (c) faculty and graduates of education schools, and (d) classroom teachers.
Redish (1999) has marshaled the arguments for the involvement of physicists in physics departments, not just faculty of education schools, in physics education research. Similar arguments may apply to other disciplines. For physics, Redish gave these arguments:
- (a)
- physicists have good access to physics courses and students on which to test new curricula;
- (b)
- physicists and their departments directly benefit from physics education research;
- (c)
- education schools have limited funds for disciplinary education research; and
- (d)
- understanding what's going on in physics classes requires deep rethinking of physics and the cognitive psychology of understanding physics.
One might add that the researchers themselves must be excellent physics teachers with both content and "pedagogical content" knowledge (see Lesson 7) of a depth unlikely to be found among non-physicists.
The education of disciplinary experts in education research requires PhD programs at least as rigorous as those for experts in traditional research. The programs should include, in addition to the standard disciplinary graduate courses, some exposure to: the history and philosophy of education, computer science, statistics, political science, social science, economics, engineering (see Lesson 11), and, most importantly, cognitive science (i.e., philosophy, psychology, artificial intelligence, linguistics, anthropology, and neuroscience). The breadth of knowledge required for effective education research is similar to that required in ecological research (Holling 1997). According to the Physical Science Resource Center (2001) and the UMd-PERG (2001b), there are now about a dozen PhD programs in physics education within physics departments and about half that number of interdisciplinary programs between physics and education or cognitive psychology in the United States. In my opinion, all scientific disciplines should consider offering PhD programs in education research.
But how can disciplinary education researchers and, for that matter, university faculty generally, take advantage of the insights of disciplinary experts doing education R&D in other disciplines, cognitive scientists, faculty and graduates of education schools, and classroom teachers? (The current education–research schism between economics and physics is demonstrated above in Criticisms of the Survey.) In my opinion, despite the rigid departmental separation of disciplines in most research universities, the web has the potential to enhance cooperation and interchange among these groups dramatically (Hake 1999c, 2000d). Certainly the success of Conservation Ecology http://www.consecol.org/Journal/ testifies to the value of the web in promoting interdisciplinary understanding and effort. A starting point might be the construction of web guides for various disciplines similar to REDCUBE http://www.physics.indiana.edu/~redcube (Hake 1999b), which provides a point of entry into the vast literature and web resources relevant to REsearch, Development, and Change in Undergraduate Biology Education. The 9/8/99 version contains 47 biology-educator profiles, 446 references (including 124 relevant to general science education reform), and 490 hot-linked URLs on (a) biology associations, (b) biology teachers' websites, (c) scientific societies and projects (not confined to biology), (d) higher education, (e) cognitive science and psychology, (f) U.S. government, and (g) searches and directories.
Lesson 5: The development of effective educational methods within each discipline requires a redesign process of continuous long-term classroom use, feedback, assessment, research analysis, and revision.
Wilson and Daviss (1994) suggest that the "redesign process," used so successfully to advance technology in aviation, railroads, automobiles, and computers can be adapted to K–12 education reform through "System Redesign Schools." Redesign processes in the reform of introductory undergraduate physics education have been undertaken and described by McDermott (1991) and by Hake (1998a). In my opinion, "redesign" at both the K–12 and undergraduate levels can be greatly assisted by the promising "Scholarship of Teaching & Learning" movement (Carnegie Academy 2000) inspired by Boyer (1990) and the Boyer Commission (1998).
Lesson 6: Although non-traditional IE methods appear to be much more effective than T methods, there is need for more research to develop better strategies for enhancing student learning.
On a test as elemental as the FCI, it would seem that reasonably effective courses should yield <g>'s above 0.8, but thus far none much above 0.7 have, to my knowledge, been reported. This and the poor showing on the pre/post MPEX test of student understanding of the nature of science and education (Redish et al. 1998) indicate that more work needs to be done to improve IE methods. It would seem that understanding of science might be improved by:
- (a)
- students' apprenticeship research experiences (Collins et al. 1989, Brown et al. 1989);
- (b)
- epistemologically oriented teachers, materials, and class activities (Elby 2001); and
- (c)
- enrollment in courses featuring interactive engagement among students and disciplinary experts from different fields, all in the same classroom at the same time (Benbasat and Gass 2001).
In my opinion, more support should be given by universities, foundations, and governments to the development of a science of education spearheaded by disciplinary education researchers working in concert with cognitive scientists and education specialists. In the words of cognitive psychologists Anderson et al. (1998),
The time has come to abandon philosophies of education and turn to A SCIENCE OF EDUCATION . . . . . . If progress is to be made to a more scientific approach, traditional philosophies . . . .(such as radical constructivism) . . . . will be found to be like the doctrines of folk medicine. They contain some elements of truth and some elements of misinformation . . . . . . ONLY WHEN A SCIENCE OF EDUCATION DEVELOPS THAT SORTS TRUTH FROM FANCY—AS IT IS BEGINNING TO DEVELOP NOW—WILL DRAMATIC IMPROVEMENTS IN EDUCATIONAL PRACTICE BE SEEN. (My EMPHASIS.)
J. J. Duderstadt (2001), President Emeritus of the University of Michigan and chair of the National Academies Committee on Science, Education, and Public Policy (COSEPUP) http://www4.nationalacademies.org/pd/cosepup.nsf, cogently argues that
the development of human capital is becoming a dominant national priority in the age of knowledge, comparable in importance to military security and health care. Yet our federal investment in the knowledge base necessary to address this need is minuscule. In FY01. . . (Fiscal Year 2001). . . the nation will invest over $247 billion in R&D . . . . HOW MUCH WILL THE FEDERAL GOVERNMENT INVEST IN RESEARCH DIRECTED TOWARD LEARNING, EDUCATION, AND SCHOOLS? LESS THAN $300 MILLION . . . . most industries spend between 3% to 10% per year of revenues for R&D activities. By this measure, the education sector of our economy (including K–12, higher education, and workforce training), which amounts to $665 billion, should be investing $20 billion or greater each year in R&D, roughly the same order of magnitude as the health care sector. . . . .(Focusing on). . . what many term the "SCIENCE OF EDUCATION," meaning RESEARCH THAT WOULD BE CLASSIFIED BY SCIENTISTS AS GUIDED BY THE SCIENTIFIC METHOD AND SUBJECT TO RIGOROUS REVIEW BY THE SCIENTIFIC COMMUNITY. . . (See Can Educational Research Be Scientific Research? above). . . an interesting model for the conduct of research on education and learning IS PROVIDED BY THE DOD'S DEFENSE ADVANCED RESEARCH PROGRAMS AGENCY (DARPA). THROUGH A PROCESS USING VISIONARY PROGRAM MANAGERS TO CHANNEL SIGNIFICANT, FLEXIBLE, AND LONG-TERM FUNDING TO THE VERY BEST RESEARCHERS FOR BOTH BASIC AND APPLIED RESEARCH UNDERGIRDING KEY DEFENSE TECHNOLOGIES, DARPA HAS BEEN ABLE TO CAPTURE CONTRIBUTIONS OF THE VERY BEST OF THE NATION'S SCIENTISTS AND ENGINEERS IN HIGHLY INNOVATIVE PROJECTS. . . . . PERHAPS WE NEED AN EDUCATION ADVANCED RESEARCH PROGRAMS AGENCY . . .(EARPA). . . . TO FOCUS THE CAPABILITIES OF THE AMERICAN RESEARCH ENTERPRISE ON WHAT MANY BELIEVE TO BE OUR NATION'S MOST COMPELLING PRIORITY, THE QUALITY OF EDUCATION FOR A KNOWLEDGE-DRIVEN SOCIETY. . . . 
If the past 50 years of science policy can be characterized as a transition in national priorities "from guns to pills," let me suggest that THE NEXT 50 YEARS WILL SEE THE TRANSITION "FROM PILLS TO BRAINS." IT IS TIME THAT WE REALIZED THAT OUR NATION'S INTELLECTUAL CAPITAL, THE EDUCATION OF OUR PEOPLE, THE SUPPORT OF THEIR IDEAS, THEIR CREATIVITY, AND THEIR INNOVATION, WILL BECOME THE DOMINANT PRIORITY OF A KNOWLEDGE-DRIVEN NATION . . .  (My EMPHASIS.)
However, it should be emphasized that the development of better strategies for the enhancement of student learning will not improve the educational system unless (a) university and K–12 teachers (see Lesson 10) are educated to effectively implement those strategies, and (b) research universities start to think of education in terms of student learning rather than the delivery of instruction (see Lesson 12h). In Duderstadt's (2001) words,
Beyond new mechanisms to stimulate and support research in the science of education, WE ALSO NEED TO DEVELOP MORE EFFECTIVE MECHANISMS TO TRANSFER WHAT WE HAVE LEARNED INTO SCHOOLS, COLLEGES, AND UNIVERSITIES. For example, the progress made in cognitive psychology and neuroscience during the past decade in the understanding of learning is considerable. Yet almost none of this research has impacted our schools. As one of my colleagues once said, "If doctors used research like teachers do, they would still be treating patients with leeches." (My EMPHASIS.)
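The arithmetic behind Duderstadt's R&D benchmark, quoted above, can be checked with a short sketch. The dollar figures are taken from the quotation; the variable and function names are illustrative assumptions, and the comparison of federal research spending with the size of the whole education sector follows the quotation's framing rather than any independent accounting.

```python
# Figures quoted from Duderstadt (2001); all names here are hypothetical labels.
EDUCATION_SECTOR_USD = 665e9        # size of the education sector
RD_SHARE_LOW = 0.03                 # low end of typical industry R&D share
RD_SHARE_HIGH = 0.10                # high end of typical industry R&D share
FEDERAL_EDUCATION_RD_USD = 300e6    # quoted only as an upper bound


def rd_benchmark(sector_size_usd, rd_share):
    """Return the yearly R&D spending implied by a given share of revenues."""
    return sector_size_usd * rd_share


low = rd_benchmark(EDUCATION_SECTOR_USD, RD_SHARE_LOW)
high = rd_benchmark(EDUCATION_SECTOR_USD, RD_SHARE_HIGH)

# Share of the sector actually devoted to federal education research,
# and how far the low-end benchmark exceeds the quoted federal figure.
actual_share = FEDERAL_EDUCATION_RD_USD / EDUCATION_SECTOR_USD
shortfall_factor = low / FEDERAL_EDUCATION_RD_USD

print(f"Benchmark range: ${low / 1e9:.2f} billion to ${high / 1e9:.1f} billion per year")
print(f"Quoted federal share of sector: {actual_share:.3%}")
print(f"Low-end benchmark / quoted federal figure: {shortfall_factor:.1f}")
```

Under these assumptions the low end of the benchmark comes to just under $20 billion per year, which matches the figure in the quotation.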
Eight lessons on implementation
Lesson 7: Teachers who possess both content knowledge and "pedagogical content knowledge" are more apt to deliver effective instruction.
"Pedagogical content knowledge" is evidently a term coined by Shulman (1986, 1987), but its importance has long been known to effective classroom teachers. The difference between content knowledge and "pedagogical content knowledge" can be illustrated by consideration of the HH-type question given in the Introduction. Content knowledge informs the teacher that, according to Newton's First Law, while the brick is moving vertically upward at a constant speed in the inertial reference frame of the lab, the magnitude of the force on the brick by the student's hand is constant in time and of magnitude W, so that the net force on the brick is zero. On the other hand, pedagogical content knowledge would inform the teacher that students may think that, for example, (a) because a net force is required to produce motion, the force on the brick by the student's hand is constant in time and greater than W, or (b) because the weight of the brick diminishes as it moves upward away from the Earth, the force on the brick by the student's hand decreases in time but is always greater than W, or (c) no force is exerted on the brick by the student's hand because as the student's hand moves up, the brick must simply move up to stay out of the hand's way. In addition, pedagogical content knowledge provides a hard-won toolkit of strategies (see, for example, the list of "Popular IE Methods" in the section of the same name above) for guiding the student away from these misconceptions and towards the Newtonian interpretation. Unfortunately, such knowledge may take many years to acquire (Wells et al. 1995).
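The content-knowledge half of this example can be written out as a short worked equation. The symbols F_hand (magnitude of the upward force on the brick by the hand), m (mass of the brick), and a (upward acceleration of the brick) are labels introduced here for illustration, with the upward direction taken as positive:

```latex
\begin{align}
F_{\text{hand}} - W &= m a , \\
a = 0 \ \text{(constant velocity)} \quad &\Longrightarrow \quad F_{\text{hand}} = W .
\end{align}
```

In this notation, misconception (a) asserts that F_hand > W, which by the first line would require a > 0 and so contradicts the stated constant speed.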
Lesson 8: College and university faculty tend to overestimate the effectiveness of their own instructional efforts and thus tend to see little need for educational reform.
As examples of this tendency see Geilker (1997) (countered by Hilborn [1998]), Griffiths (1997) (countered by Hestenes [1998]), Goldman (1998), Mottman (1999a, 1999b) (countered by Kolitch [1999], Steinberg [1999], and Hilborn [1999]), and Carr (2000).
Lesson 9: Such complacency can sometimes be countered by administering high-quality standardized tests of understanding and by "video snooping."
- (a) Harvard's Eric Mazur (1997) was very satisfied with his introductory course teaching. He received very positive student evaluations and his students did reasonably well on "difficult" exam problems. Thus, it came as a shock when his students hardly fared any better on the "simple" FCI than on their "difficult" midterm exam. As a result, Mazur developed and implemented his IE Peer Instruction method as a replacement for his previous T passive-student lectures. This change resulted in much higher <g>'s on the FCI, as shown by comparison of the red and green triangular points with average pretest scores in the vicinity of 70% in Fig. 1.
- (b) Like Mazur, most Harvard faculty members are proud of their undergraduate science courses. However, the videotape Private Universe (Schneps and Sadler 1985) shows Harvard graduating seniors being asked "What causes the seasons?" Most of them confidently explain that the seasons are caused by yearly changes in the distance between the Sun and the Earth! Similarly, most MIT faculty regard their courses as very effective preparation for the difficult engineering problems that will confront their elite graduates in professional life. However, the videotape Simple Minds (Annenberg/CPB 2002) shows MIT graduating seniors having great trouble getting a flashlight bulb to light, given one bulb, one battery, and one piece of wire.
Lesson 10: A major problem for undergraduate education in the United States is the inadequate preparation of incoming students, in part due to the inadequate university education of K–12 teachers.
According to the National Research Council (1999), the Third International Mathematics and Science Study (TIMSS) indicates that,
U. S. students' worst showing was in population 3 . . . . (final year of secondary school. . . . corresponding to U. S. high school seniors). . . . In the assessment of general mathematics and science knowledge, U. S. high school seniors scored near the bottom of the participating nations. In the assessments of advanced mathematics and physics given to a subset of students who had studied those topics, no nations had significantly lower mean scores than the United States. The TIMSS results indicate that a considerably smaller percentage of U. S. students meet high performance standards than do students in other countries.
Consistent with the foregoing, I have observed (Hake 2000c) that FCI pretest averages for students entering the introductory physics course at Indiana University are quite low (30–45%) and about the same whether or not the students are graduates of high school physics classes.
It is not just a matter of physics floundering. According to Epstein (1997-98),
 
While it is now well known that large numbers of students arrive at college with large educational and cognitive deficits many faculty and administrative colleagues are not aware that many students lost all sense of meaning or understanding in elementary school . . . . In large numbers our students . . . . (at Bloomfield College, New Jersey and Lehman, SUNY) . . . . cannot order a set of fractions and decimals and cannot place them on a number line. Many do not comprehend division by a fraction and have no concrete comprehension of the process of division itself. Reading rulers where there are other than 10 subdivisions, basic operational meaning of area and volume, are pervasive difficulties. Most cannot deal with proportional reasoning nor any sort of problem that has to be translated from English. Our diagnostic test, which has now been given at more than a dozen institutions shows that there are such students everywhere . . . . . (even Wellesley [J. Epstein, 1999, unpublished manuscript]).
Kati Haycock (1999), Director of the American Association for Higher Education's (AAHE's) Education Trust http://www.edtrust.org/, hits the nail on the head,
Higher education . . . (unlike Governors and CEO's) . . . . has been left out of the loop and off the hook .... (in the effort to improve America's public schools since the release of A Nation at Risk in 1983). . . . Present neither at the policy tables where improvement strategies are formulated nor on the ground where they are being put into place, most college and university leaders remain blithely ignorant of the roles their institutions play in helping K–12 schools get better—and the roles they currently play in maintaining the status quo . . . . HOW ARE WE GOING TO GET OUR STUDENTS TO MEET HIGH STANDARDS IF HIGHER EDUCATION CONTINUES TO PRODUCE TEACHERS WHO DON'T EVEN MEET THOSE SAME STANDARDS? How are we going to get our high school students to work hard to meet new, higher standards if most colleges and universities will continue to admit them regardless of whether or not they even crack a book in high school? (My EMPHASIS.)
According to the NSF Advisory Committee (1996),
Many faculty in SME&T. . . . (Science, Math, Engineering, and Technology) . . . . at the post-secondary level continue to blame the schools for sending underprepared students to them. But, increasingly, the higher education community has come to recognize the fact that teachers and principals in the K–12 system are all people who have been educated at the undergraduate level, mostly in situations in which SME&T PROGRAMS HAVE NOT TAKEN SERIOUSLY ENOUGH THEIR VITAL PART OF THE RESPONSIBILITY FOR THE QUALITY OF AMERICA'S TEACHERS. (My EMPHASIS.)
See also NSF Advisory Committee (1998).
Recently, some corrective steps have been taken in undergraduate physics education. According to Stein and Hehn (2000), the Physics Teacher Education Coalition (PhysTEC) is an American Association of Physics Teachers/NSF project "....to increase the role of physics departments, in collaboration with education departments to create more and better-prepared future teachers. Over the next five years....(PhysTEC)....will be established with an initial membership of more than 20 universities and colleges that share an increasing interest in revising their teacher preparation program." (My italics.)
Fortunately, despite the general failure of pre-service teacher education, several programs have been established over the past few years to enhance the pedagogical skills and content knowledge of in-service physics teachers. For a hot-linked list of 25 such programs see Hake (2000c).
The Glenn Commission (2000) proposals may be a step in the right direction. The commission requests 5 billion dollars in the first year to initiate,
- (a) establishment of an ongoing system to improve the quality of mathematics and science teaching in grades K–12;
- (b) a significant increase in the number of mathematics and science teachers with improved quality in their preparation; and
- (c) improvement in the working environment, in order to make the teaching profession more attractive for K–12 mathematics and science teachers. (My italics.)
More recently, the U. S. Commission on National Security/21st Century (Hart-Rudman Commission) (2001a) has warned that ". . . the U. S. need for the highest quality human capital in science, mathematics, and engineering is not being met. . . (partially because) . . . . the American . . . (K–12) . . . education system is not performing as well as it should," and recommends a "National Security Science and Technology Education Act to fund a comprehensive program to produce the needed numbers of science and engineering professionals as well as qualified teachers in science and math."
Lesson 11: Interdisciplinary cooperation of instructors, departments, institutions, and professional organizations is required for synthesis, integration, and change in the entire chaotic educational system.
Although more research to develop better strategies for the enhancement of student learning (Lesson 6) is required, that by itself will not reform the entire chaotic educational system, as has been emphasized by Tobias (1992a, 1992b, 2000), Sarason (1990, 1996), Hilborn (1997), and Wilson and Daviss (1994). In my opinion, an engineering approach to the improvement of education (Felder 2000a, 2000b) seems to be required. Bordogna (1997) conveys the essence of engineering as "integrating all knowledge for some purpose. . . . The engineer must be able to work across many different disciplines and fields—and make the connections that will lead to deeper insights, more creative solutions, and getting things done. In a poetic sense, paraphrasing the words of Italo Calvino (1988), the engineer must be adept at correlating exactitude with chaos to bring visions into focus."  (My italics.) It would appear that "engineering" as seen by Bordogna is similar to "integrative science" as seen by Holling (1998).
Lesson 12: Various institutional and political factors, including the culture of research universities, slow educational reform. Those listed below pertain to the United States, but similar barriers may exist in other countries.
Among the institutional and political factors listed by Tobias (2000) as thwarting educational reform are (those most associated with the culture of research universities are indicated in italics):
- (a) Advanced Placement (AP) courses serve as a filter rather than a pump.
- (b) In-class and standardized tests (MCAT, SAT, GRE) drive the curriculum in a traditional direction.
- (c) Effectiveness of teaching has little effect on promotion/tenure decisions or on national departmental rankings.
- (d) High school science courses are not required for college admission; many colleges require little or no science for graduation.
- (e) Clients for the sciences are not cultivated among those who do not wish to obtain PhD's.
- (f) Class sizes are too large.
To Tobias's list I would add:
- (g) The failure of the K–12 system to incorporate physics, the most basic of the sciences and essential for any proper understanding of biology and chemistry, into all grades for all students (Ford 1989, Swartz 1993, Hammer 1999, Neuschatz 1999, Lederman 1999, Livanis 2000).
In the words of physics Nobelist Leon Lederman,
We have observed that 99 percent of our high schools teach biology in 9th (or 10th) grade, chemistry in 10th or 11th grade, and, for survivors, physics in 11th or 12th grade. This is alphabetically correct, but by any logical scientific or pedagogical criteria, the wrong order. . . . This reform . . . .("physics first"). . . . concentrates on installing a coherent, integrated science curriculum, which matches the standards of what high school graduates should understand and be able to do . . . . And wouldn't it be a natural next step to invite the history teachers, the teachers of arts and literature, to help develop those connections of the fields of learning that the biologist E.O. Wilson (1998) calls "consilience"?
Arons (1959) took an early step in this direction at Amherst, but his attempts to bridge the "two-culture gap" were abandoned soon after his departure. For some other attempts to link science and liberal arts education see, e.g., Tobias and Hake (1988), Tobias and Abel (1990), and the AAAS (1990) report on the "liberal art of science."
- (h) The failure of research universities to:
 - Discharge their obligation to adequately educate prospective K–12 teachers (Hake 2000b, 2000c) (see Lesson 10).
 - Think of education in terms of student learning rather than the delivery of instruction (Barr and Tagg 1995, Duderstadt 2000, 2001). An emphasis on the learning paradigm may be encouraged by:
  - the previously mentioned Scholarship of Teaching & Learning movement (Carnegie Academy 2000) inspired by Boyer (1990) and the Boyer Commission (1998);
  - the National Academy for Academic Leadership http://www.thenationalacademy.org/, which strives to "educate academic decision makers to be leaders for sustained, integrated institutional change that significantly improves student learning";
  - threats from accrediting agencies such as ABET (Accreditation Board for Engineering and Technology, http://www.abet.org/) with its emphasis on accountability for actual student learning (Van Heuvelen and Andre 2000, Heller 2000, Hake 2000b, 2000c); and
  - competition for transmission-mode lecture services from distance-education conglomerates (Marchese 1998, Duderstadt 2000).
 - Effectively consider crucial multidisciplinary societal problems such as education.
In the words of Karl Pister (1996), former Chancellor of the University of California, Santa Cruz,
. . . we need to encourage innovative ways of looking at problems, moving away from the increasing specialization of academia to develop new interdisciplinary fields that can address complex real-world problems from new perspectives.
- (i) The failure of society to pay good K–12 teachers what they are worth. Physicist Don Langenberg (2000), chancellor of the University System of Maryland and president of the National Association of System Heads http://www.nashonline.org/, suggests that
on average, TEACHERS' SALARIES OUGHT TO BE ABOUT 50% HIGHER THAN THEY ARE NOW. Some teachers, including the very best, those who teach in shortage fields (e.g., math and science) and those who teach in the most challenging environments (e.g., inner cities) ought to have salaries about twice the current norm . . . . Simple arithmetic applied to publicly available data shows that the increased cost would be only 0.6% of the GDP. . . .(i.e., about 600 billion dollars over 10 years). . . . about one twentieth of what we pay for health care. I'D ASSERT THAT IF WE CAN'T BRING OURSELVES TO PONY UP THAT AMOUNT, WE WILL PAY FAR MORE DEARLY IN THE LONG RUN. (My EMPHASIS.)
A similar proposal with a similar cost estimate (about 450 billion dollars over 10 years) has been made independently by physicist Ken Heller (2001). More restrictively, the United States Commission on National Security/21st Century (Hart-Rudman Commission) (2001b) estimates a cost of 64 billion dollars over 10 years to raise the salaries of all public secondary school science and math teachers such that average yearly starting salaries would be raised from the current $25,000 to $50,000.
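The three salary-cost estimates above can be put on a common yearly footing with a short sketch. The ten-year totals, the 0.6% share of GDP, and the "one twentieth of health care" ratio are taken from the text; the implied GDP and health-care figures are back-of-the-envelope consistency checks derived from those numbers, not data from the cited sources, and all names in the code are illustrative assumptions.

```python
# Ten-year cost estimates as quoted in the text (US dollars).
ESTIMATES_USD_OVER_10_YEARS = {
    "Langenberg (2000)": 600e9,
    "Heller (2001)": 450e9,
    "Hart-Rudman Commission (2001b)": 64e9,
}
YEARS = 10
LANGENBERG_GDP_SHARE = 0.006        # "0.6% of the GDP"
HEALTH_CARE_RATIO = 20              # cost is "about one twentieth" of health care

# Convert each ten-year total to a yearly cost.
annual_cost = {
    source: total / YEARS for source, total in ESTIMATES_USD_OVER_10_YEARS.items()
}

# Consistency checks implied by Langenberg's own figures (derived, not quoted).
implied_gdp = annual_cost["Langenberg (2000)"] / LANGENBERG_GDP_SHARE
implied_health_care = annual_cost["Langenberg (2000)"] * HEALTH_CARE_RATIO

for source, cost in annual_cost.items():
    print(f"{source}: ${cost / 1e9:.1f} billion per year")
print(f"Implied yearly GDP: ${implied_gdp / 1e12:.1f} trillion")
print(f"Implied yearly health-care spending: ${implied_health_care / 1e12:.1f} trillion")
```

Read this way, the Hart-Rudman estimate is roughly a tenth of Langenberg's, which is consistent with its restriction to secondary school science and math teachers.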
Lesson 13: The monumental inertia of the educational system may thwart long-term national reform.
The glacial inertia of the nearly immovable U. S. educational system is not well understood. A recent issue of Daedalus (1998) contains essays by researchers in education and by historians of more rapidly developing institutions such as power systems, communications, health care, and agriculture. The issue was intended to help answer a challenge posed by physics Nobelist Kenneth Wilson as quoted in the description at Daedalus (1998):
If other major American "systems" have so effectively demonstrated the ability to change, why has the education "system" been so singularly resistant to change? What might the lessons learned from other systems' efforts to adapt and evolve have to teach us about bringing about change—successful change—in America's schools?
As far as I know, no definitive answer has yet been forthcoming.
Clifford Swartz (1999), former editor of The Physics Teacher and long-time acerbic critic of physics education research, wrote:
There is a variety of evidence, and claims of evidence, that each of the latest fads . . .(constructivism, "group" and "peer" instruction, "interaction") . . . produces superior learning and happier students. In particular, students who interact with apparatus or lecture do better on the "Force Concept Inventory" exam (Hestenes et al. 1992). The evidence of Richard Hake's (1998a) metastatistical study is so dramatic that the only surprising result is that many schools and colleges are still teaching in old-fashioned ways. Perhaps the interaction technique reduces coverage of topics, or perhaps the method requires new teaching skills that teachers find awkward. AT ANY RATE THE NEW METHODOLOGY IS NOT SWEEPING THE NATION. (My EMPHASIS.)
New educational methodologies have, from time to time, swept the nation (e.g., "the new math," PSSC (Physical Science Study Committee) physics, the Keller Plan (Personalized System of Instruction)) but then faded from sight. History (Holton 1986, Arons 1993, 1997, 1998, Sarason 1990, 1996, Cuban 1999) suggests that the present educational reform effort may, like its predecessors, have little lasting impact. This would be most unfortunate, considering the current imperative to:
- (a) educate more effective science majors and science-trained professionals;
- (b) raise the appallingly low level of science literacy among the general population; and
- (c) solve the monumental science-intensive problems (economic, social, political, and environmental) that beset us.
Lesson 14: "Education is not rocket science, it's much harder."
George Nelson, astronaut and astrophysicist, as quoted by Redish (1999).
My own belief, conditioned by 40 years of research in superconductivity and magnetism, 28 years in physics teaching, and 16 years in education research, is that effective education (both physics teaching and education research) is harder than solid-state physics. The latter is, of course, several orders of magnitude harder than rocket science. Nuclear physicist Joe Redish (1999) writes
The principles of our first draft of a community map for physics education are different in character from the laws we would write down for a community map of the physical world. They are much less like mathematical theorems and much more like heuristics. This is not a surprise, since the phenomena we are discussing are more complex and at a much earlier stage of development.
Because education is a complex, early stage, dynamic, non-linear, scientific/sociopolitical, high-stakes system, it might benefit from the expertise of conservation ecologists who are well used to dealing with such challenging systems (Holling 1999).
Acknowledgments:
I should like to dedicate this paper to the late Arnold Arons, farsighted pioneer of U. S. physics education research and the major source of wisdom, educational inspiration, and encouragement to me (Hake 1991) and many others over the decades. I thank David Hestenes for insights and assistance, Werner Wittmann for sage comments on statistics, Bill Becker for his stimulating econometric perspectives, and a discerning referee for excellent suggestions that improved this article. I am also indebted to Lee Gass for suggesting that I write this review, and for his very valuable comments on the manuscript. Finally, I thank the National Science Foundation for funding through NSF Grant DUE/MDR-9253965.
LITERATURE CITED
Note: A slash "/" occurring after a URL means "click on the following text."
Abrahamson, A. L. 1999. Teaching with a classroom communication system—what it involves and why it works. Mini-course presented at the VII Taller International Nuevas tendencias en la ensenanza de la fisica, Benemerita Universidad Autonoma de Puebla, Puebla, Mexico, May 27-30. Available online at: http://www.bedu.com/publications.html /"Research Papers."
Adams, J., R. L. Adrian, C. Brick, G. Brissenden, G. Deming, B. Hufnagel, T. Slater, and M. Zeilik. 2000. Astronomy diagnostic test (ADT) version 2.0. Collaboration for Astronomy Education Research (CAER), s.l. Available online at: http://solar.physics.montana.edu/aae/adt/.
Allgood, S., and W. B. Walstad. 1999. The longitudinal effects of economic education on teachers and their students. Journal of Economic Education 30(2):99-111. Available online at: http://www.indiana.edu/~econed/issues/v30_2/1.htm.
Almer, E. D., K. Jones, and C. Moeckel. 1998. The impact of one-minute papers on learning in an introductory accounting course. Issues in Accounting Education 13(3):485-497. Abstract available online at: http://accounting.rutgers.edu/raw/aaa/pubs/is8-98.htm#aa.
American Association for the Advancement of Science. 1990. The liberal art of science: agenda for action. The report of the project on liberal education and the sciences. AAAS, Washington, D. C., USA.
Anderson, J. L. 1998. Embracing uncertainty: the interface of Bayesian statistics and cognitive psychology. Conservation Ecology 2(1):2. [online] URL: http://www.consecol.org/Journal/vol2/iss1/art2/index.html.
Anderson, D. R., K. P. Burnham, and W. L. Thompson. 2000. Null hypothesis testing: problems, prevalence, and an alternative. Journal of Wildlife Management 64(4):912-923. Available online at: http://biology.uark.edu/Coop/thompson4.html.
Anderson, J. R., L. M. Reder, and H. A. Simon. 1998. Radical constructivism and cognitive psychology. Pages 227-278 in D. Ravitch, editor. Brookings papers on education policy—1998. Brookings Institution Press, Washington, D. C., USA.
Angelo, T. A., and K. P. Cross. 1993. Classroom assessment techniques: a handbook for college teachers. Second edition. Jossey-Bass, New York, New York, USA.
Annenberg/CPB. 2002. Minds of our own. Available online at http://www.learner.org/progdesc/index.html?uid=26&sj+SCI.
Arons, A. B. 1959. Structure, methods, and objectives of the required freshman calculus-physics course at Amherst College. American Journal of Physics 27(9):658-666.
Arons, A. B. 1981. Thinking, reasoning, and understanding in introductory physics courses. Physics Teacher 19:166-172.
Arons, A. B. 1990. A guide to introductory physics teaching. Wiley, New York, New York, USA.
Arons, A. B. 1993. Uses of the past: reflections on United States physics curriculum development, 1955 to 1990. Interchange 24(1/2):105-128.
Arons, A. B. 1997. Improvement of physics teaching in the heyday of the 1960's. Pages 13-20 in J. Wilson, editor. Conference on the introductory physics course on the occasion of the retirement of Robert Resnick. Wiley, New York, New York, USA.
Arons, A. B. 1998. Research in physics education: the early years. In T. C. Koch and R. G. Fuller, editors. PERC 1998: Physics Education Research Conference proceedings 1998. Available online at: http://webs.csu.edu/~bisb2/FEdnl/perc98.htm.
Aubrecht, G. J. 1991. Is there a connection between testing and teaching? Journal of College Science Teaching 20:152-157.
Bao, L., and E. F. Redish. 2001. Concentration analysis: a quantitative assessment of student states. Physics Education Research (supplement to American Journal of Physics) 69(7):S45-S53.
Barr, R. B., and J. Tagg. 1995. From teaching to learning—a new paradigm for undergraduate education. Change (November/December):13-25.
Becker, W. E. 2001a. What does the quantitative research literature really show about teaching methods? Preprint available online at: http://www.indiana.edu/~sotl/onlinepres.html.
Becker, W. E. 2001b. Private communication to R. R. Hake. 29 July.
Becker, W. E., and W. J. Baumol, editors. 1995. Assessing educational practices: the contribution of economics. Massachusetts Institute of Technology Press, Boston, Massachusetts, USA.
Becker, W. E., and J. R. Powers. 2001. Student performance, attrition, and class size given missing student data. Economics of Education Review 20(4):377-388.
Becker, W. E., and M. Watts, editors. 2000. Teaching economics to undergraduates: alternatives to chalk and talk. Edward Elgar Publishing Ltd., Cheltenham, UK.
Becker, W. E., and M. Watts. 2001. Teaching methods in U. S. undergraduate economics courses. Journal of Economic Education 32(3):269-279. Available online at: http://www.indiana.edu:80/~econed/issues/v32_3/7.htm.
Beichner, R. J. 1994. Testing student interpretation of kinematics graphs. American Journal of Physics 62(8):750-762.
Beichner, R., L. Bernold, E. Burniston, P. Dail, R. Felder, J. Gastineau, M. Gjertsen, and J. Risley. 1999. Case study of the physics component of an integrated curriculum. Physics Education Research (Supplement to American Journal of Physics) 67(7):S16-S24.
Benbasat, J. A., and C. L. Gass. 2001. Reflections on integration, interaction, and community: the Science One program and beyond. Conservation Ecology 5(2):ZZ. [online] URL: http://www.consecol.org/Journal/vol5/iss2/art26.
Benezet, L. P. 1935-1936. The teaching of arithmetic I, II, III: the story of an experiment. Journal of the National Education Association 24(8):241-244, 24(9):301-303, 25(1):7-8. Reprinted in Humanistic Mathematics Newsletter (6): 2-14 (May 1991). Available online at: http://wol.ra.phy.cam.ac.uk/sanjoy/benezet/.
Bernhard, J. 2001. Does active engagement curricula give long-lived conceptual understanding? Pages 749-752 in R. Pinto and S. Surinach, editors. Physics teacher education beyond 2000. Elsevier, Paris, France. Available online at: http://www.itn.liu.se/~jonbe/"Publications"/"physics Education Research."
Better Education, Inc. 2001. Available online at http://www.bedu.com/publications.html.
Bloom, B. S. 1984. The 2 sigma problem: the search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher 13(6):4-16.
Bordogna, J. 1997. Making connections: the role of engineers and engineering education. The Bridge 27(1): Spring. Available online at: http://www.nae.edu/nae/naehome.nsf/weblinks/NAEW-4NHMPY?opendocument.
Boyer, E. L. 1990. Scholarship reconsidered: priorities for the professoriate. Carnegie Foundation for the Advancement of Teaching, Menlo Park, California, USA.
Boyer Commission on Educating Undergraduates in the Research University. 1998. Reinventing undergraduate education: a blueprint for America's research universities. Carnegie Foundation for the Advancement of Teaching, Menlo Park, California, USA. Available online at: http://naples.cc.sunysb.edu/Pres/boyer.nsf/.
Bransford, J. D., A. L. Brown, and R. R. Cocking, editors. 1999. How people learn: brain, mind, experience, and school. National Academy Press, Washington, D. C., USA. Available online at: http://www.nap.edu/catalog/6160.html.
Brown, J. S., A. Collins, and P. Duguid. 1989. Situated cognition and the culture of learning. Educational Researcher 18(1):34-41. Available online at: http://www.ilt.columbia.edu/ilt/papers/JohnBrown.html.
Bruer, J. T. 1994. Schools for thought: a science of learning in the classroom. Massachusetts Institute of Technology Press, Boston, Massachusetts, USA.
Bruer, J. T. 1997. Education and the brain: a bridge too far. Educational Researcher 26(8):4-16.
Bunge, M. 2000. Social science under debate: a philosophical perspective. University of Toronto Press, Toronto, Ontario, Canada.
Burnstein, R. A., and L. M. Lederman. 2001. Using wireless keypads in lecture classes. Physics Teacher 39(1):8-11.
Calvino, I. 1988. Six memos for the next millennium. Harvard University Press, Cambridge, Massachusetts, USA.
Carnegie Academy. 2000. Scholarship of teaching and learning. Available online at: http://www.carnegiefoundation.org/CASTL/index.htm.
Carr, J. J. 2000. The physics tutorial: some cautionary remarks. American Journal of Physics 68(11):977-978.
Carver, R. P. 1993. The case against statistical significance testing, revisited. Journal of Experimental Education 61(4): 287-292.
Chabay, R. W. 1997. Qualitative understanding and retention. AAPT Announcer 27(2):96.
Chizmar, J. F., and A. L. Ostrosky. 1998. The one-minute paper: some empirical findings. Journal of Economic Education 29(1):3-10. Available online at: http://www.indiana.edu/~econed/issues/v29_1/1.htm.
Clement, J. M. 2001. The correlation between scientific thinking skill and gain in physics conceptual understanding. AAPT Announcer 31(2):82.
Cohen, J. 1988. Statistical power analysis for the behavioral sciences. Second edition. Lawrence Erlbaum, Mahwah, New Jersey, USA.
Cohen, J. 1994. The earth is round ( p < .05). American Psychologist 49:997-1003.
Collins, A., J. S. Brown, and S. Newman. 1989. Cognitive apprenticeship: teaching students the craft of reading, writing, and mathematics. Pages 453-494 in L. B. Resnick, editor. Knowing, learning, and instruction: essays in honor of Robert Glaser. Lawrence Erlbaum, Mahwah, New Jersey, USA.
Cook, T. D., and D. T. Campbell. 1979. Quasi-experimentation: design & analysis issues for field settings. Houghton Mifflin, Boston, Massachusetts, USA.
Cronbach, L. J., and L. Furby. 1970. How should we measure "change" —or should we? Psychological Bulletin 74:68-80.
Crouch, C. H., and E. Mazur. 2001. Peer instruction: ten years of experience and results. American Journal of Physics 69(9):970-977. Available online at http://mazur-www.harvard.edu/library/biblio.taf?r=263&d=d
Cuban, L. 1999. How scholars trumped teachers: change without reform in university curriculum, teaching, and research, 1890–1990. Teachers College Press, New York, New York, USA.
Cummings, K., J. Marx, R. Thornton, and D. Kuhl. 1999. Evaluating innovations in studio physics. Physics Education Research (Supplement to American Journal of Physics) 67(7):S38-S44.
Daedalus. 1998. Education yesterday, education tomorrow. Daedalus 127(4). Described online at: http://daedalus.amacad.org/inprint.html.
Dewey, J. 1929. The sources of a science of education. In J. A. Boydston, editor. 1984. John Dewey: the later works, 1925-1953. Volume 5 (1929-1930). Southern Illinois University Press, Carbondale, Illinois, USA. 17 volumes. Online description available at http://www.siu.edu/~deweyctr/lworks.html. Also available in The collected works of John Dewey, 1882-1953: the electronic edition. CD-ROM; online description at http://www.siu.edu/~deweyctr/colworks.html.
Dewey, J. 1966 (first published in 1938). Logic: the theory of inquiry. Holt, Rinehart and Winston, New York, New York, USA.
Dewey, J. 1997 (first published in 1938). Experience and education. Scribner, New York, New York, USA.
Donovan, M. S., J. D. Bransford, and J. W. Pellegrino. 1999. How people learn: bridging research and practice. National Academy Press, Washington, D. C., USA. Available online at: http://www.nap.edu/catalog/9457.html.
Duderstadt, J. J. 2000. A university for the 21st century. University of Michigan Press. Synopses available online at: http://www.press.umich.edu/titles/11091.html and at http://www.nap.edu/issues/16.2/duderstadt.htm.
Duderstadt, J. J. 2001. Science policy for the next 50 years: from guns to pills to brains. In Proceedings of the AAAS Annual Meeting, San Francisco, February, 2001. Available online at: http://milproj.ummu.umich.edu/publications/aaas_text_2.
Dufresne, R. J., W. J. Gerace, W. J. Leonard, J. P. Mestre, and L. Wenk. 1996. Classtalk: a classroom communication system for active learning. Journal of Computing in Higher Education 7:3-47. Available online at: http://umperg.physics.umass.edu/projects/ASKIT/classtalkPaper.
Elby, A. 1999. Another reason that physics students learn by rote. Physics Education Research (Supplement to American Journal of Physics) 67(7):S52-S57.
Elby, A. 2001. Helping physics students learn how to learn. Physics Education Research (Supplement to American Journal of Physics) 69(7):S54-S64.
Eisner, E. W. 1997. The promise and perils of alternative forms of data representation. Educational Researcher 26(6):4-10.
Epstein, J. 1997-1998. Cognitive development in an integrated mathematics and science program. Journal of College Science Teaching (12/97) and (1/98):194-201.
Felder, R. M., J. E. Stice, and A. Rugarcia. 2000a. The future of engineering education. VI. Making reform happen. Chemical Engineering Education 34(3):208-215. Available online at: http://www2.ncsu.edu/unity/lockers/users/f/felder/public/Papers/Education_Papers.html.
Felder, R. M., D. R. Woods, J. E. Stice, and A. Rugarcia. 2000b. The future of engineering education. II. Teaching methods that work. Chemical Engineering Education 34(1):26-39. Available online at: http://www2.ncsu.edu/unity/lockers/users/f/felder/public/Papers/Education_Papers.html.
Ford, K. W. 1989. Guest comment: is physics difficult? American Journal of Physics 57(10):871-872.
Francis, G. E., J. P. Adams, and E. J. Noonan. 1998. Do they stay fixed? Physics Teacher 36(8):488-491.
Friedman, M. 1992. Communication: do old fallacies ever die? Journal of Economic Literature 30(4):2129-2132.
Fuller, R. G. 1993. Millikan lecture 1992: Hypermedia and the knowing of physics: standing upon the shoulders of giants. American Journal of Physics 61(4):300-304.
Gage, N. L. 1989. The paradigm wars and their aftermath: a "historical" sketch of research on teaching since 1989. Educational Researcher 18(7):4-10.
Galileo Project. 2001. A leading resource for teaching materials on the Web. Available online at http://galileo.harvard.edu/.
Gardner, H. 1985. The mind's new science: a history of the cognitive revolution. Basic Books, New York, New York, USA.
Gavrin, A. D. 2001. Just in time teaching in physics and beyond: the WebScience Project at IUPUI. AAPT Announcer 31(2):75. Available online at: http://webphysics.iupui.edu/webscience/webscience.html.
Geilker, C. D. 1997. Guest comment: in defense of the lecture-demonstration method of teaching physics. American Journal of Physics 65(2):107.
Ghery, F. W. 1972. Does mathematics matter? Pages 142-157 in A. Welch, editor. Research papers in economic education. Joint Council on Economic Education, New York, New York, USA.
Glass, G. V. 2000. Meta-analysis at 25. Available online at: http://glass.ed.asu.edu/gene/papers/meta25.html.
Glenn Commission. 2000. Before it's too late: a report to the nation from the National Commission on Mathematics and Science Teaching for the 21st Century. Available online at: http://www.ed.gov/americacounts/glenn/archive.php.
Goldman, P. 1998. Long live the lecture. Physics World (December):15-16. Available online as "Thoughts on Teaching" at: http://gandalf.physics.uwo.ca/spg/spgFolder/spg.
Griffiths, D. 1997. Millikan lecture 1997: Is there a text in this class? American Journal of Physics 65(12):1141-1143.
Hake, R. R. 1987. Promoting student crossover to the Newtonian world. American Journal of Physics 55(10):878-884.
Hake, R. R. 1991. My conversion to the Arons-advocated method of science education. Teaching Education 3(2):109-111. Available online at: http://www.physics.indiana.edu/~hake.
Hake, R. R. 1992. Socratic pedagogy in the introductory physics lab. Physics Teacher 30:546-552. Updated version available online at: http://physics.indiana.edu/~sdi/.
Hake, R. R. 1995. Correlations of individual student normalized learning gain in mechanics with pretest scores on mathematics and spatial visualization. In preparation.
Hake, R. R. 1998a. Interactive-engagement vs traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses. American Journal of Physics 66(1):64-74. Available online at: http://www.physics.indiana.edu/~sdi/.
Hake, R. R. 1998b. Interactive-engagement methods in introductory mechanics courses. Physics Education Research, supplement to American Journal of Physics. Available online at http://www.physics.indiana.edu/~sdi/.
Hake, R. R. 1998c. Interactive-engagement vs. traditional methods in mechanics instruction. APS Forum on Education Newsletter (Summer):5-7. Available online at: http://www.physics.indiana.edu/~sdi/.
Hake, R. R. 1999a. Analyzing change/gain scores. Unpublished. [online] URL: http://www.physics.indiana.edu/~sdi/AnalyzingChange-Gain.pdf.
Hake, R. R. 1999b. REsearch, Development, and Change in Undergraduate Biology Education: a web guide for non-biologists (REDCUBE). Available online at: http://www.physics.indiana.edu/~redcube.
Hake, R. R. 1999c. What can we learn from the biologists about research, development, and change in undergraduate education? AAPT Announcer 29(4):99. Available online at: http://www.physics.indiana.edu/~hake/.
Hake, R. R. 2000a. Towards paradigm peace in physics-education research. Available online at: http://www.physics.indiana.edu/~hake/.
Hake, R. R. 2000b. The general population's ignorance of science-related societal issues—a challenge for the university. AAPT Announcer 30(2):105. Available online at: http://www.physics.indiana.edu/~hake/.
Hake, R. R. 2000c. Is it finally time to implement curriculum S? AAPT Announcer 30(4):103. Available online at: http://www.physics.indiana.edu/~hake/.
Hake, R. R. 2000d. Using the web to promote interdisciplinary synergy in undergraduate education reform. AAPT Announcer 30(4):120. Available online at: http://www.physics.indiana.edu/~hake/.
Hake, R. R. 2001a. Socratic dialogue inducing labs for introductory physics. Available online at: http://www.physics.indiana.edu/~sdi/.
Hake, R. R. 2001b. Suggestions for administering and reporting pre/post diagnostic tests. Available online at: http://www.physics.indiana.edu/~hake/.
Hake, R. R. 2001c. Schwartz invented minute papers. Available online at: http://listserv.nd.edu/cgi-bin/wa?A2=ind0105&L=pod&P=R4417.
Hake, R. R., R. Wakeland, A. Bhattacharyya, and R. Sirochman. 1994. Assessment of individual student performance in an introductory mechanics course. AAPT Announcer 24(4):76.
Halloun, I. 1997. Views about science and physics achievement. Pages 605-613 in E. F. Redish and J. S. Rigden, editors. Changing role of physics departments in modern universities: proceedings of the ICUPE. American Institute of Physics, College Park, Maryland, USA. Available online at: http://www.inco.com.lb/halloun/hallounTEST.html.
Halloun, I., and D. Hestenes. 1985a. Common sense concepts about motion. American Journal of Physics 53:1056-1065. Available online at: http://www.inco.com.lb/halloun/hallounTEST.html.
Halloun, I., and D. Hestenes. 1985b. The initial knowledge state of college physics students. American Journal of Physics 53:1043-1055. Available online at: http://www.inco.com.lb/halloun/hallounTEST.html.
Halloun, I., and D. Hestenes. 1987. Modeling instruction in mechanics. American Journal of Physics 55:455-462.
Halloun, I., and D. Hestenes. 1998. Interpreting VASS dimensions and profiles. Science & Education 7(6):553-577. Available online (password protected) at http://modeling.la.asu.edu/R&E/Research.html.
Halloun, I., R. R. Hake, E. P. Mosca, and D. Hestenes. 1995. Force Concept Inventory (Revised 1995). Available online (password protected) at http://modeling.la.asu.edu/R&E/Research.html.
Hammer, D. 1999. Physics for first graders? Science Education 83(6):797-799. Available online at: http://www2.physics.umd.edu/~davidham/1stgrdrs.html.
Haycock, K. 1999. The role of higher education in the standards movement in 1999 National Education Summit Briefing Book. Available online at: http://www.achieve.org/achieve/achievestart.nsf /"2001 National Education Summit"/"Information on the 1999 National Education Summit"/"Summit Archives".
Heckman, J. J. 2000. Press release: the 2000 Bank of Sweden prize in economic sciences in memory of Alfred Nobel. Available online at: http://www.nobel.se/economics/laureates/2000/press.html /"Advanced Information".
Heckman, J. J., H. Ichimura, J. Smith, and P. Todd. 1998. Characterizing selection bias using experimental data. Econometrica 66:1017-1098. Available online at: http://lily.src.uchicago.edu/papers/papers.html.
Heckman, J. J., G. Tsiang, and B. Singer. 2001. Lectures on longitudinal analysis. Underground Classics in Economics. Westview, Boulder, Colorado, USA.
Heller, K. J. 1999. Introductory physics reform in the traditional format: an intellectual framework. AIP Forum on Education Newsletter (Summer):7-9. Available online at: http://webs.csu.edu/~bisb2/FEdnl/heller.htm; and an illustrated form at http://www.physics.umn.edu/groups/physed/Talks/talks.html.
Heller, K. J. 2000. Meeting the needs of other departments: introductory physics in the ABET 2000 era. Available online at: http://www.aapt.org/ /"Programs".
Heller, K. J. 2001. The time has come to make teaching a real profession. APS Forum on Education Newsletter (Spring). Available online at: http://www.aps.org/units/fed/spring2001/index.html.
Heller, P., and M. Hollabaugh. 1992. Teaching problem solving through cooperative grouping. Part 2: designing problems and structuring groups. American Journal of Physics 60(7):637-644.
Heller, P., R. Keith, and S. Anderson. 1992. Teaching problem solving through cooperative grouping. Part 1: group vs. individual problem solving. American Journal of Physics 60(7):627-636.
Henderson, C. R., K. Heller, and P. Heller. 1999. Common concerns about the Force Concept Inventory. AAPT Announcer 29(4):99. Available online at: http://www.physics.umn.edu/groups/physed/Talks/talks.html.
Herron, J. D., and S. C. Nurrenbern. 1999. Chemical education research: improving chemistry learning. Journal of Chemical Education 76(10):1353-1361.
Hestenes, D. 1987. Toward a modeling theory of physics instruction. American Journal of Physics 55:440-454.
Hestenes, D. 1992. Modeling games in the Newtonian world. American Journal of Physics 60(8):732-748. Available online at http://modeling.la.asu.edu/R&E/Research.html
Hestenes, D. 1998. Guest comment: who needs physics education research!? American Journal of Physics 66(6):465-467. Available online at http://modeling.la.asu.edu/R&E/Research.html
Hestenes, D. 1999. The scientific method. American Journal of Physics 67(4):274. Available online at http://modeling.la.asu.edu/R&E/Research.html
Hestenes, D., and M. Wells. 1992. A mechanics baseline test. Physics Teacher 30:159-166.
Hestenes, D., M. Wells, and G. Swackhamer. 1992. Force Concept Inventory. Physics Teacher 30:141-158.
Hilborn, R. C. 1997. Guest comment: revitalizing undergraduate physics—who needs it? American Journal of Physics 65(3):175-177.
Hilborn, R. C. 1998. A reply to C. D. Geilker's guest comment. American Journal of Physics 66(4):273-274.
Hilborn, R. C. 1999. On teaching—innovative and traditional. Physics Teacher 38:250-251.
Holling, C. S. 1997. The inaugural issue of Conservation Ecology. Conservation Ecology 1(1):1. [online] URL: http://www.consecol.org/Journal/vol1/iss1/art1/index.html.
Holling, C. S. 1998. Two cultures of ecology. Conservation Ecology 2(2):4. [online] URL: http://www.consecol.org/Journal/vol2/iss2/art4/index.html.
Holling, C. S. 1999. Introduction to the special feature: just complex enough for understanding; just simple enough for communication. Conservation Ecology 3(2):1. [online] URL: http://www.consecol.org/Journal/vol3/iss2/art1/index.html.
Holton, G. 1986. A nation at risk revisited. Pages 253-277 in G. Holton, editor. The advancement of science and its burdens. Cambridge University Press, Cambridge, UK.
Hovland, C. I., A. A. Lumsdaine, and F. D. Sheffield. 1949. A baseline for measurement of percentage change. (In C. I. Hovland, A. A. Lumsdaine, and F. D. Sheffield, editors. 1965. Experiments on mass communication. Wiley (first published in 1949).) Reprinted as pages 77-82 in P. F. Lazarsfeld and M. Rosenberg, editors. 1955. The language of social research: a reader in the methodology of social research. Free Press, New York, New York, USA.
Hunt, M. 1997. How science takes stock: the story of meta-analysis. Russell Sage Foundation, New York, New York, USA.
Inhelder, B., and J. Piaget. 1958. Growth of logical thinking from childhood to adolescence: an essay on the construction of formal operational structures. Basic Books, New York, New York, USA.
Inhelder, B., D. deCaprona, and A. Cornu-Wells. 1987. Piaget today. Lawrence Erlbaum, Mahwah, New Jersey, USA.
Johnson, D. H. 1999. The insignificance of statistical significance testing. Journal of Wildlife Management 63:763-772. Available online at: http://www.npwrc.usgs.gov/resource/1999/statsig/statsig.htm.
Johnson, D. W., R. T. Johnson, and K. A. Smith. 1991. Cooperative learning: increasing college faculty instructional productivity. George Washington University, Washington, D. C., USA.
Johnson, D. W., R. T. Johnson, and M. B. Stanne. 2000. Cooperative learning methods: a meta-analysis. Available online at: http://www.clcrc.com/pages/cl-methods.html.
Johnson, M. 2001. Facilitating high quality student practice in introductory physics. Physics Education Research (Supplement to American Journal of Physics) 69(7):S2-S11.
Karplus, R. 1977. Science teaching and the development of reasoning. Journal of Research in Science Teaching 14:169.
Karplus, R. 2001. A love of discovery: science education and the second career of Robert Karplus. R. G. Fuller, editor. To be published.
Kennedy, P., and J. Siegfried. 1997. Class size and achievement in introductory economics: evidence from the TUCE III Data. Economics of Education Review 16:385-394.
Kirk, R. E. 1996. Practical significance: a concept whose time has come. Educational and Psychological Measurement 56(5):746-759.
Kolitch, S. 1999. Studio physics at Cal Poly. Physics Teacher 37:260.
Lagemann, E. C. 2000. An elusive science: the troubling history of education research. University of Chicago Press, Chicago, Illinois, USA.
Langenberg, D. N. 2000. Rising to the challenge. Thinking K-16 4(1):19. Available online as "Honor in the boxcar" at: http://www.edtrust.org/main/reports.asp.
Lawson, A. E. 1995. Science teaching and the development of thinking. Wadsworth, Stamford, Connecticut, USA.
Lave, J., and E. Wenger. 1991. Situated learning: legitimate peripheral participation. Cambridge University Press, Cambridge, UK.
Laws, P. 1997. Millikan lecture 1996: Promoting active learning based on physics education research in introductory physics courses. American Journal of Physics 65(1):13-21.
Lederman, L. M. 1999. A science way of thinking. Education Week (16 June):XX. Available online at: http://www.edweek.org/ew/1999/40leder.h18.
Light, R. J., J. D. Singer, and J. B. Willett. 1990. By design: planning research on higher education. Harvard University Press, Cambridge, Massachusetts, USA.
Lincoln, Y. S., and E. G. Guba. 1985. Naturalistic inquiry. Sage, Beverly Hills, California, USA.
Livanis, O. 2000. Physics first. Available online at: http://members.aol.com/physicsfirst/index.html.
Lord, F. M. 1956. The measure of growth. Educational and Psychological Measurement 16:421-437.
Lord, F. M. 1958. Further problems in the measurement of growth. Educational and Psychological Measurement 18:437-454.
Mahajan, S., and R. R. Hake. 2000. Is it finally time for a physics counterpart of the Benezet/Berman math experiment of the 1930's? Available online at: http://www.sci.ccny.cuny.edu/~rstein/perc2000.htm and http://wol.ra.phy.cam.ac.uk/sanjoy/benezet/.
Maloney, D. P., T. L. O'Kuma, C. J. Hieggelke, and A. Van Heuvelen. 2001. Surveying students' conceptual knowledge of electricity and magnetism. Physics Education Research (Supplement to American Journal of Physics) 69(7):S12-S23.
Marchese, T. 1998. Not-so-distant competitors: how new providers are remaking the postsecondary marketplace. AAHE Bulletin (May). Available online at: http://www.aahe.org/Bulletin/Not-So-Distant%20Competitors.htm.
Mayer, R. E. 2000. What is the place of science in educational research? Educational Researcher 29(6):38-39. Available online at: http://www.aera.net/pubs/er/toc/er2906.htm.
Mayer, R. E. 2001. Resisting the assault on science: the case for evidence-based reasoning in educational research. Educational Researcher 30(7):29-30. Available online at http://www.aera.net/pubs/er/toc/er3007.htm.
Mazur, E. 1997. Peer instruction: a user's manual. Prentice Hall, New York, New York, USA. Available online at: http://galileo.harvard.edu/.
McCullough, L. E. 2000. Gender in physics: the past predicting the future? AAPT Announcer 30(2):81. Available online at: http://physics.uwstout.edu/staff/mccullough/physicseduc.htm.
McDermott, L. C. 1991. Millikan lecture 1990: What we teach and what is learned: closing the gap. American Journal of Physics 59(4):301-315.
McDermott, L. C. 1993. Guest comment: how we teach and how students learn— a mismatch? American Journal of Physics 61(4):295-298.
McDermott, L. C., and E. F. Redish. 1999. RL-PER1: resource letter on physics education research. American Journal of Physics 67(9):755-767. Available online at: http://www.physics.umd.edu/rgroups/ripe/perg/cpt.html.
McKeachie, W. J., editor. 1999. McKeachie's teaching tips: strategies, research, and theory for college and university teachers. Eleventh edition. Houghton Mifflin, Boston, Massachusetts, USA.
Meltzer, D. E. 2001. The relationship between mathematics preparation and conceptual learning gains in physics: a possible hidden variable in diagnostic pretest scores. Physics Education Research. Supplement to American Journal of Physics: submitted. Available online as article #5 at http://www.public.iastate.edu/~per/articles/.
Mestre, J., and J. Touger. 1989. Cognitive research—what's in it for physics teachers? Physics Teacher 27:447-456.
Michelson, A. A., and E. W. Morley. 1887. On the relative motion of earth and luminiferous ether. American Journal of Science 1(34):333-345.
Minstrell, J. 1989. Teaching science for understanding. In L. B. Resnick and L. E. Klopfer, editors. Toward the thinking curriculum: current cognitive research. Association for Supervision and Curriculum Development, Alexandria, Virginia, USA.
Mottmann, J. 1999a. Innovations in physics teaching—a cautionary tale. Physics Teacher 37:74-77.
Mottmann, J. 1999b. Mottmann replies. Physics Teacher 37:260-261.
National Research Council. 1999. Global perspectives for local action: using TIMSS to improve U. S. mathematics and science education. National Academy Press, Washington, D. C., USA. Available online at: http://www.nap.edu/catalog/9605.html.
National Science Foundation Advisory Committee. 1996. Shaping the future: new expectations for undergraduate education in science, mathematics, engineering, and technology. Available online at: http://www.nsf.gov/cgi-bin/getpub?nsf96139.
National Science Foundation Advisory Committee. 1998. Shaping the future. Volume II: perspectives on undergraduate education in science, mathematics, engineering, and technology. Available online at: http://www.nsf.gov/cgi-bin/getpub?nsf98128.
Neuschatz, M. 1999. What can the TIMSS teach us? The Science Teacher 66(1):23-26.
Novak, G. M., and E. Patterson. 1998. Just-in-time teaching: active learner pedagogy with the WWW. Available online at: http://webphysics.iupui.edu/JITT/ccjitt.html.
Novak, G. M., E. T. Patterson, A. D. Gavrin, and W. Christian. 1999. Just-in-time teaching: blending active learning with web technology. Prentice Hall, New York, New York, USA.
Peat, F. D. 1997. Infinite potential: the life and times of David Bohm. Addison-Wesley, Boston, Massachusetts, USA.
Phillips, D. C. 2000. Expanded social scientist's bestiary: a guide to fabled threats to, and defenses of, naturalistic social science. Rowman & Littlefield Publishers Inc., Lanham, Maryland, USA.
Phillips, D. C., and N. C. Burbules. 2000. Postpositivism and educational research. Rowman & Littlefield Publishers Inc., Lanham, Maryland, USA.
Phillips, D. C., and J. F. Soltis. 1998. Thinking about education: perspectives on learning. Third edition. Teachers College Press, New York, New York, USA.
Physical Science Resource Center. 2001. American Association of Physics Teachers. Available online at http://www.psrc-online.org//"ResourceCenter"/"Physics Education Research."
Pister, K. 1996. Renewing the research university. University of California at Santa Cruz Review (Winter). Available online at: http://www.ucsc.edu/news_events/review/text_only/Winter-96/Win_96-Pister-Renewing_.html
Redish, E. F. 1994. Implications of cognitive studies for teaching physics. American Journal of Physics 62(9):796-803. Available online at: http://www.physics.umd.edu/rgroups/ripe/perg/cpt.html.
Redish, E. F. 1998. Student expectations in introductory physics. American Journal of Physics 66(3):212-224. Available online at: http://www.physics.umd.edu/rgroups/ripe/perg/cpt.html.
Redish, E. F. 1999. Millikan lecture 1998: Building a science of teaching physics. American Journal of Physics 67(7):562-573. Available online at: http://www.physics.umd.edu/rgroups/ripe/perg/cpt.html.
Redish, E. F., J. M. Saul, and R. N. Steinberg. 1997. On the effectiveness of active-engagement microcomputer-based laboratories. American Journal of Physics 65(1):45-54. Available online at: http://www.physics.umd.edu/rgroups/ripe/perg/cpt.html.
Redish, E. F., J. M. Saul, and R. N. Steinberg. 1998. Student expectations in introductory physics. American Journal of Physics 66(3):212-224. Available online at: http://www.physics.umd.edu/rgroups/ripe/perg/cpt.html.
Redish, E. F., and R. N. Steinberg. 1999. Teaching physics: figuring out what works. Physics Today 52(1):24-30. Available online at: http://www.physics.umd.edu/rgroups/ripe/perg/cpt.html.
Reif, F. 1995. Millikan lecture 1994: Understanding and teaching important scientific thought processes. American Journal of Physics 63(1):17-32.
Reif, F., and L. A. Scott. 1999. Teaching scientific thinking skills: students and computers coaching each other. American Journal of Physics 67:819-831.
Rosenthal, R., R. L. Rosnow, and D. B. Rubin. 2000. Contrasts and effect sizes in behavioral research: a correlational approach. Cambridge University Press, Cambridge, UK.
Rozeboom, W. W. 1960. The fallacy of the null-hypothesis significance test. Psychological Bulletin 57:416-428. Available online at: http://psychclassics.yorku.ca/Rozeboom/.
Sarason, S. B. 1993. The predictable failure of educational reform: can we change course before it's too late? Jossey-Bass, New York, New York, USA.
Sarason, S. B. 1996. Revisiting "the culture of the school and the problem of change." Teachers College Press, New York, New York, USA.
Satterthwaite, F. E. 1946. Biometrics Bulletin 2:110.
Saul, J. M. 1998. Beyond problem solving: evaluating introductory physics courses through the hidden curriculum. Dissertation. University of Maryland, College Park, Maryland, USA.
Saunders, P. 1991. The third edition of the test of understanding in college economics (TUCE III). Journal of Economic Education 22(3):255-272. Abstract available online at: http://www.indiana.edu:80/~econed/issues/v22_3/3.htm.
Schneps, M. H., and P. M. Sadler. 1985. Private universe project. Available online at: http://sao-www.harvard.edu/cfa/sed/resources/privateuniv.html.
Schön, D. A. 1995. The new scholarship requires a new epistemology. Change (November/December):27-34.
Schwartz, C. 1983. Minute papers. As described in B. G. Davis, L. Wood, and R. C. Wilson. ABCs of teaching with excellence: a Berkeley compendium of suggestions for teaching with excellence. University of California, Berkeley, California, USA. Available online at: http://uga.berkeley.edu/sled/compendium/. Minute paper description online at http://www.uga.berkeley.edu/sled/compendium/Suggestions/file95.html.
Shulman, L. 1986. Those who understand: knowledge growth in teaching. Educational Researcher 15(2):4-14.
Shulman, L. 1987. Knowledge and teaching: foundations of the new reform. Harvard Educational Review 57:1-22.
Slavin, R. E. 1992. Research methods in education. Second edition. Allyn & Bacon, Boston, Massachusetts, USA.
Slavin, R. E. 1995. Cooperative learning: theory, research, and practice. Second edition. Allyn & Bacon, Boston, Massachusetts, USA.
Snedecor, G. W., and W. G. Cochran. 1989. Statistical methods. Eighth edition. Iowa State University Press, Ames, Iowa, USA.
Springer, L., M. E. Stanne, and S. D. Donovan. 1999. Undergraduates in science, mathematics, engineering, and technology: a meta-analysis. Review of Educational Research 69(1):21-51. Abstract available online at: http://www.aera.net/pubs/rer/abs/rer691-3.htm.
Stein, F. M., and J. G. Hehn. 2000. Re-preparing physics teachers. AAPT Announcer 30(4):95. See also http://positron.aps.org/educ/undergrad/main-phystec.html.
Steinberg, R. N. 1999. Expression of concern. Physics Teacher 37:260.
Steinberg, R. N., and K. Donnelly. 2002. PER-based reform at a multicultural institution. Physics Teacher 40(2):108-114. Available online at http://www.aapt.org/tpt/toc_feb02.html
Stokstad, E. 2001. Reintroducing the intro course. Science 293:1608-1610.
Swartz, C. E. 1993. Editorial: standard reaction. Physics Teacher 31:334-335.
Swartz, C. E. 1999. Editorial: demise of a shibboleth. Physics Teacher 37:330.
Thompson, B. 1996. AERA editorial policies regarding statistical significance testing: three suggested reforms. Educational Researcher 25(2):26-30.
Thompson, B. 1998. Five methodology errors in educational research: the pantheon of statistical significance and other faux pas. Available online at: http://www.coe.tamu.edu/~bthompson/aeraaddr.htm.
Thompson, B. 2000. A suggested revision to the forthcoming 5th edition of the APA Publication Manual. Available online at: http://www.coe.tamu.edu/~bthompson/apaeffec.htm.
Thompson, W. L. 2001. 402 citations questioning the indiscriminate use of null hypothesis significance tests in observational studies. Available online at: http://biology.uark.edu/Coop/thompson5.html.
Thornton, R. K., and D. R. Sokoloff. 1990. Learning motion concepts using real-time microcomputer-based laboratory tools. American Journal of Physics 58(9):858-867.
Thornton, R. K., and D. R. Sokoloff. 1998. Assessing student learning of Newton's laws: the force and motion conceptual evaluation and the evaluation of active learning laboratory and lecture curricula. American Journal of Physics 66(4):338-351.
Tobias, S. 1992a. Guest comment: science education reform: what's wrong with the process? American Journal of Physics 60(8):679-681.
Tobias, S. 1992b. Revitalizing undergraduate science: why some things work and most don't. Research Corporation, Tucson, Arizona, USA.
Tobias, S. 2000. Guest comment: from innovation to change: forging a physics education agenda for the 21st century. American Journal of Physics 68(2):103-104.
Tobias, S., and L. S. Abel. 1990. Poetry for physicists. American Journal of Physics 58(9):816-821.
Tobias, S., and R. R. Hake. 1988. Professors as physics students: what can they teach us? American Journal of Physics 56(9):786-794.
United States Commission on National Security/21st Century (Hart-Rudman Commission). 2001a. Road map for national security: imperative for change, phase III report. Available online at: http://www.nssg.gov/.
United States Commission on National Security/21st Century (Hart-Rudman Commission). 2001b. Journeys through the teacher pipeline: recapitalizing American education; partial cost estimates for road map for national security. Available online at: http://www.nssg.gov/addedumpage.htm.
UMd-PERG. 2001a. Curriculum materials in physics based on physics education research. Available online at http://www.physics.umd.edu/perg/ecs/matper.htm.
UMd-PERG. 2001b. University of Maryland Physics Education Research Group, listing of physics education groups with web homepages. Available online at http://www.physics.umd.edu/perg/homepages.htm.
Van Heuvelen, A. 1991a. Learning to think like a physicist: a review of research-based instructional strategies. American Journal of Physics 59(10):891-897.
Van Heuvelen, A. 1991b. Overview, case study physics. American Journal of Physics 59(10):898-907.
Van Heuvelen, A. 1995. Experiment problems for mechanics. Physics Teacher 33:176-180.
Van Heuvelen, A., and K. Andre. 2000. Calculus-based physics and the engineering ABET 2000 criteria. Available online at: http://www.aapt.org/programs/s2kabet.html.
Vygotsky, L. S. 1978. Mind in society: the development of higher psychological processes. Harvard University Press, Cambridge, Massachusetts, USA.
Wells, M., D. Hestenes, and G. Swackhamer. 1995. A modeling method for high school physics instruction. American Journal of Physics 63(7):606-619. Available online at: http://modeling.la.asu.edu/modeling/MalcolmMeth.html.
Wilson, E. O. 1998. Consilience: the unity of knowledge. Knopf, New York, New York, USA.
Wilson, K. G., and B. Daviss. 1994. Redesigning education. Henry Holt, New York, New York, USA. Available online at: http://www-physics.mps.ohio-state.edu/~kgw/RE.html.
Wilson, R. C. 1986. Improving faculty teaching effectiveness: use of student evaluations and consultants. Journal of Higher Education 57(2):196-211.
Wittmann, W. W. 1997. The reliability of change scores: many misinterpretations of Lord and Cronbach by many others; revisiting some basics for longitudinal research. Available online at: http://www.psychologie.uni-mannheim.de/psycho2/psycho2.en.php3?language=en.
Zeilik, M., C. Schau, N. Mattern, S. Hall, K. W. Teague, and W. Bisard. 1997. Conceptual astronomy: a novel model for teaching postsecondary science. American Journal of Physics 65:987-996.
Zeilik, M., C. Schau, and N. Mattern. 1998. Misconceptions and their change in university-level astronomy courses. Physics Teacher 36(2):104-107.
Zeilik, M., C. Schau, and N. Mattern. 1999. Conceptual astronomy. II. Replicating conceptual gains, probing attitude changes across three semesters. American Journal of Physics 67(10):923-927.
Zollman, D. A. 1996. Millikan lecture 1995: Do they just sit there? Reflections on helping students learn physics. American Journal of Physics 64(2):114-119. Available online at http://www.phys.ksu.edu/perg/papers/millikan.html
Address of Correspondent:
Richard Hake
24245 Hatteras Street
Woodland Hills, California 91367, USA
Phone: (818)992-0632
Fax: (818)992-0632
rrhake@earthlink.net