Pointer Error Logging:
Towards an Improved Pedagogy for Teaching Pointers in C++
Abridged Proposal
General Project Description
Pointers are known to be a very difficult programming concept for novice programmers to understand and master, and a very challenging topic for CS instructors to teach [1]. They are also a well-known cause of programming bugs for both novice and experienced programmers [2, 3]. Pointers are cognitively difficult to understand in part because of the level of indirection they introduce. Furthermore, debugging code that uses pointers is made harder by the fact that most pointer errors (except attempts to dereference a null pointer) do not cause the program to abort at the point at which the error occurs. Instead, the program may die at a later time with an unhelpful message such as “Segmentation fault”.
In the introductory programming course sequence (CS1/CS2) at Ohio State University, we use the C++ programming language. In addition to the traditional explanation of pointers as memory addresses, we provide students with a simplified model of the value of a pointer variable. This model allows us to define conditions under which pointer operations are always safe, dangerous, or never safe, depending on whether the operation can never, sometimes, or always result in an error, respectively [2]. For example, allocating a new object with a pointer variable that is uninitialized or null is always safe (ignoring the possibility of running out of memory); allocating a new object with a pointer variable that points to an existing object is dangerous (it creates a memory leak if that pointer is the only reference to the object, but is fine otherwise); finally, dereferencing a null pointer is never safe.
To support this model, we use a special implementation of pointers that we call “checked” pointers [4]. The basic idea of checked pointers is that each pointer operation checks whether it is safe and, if it is not, reports an error describing the violated condition and terminates the program. Checked pointers are able to catch all pointer errors. Most pointer errors are reported as soon as they occur (e.g., dereferencing null and dead pointers, creating storage leaks); the rest are reported at program termination (storage leaks in some circular data structures, which cannot be detected by reference counting). Note that our implementation of checked pointers outlaws questionable “features” of built-in C++ pointers such as pointer arithmetic. Here is a list of all the pointer errors that are intercepted by our pointer component (a dead pointer is one that refers to memory that the program does not “own”, i.e., memory that the storage management system has never given to the program or has reclaimed from it):
- creating memory leak by pointer leaving scope
- creating memory leak by using assignment of another pointer
- creating memory leak by using assignment of null value
- creating memory leak by using assignment of new object
- deleting a dead pointer
- dereferencing a dead pointer
- dereferencing a null pointer
- comparing a dead pointer to some other pointer (with == or !=)
- comparing a dead pointer to the null value (with == or !=)
Our students work on their programming assignments on centralized servers that are entirely under our control. As a result, in addition to catching and reporting all pointer errors, checked pointers allow us to record (i.e., log) every pointer error that CS2 students encounter in the assignments that involve the direct use of pointers. The ability to collect this data puts us in a unique position to explore questions such as: Which kinds of pointer errors are most common? When and why do they occur? Can we provide some of this information to instructors to help them improve their effectiveness in teaching pointers? Can we provide some of this information to students to make them more aware of the pitfalls of using pointers?
We have already been collecting pointer error data for several quarters and plan to continue in the future. Here is an example of a log entry recording one pointer error.
Date: Fri May 19 10:57:58 2006
User: vJ2nm9xAzo2
Command: Stack_Test -i
Message: Creating memory leak by using = (i.e., assignment)
Backtrace:
(1) main
(2) Do_Push(Array_Of_Stack_Of_Text&)
(3) Stack_Kernel_1<…>::Push(Text&)
(4) Pointer_C<…>::operator=(Pointer_C<…> const&)
Briefly, it records the exact date and time when the error occurred, the encrypted (for privacy reasons) user name, the command used to run the program, the message describing the pointer error, and a backtrace showing the call stack when the error occurred.
We propose to analyze the existing data to investigate the questions above (see next section for more details) and other questions that may arise during the project.
Specific Questions/Hypotheses (to be addressed)
We are interested in exploring pedagogical applications of the pointer error data that we are collecting. Here we discuss some specific uses we plan to study. However, an integral part of the project will be to investigate other possible pedagogical uses of the available data.
- Track errors that students are making in “real” time, and send an alert to the instructor when a student is having a particular problem. If a student’s recorded errors exhibit a particular pattern, a warning could be sent to the instructor, who could then take appropriate action to assist the student. An interesting question arises that will require analysis of the data: What are significant patterns of errors that should warrant alerting the instructor, and what kinds of problems do they identify? Or, looking at it another way: What kinds of problems do students experience when working with pointers, and is it possible to recognize them based on the patterns of errors that are being logged?
- Provide summary information (and details when needed) to the instructor about the distribution of errors in a specific assignment. Knowledge of the kinds of pointer errors that students make on a specific assignment might allow the instructor to target specific issues in the classroom. Initially, this could be done after the students have completed the assignment: the instructor could discuss the appropriate data with the students and try to determine what kinds of problems the students experienced and how these are reflected in the log. Later, once a better understanding is reached, the instructor could use the data from previous quarters (assuming the same or similar assignments are used, which is the case at OSU) to discuss the assignment before students work on it. The instructor would then be able to warn students of common patterns of errors and the corresponding common pitfalls. Here the fundamental questions to address are: What kind of information from the log would be significant and useful for the purpose outlined above? How should this information be presented to the instructor (and the students)? How should it be organized? How should it be displayed?
- Provide to each student “real-time” summary information about the distribution of errors made by all the students working on a given assignment, together with detailed “real-time” information on the errors that have been recorded for the student himself/herself. Giving students access to real-time information on the errors others are making, and the ability to compare it with what the student himself/herself is experiencing, could have some interesting effects: from a practical point of view, this knowledge may allow a student to judge whether what he/she is experiencing is typical or unusual, and act accordingly; from a psychological point of view, it could have a positive impact (e.g., by showing a student who is having trouble that many others are having the same kind of problem) or a negative impact (e.g., by making a student who is having more trouble than the rest feel bad). Again, the questions to address include what information to provide and how to present it most effectively. In addition, it would be interesting to investigate how students themselves make use of the information provided and whether the psychological impact (if any) is mainly positive or negative.
Other issues that will have to be dealt with are:
- What is the best way to deliver the information to the user (instructors and/or students)?
- Should the information be made available through the web so that users can access it when they want or should it be delivered only at specific times?
- If made available on the web, what security measures will be needed to ensure student privacy?
- If delivered at specific times, when and how often should it be sent?
Finally, an essential part of the project will be to collect feedback from students and instructors to evaluate the impact of the new techniques and to tune them based on the experience of the users.
References
[1] Lahtinen, E., Ala-Mutka, K., and Järvinen, H., “A study of the difficulties of novice programmers”, Proceedings of the 10th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education (Caparica, Portugal, June 27-29, 2005). ITiCSE '05. ACM Press, New York, NY, 14-18.
[2] Bucci, P., Heym, W.D., Hollingsworth, J.E., Long, T.J., and Weide, B.W., “An Infrastructure to Study and Address Students' Difficulties with Pointers”, Proceedings of Resolve Workshop 2006, Virginia Tech, Blacksburg, VA, 2006.
[3] Weide, B.W., and Heym, W.D., “Specification and Verification with References”, Proceedings OOPSLA Workshop on Specification and Verification of Component-Based Systems, October 2001.
[4] Pike, S.M., Weide, B.W., and Hollingsworth, J.E., “Checkmate: Cornering C++ Dynamic Memory Errors With Checked Pointers”, Proceedings of the 31st SIGCSE Technical Symposium on Computer Science Education, ACM Press, 2000, 352-356.
Impact on the Goal of CREU
This project gives the students involved an opportunity to participate in research on an important issue in CS education, a problem that has challenged educators for decades. The scope of the project, however, is narrow enough that it can be realistically tackled by the proposed team of undergraduate students. It has several features that relate directly to the goal of CREU: a women-only project team, independently conducted background research, collaborative learning, faculty mentoring, and inclusive pedagogy. The project is open-ended, with requirements and questions that the students themselves will explore and answer. It is complex enough to challenge the students to test and improve their project management skills, yet it can be decomposed into separate development modules, which may be organized sequentially as milestones. Through the successful completion of these milestones, the students will feel more empowered and confident.
Students will be encouraged to organize and present the results of their project as a technical poster at events such as the Ohio Celebration of Women in Computing conference, the ACM SIGCSE symposium, and the Denman Undergraduate Research Forum at Ohio State University. Because these events are all usually held in the spring, the presentation may have to include only preliminary results.
The students are responsible for forming the team and have already shown an understanding of the collaborative nature of the project. Regularly scheduled meetings, both between the students and the faculty mentor and among the students alone, together with the pair programming techniques that will be used in the development phase of the project, will further reinforce this aspect of the project.