The Year 2000 problem, also known as the "Millennium Bug" or simply "Y2K", was the mother of all maintenance assignments. (Neither is a good name. The year 2000 is not the start of a new millennium; it's the last year of the old one. And do we really want to shorten the name to "Y2K"? Isn't that how we got into this mess in the first place?)
For the first time in the "vast" history of data processing the entire industry was faced with the same problem at the same time. There was no one who had been through this before to turn to for help. Everyone was in the same boat. Throw in a shortage of programmers and a deadline that couldn't be moved and you were looking at a lot of fun at the end of 1999. As a manager at Bank One said, "You better wear a cup."
The problem was that dates (a very common piece of data), when stored in files, were only given 2 digits to hold the year. Since a number too big for its field gets truncated to fit, the year 2000 was stored as 00. Sometimes that's OK. But sometimes it isn't.
If the date is only being used for informational purposes, such as showing the current date on a screen or the date a report was printed, then it probably is no big deal if the year shows as 00. If the date appears on a screen or document that stays internal to a company then it likely kept its two-digit year. If it appears on something that an external client or customer will see then it probably was expanded to print 4 digits for the year, just to give that client or customer the reassurance that the company actually did something about Y2K.
If the date is being used in a comparison, or a calculation, then the year has to have 4 digits. For example, if a payment date (say Dec 28, 1999, which would be stored as 991228) is being compared to a due date (say Jan 02, 2000, which would be stored as 000102) to see if the payment is on time, it would appear that the payment was made over 99 years late when it was actually about a week early. The same kind of thing can happen when calculating the amount of time between two dates, or trying to determine what date is a specified number of days before or after another date. To ensure that these comparisons and calculations are valid, 4-digit years must be used.
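Here is a minimal sketch of that payment example in Python (not COBOL, just to keep it short). The helper names are made up for illustration, and the 6-digit parser assumes the century is always 19, which is what a non-compliant program effectively did.

```python
from datetime import date

def parse_yymmdd_as_1900s(text: str) -> date:
    """Read a 6-digit YYMMDD field the pre-Y2K way: the century is assumed to be 19."""
    return date(1900 + int(text[0:2]), int(text[2:4]), int(text[4:6]))

def parse_ccyymmdd(text: str) -> date:
    """Read an 8-digit CCYYMMDD field; the century comes from the data itself."""
    return date(int(text[0:4]), int(text[4:6]), int(text[6:8]))

# Two-digit years: Dec 28, 1999 versus Jan 02, 2000
payment_6, due_6 = "991228", "000102"
is_late_6 = int(payment_6) > int(due_6)  # True: 991228 > 102
days_late_6 = (parse_yymmdd_as_1900s(payment_6) - parse_yymmdd_as_1900s(due_6)).days

# Four-digit years: the same two dates
payment_8, due_8 = "19991228", "20000102"
is_late_8 = int(payment_8) > int(due_8)  # False
days_early_8 = (parse_ccyymmdd(due_8) - parse_ccyymmdd(payment_8)).days  # 5

print("6-digit dates say late:", is_late_6, "by", days_late_6, "days")
print("8-digit dates say late:", is_late_8, "early by", days_early_8, "days")
```

With 6-digit dates the payment looks late by more than 99 years' worth of days. With 8-digit dates it is correctly seen as a few days early.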
In case you're wondering why a date would be stored as Year-Month-Day (or YYMMDD) instead of the more familiar MMDDYY format, it's to make it easier to compare dates and to sort them. Storing in the YYMMDD format is a standard in the industry.
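A quick sketch of why year-first storage sorts better, using three made-up dates that all fall in the same century. The conversion helper is hypothetical and only there to make the comparison readable.

```python
def mmddyy_to_yymmdd(mmddyy: str) -> str:
    """Move the 2-digit year from the end of the field to the front."""
    return mmddyy[4:6] + mmddyy[0:4]

# The same three dates (Jul 4, 1985; Dec 31, 1979; Jan 1, 1985) in both layouts
as_yymmdd = ["850704", "791231", "850101"]
as_mmddyy = ["070485", "123179", "010185"]

# A plain sort of year-first fields is already chronological
sorted_yymmdd = sorted(as_yymmdd)

# A plain sort of month-first fields groups by month, not by time
sorted_mmddyy = sorted(as_mmddyy)

print(sorted_yymmdd)  # ['791231', '850101', '850704']
print(sorted_mmddyy)  # ['010185', '070485', '123179'] puts 1979 last
```

The year-first sort only stays chronological while every date is in the same century, which is exactly where the 2-digit year eventually bites.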
There was also speculation that a lot of hardware would fail at one second after midnight on 01/01/2000 because the year would be interpreted as 1900 and that would somehow confuse the circuitry. The older the machine, the more likely a failure was thought to be. This wasn't limited to computers - anything with a computer chip was considered vulnerable. Things like elevators, cameras, microwave ovens, coffee makers and security systems were liable to experience trouble. The doomsayers (check Gary North's web page for an interesting collection of readings) went as far as saying airplanes would fall out of the sky and power grids would fail.
There are three recognized sources of the problem.
The most obvious source of the problem was that, at one time, storage space was limited and expensive. Storing a year using 2 digits instead of 4 was a savings of 50%. May not sound like much but when you're looking at a file with over a million records, each with a few date fields, the savings added up.
A second cause of the problem was the misconception that a program or system developed in the 70's or early 80's would be rewritten by the time a 2-digit year would no longer work. But most shops have the "If it ain't broke then don't fix it" philosophy, so code is living much longer than originally expected.
The third cause is subtle. People tend to think of years in 2 digits. It's just natural.
Yes. And no. Depends on what exactly is being asked.
If you're asking if it affected a lot of COBOL code, then the answer is "yes". Absolutely. A majority of the affected code was COBOL because a majority of the existing code, especially code from the 70's and 80's when the Y2K problem wasn't even a thought, is COBOL. Kind of like the law of averages.
If you're asking if this problem was due to a flaw with the COBOL language then the answer is "no". Absolutely not. Dates are stored as numeric data, just like dollar amounts, social security numbers and ages. There is no magical data type of 'date' (at least until DB/2 evolved into standardized SQL). COBOL could handle 4-digit numbers just as easily as 2-digit numbers from day one. A non-Y2K compliant program can be written just as easily (if not easier) in C++.
As you know, the century turned with minimal trouble. Among the reported glitches were a man being hit with 99 years' worth of late fees when he returned a rented video and the web site for the Philadelphia Stock Exchange showing the incorrect year. That's a PC-based client/server application and a web page. You think either was coded in COBOL?
There are two methods of fixing this problem: file expansion and windowing (originally called the interpretive method, but that didn't sound very computer-like).
File expansion is the true fix. The 6-digit date field (YYMMDD) is physically expanded on the file to be 8 digits (add the century - CCYYMMDD). This means restructuring many files and changing many programs, but then you're good to go (until the year 10000).
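To make the idea concrete, here is a rough sketch of expanding one fixed-width record. The record layout (a 6-character account, a 6-digit date, a 7-digit amount) and the function name are invented for illustration, and the century is simply passed in; a real conversion would have to decide the century for each date somehow.

```python
def expand_date_field(record: str, start: int, century: str = "19") -> str:
    """Return the record with the 6-digit YYMMDD field at `start` widened to CCYYMMDD."""
    yymmdd = record[start:start + 6]
    if len(yymmdd) != 6 or not yymmdd.isdigit():
        raise ValueError(f"no 6-digit date at offset {start}: {yymmdd!r}")
    # Everything after the date shifts two positions to the right
    return record[:start] + century + record[start:]

# Hypothetical layout: account (6) + date YYMMDD (6) + amount (7)
old_record = "A12345" + "991228" + "0012550"
new_record = expand_date_field(old_record, start=6, century="19")

print(old_record)  # A123459912280012550
print(new_record)  # A12345199912280012550
```

Note that the record gets 2 characters longer and every field after the date moves. That is why expansion touches every program that reads the file, not just the ones that care about the date.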
The second method, windowing, only requires coding changes. No files are physically altered. All dates are still read as 6-digit numbers, but if the century is needed then a windowing algorithm is called to supply the century based on the year. This expanded date is only internal to the program; it is not saved anywhere permanently. There are two types of windowing: fixed and sliding.
A fixed window has a two-digit year selected (by each company) as the pivot year. If a year is greater than the pivot year then the century is 19, else it is 20. This algorithm will need to be updated every 100 years, if not more frequently.
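A minimal sketch of the fixed window rule just described. The function names and the pivot of 40 are assumptions picked for the example; each company would choose its own pivot.

```python
def fixed_window_century(yy: int, pivot: int) -> int:
    """Century for a 2-digit year: 19 if the year is above the pivot, else 20."""
    if not 0 <= yy <= 99:
        raise ValueError(f"not a 2-digit year: {yy}")
    return 19 if yy > pivot else 20

def expand_fixed(yymmdd: str, pivot: int) -> str:
    """Build an in-memory CCYYMMDD value; nothing is written back to the file."""
    century = fixed_window_century(int(yymmdd[0:2]), pivot)
    return f"{century}{yymmdd}"

PIVOT = 40  # hypothetical choice: 41-99 mean 1941-1999, 00-40 mean 2000-2040

print(expand_fixed("991228", PIVOT))  # 19991228
print(expand_fixed("000102", PIVOT))  # 20000102
print(expand_fixed("400615", PIVOT))  # 20400615
print(expand_fixed("410615", PIVOT))  # 19410615
```

With the expanded values, the payment of 19991228 compares correctly against the due date of 20000102.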
The sliding window uses the current date as the pivot and is 100 years long. The window goes x years in the past and 100 - x years in the future (x is selected by each company). For each 6-digit date the century is picked so that the date falls within the window. It "slides" as the current date changes.
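One way the sliding window could be sketched is below. The function name and the choice of 70 years back are assumptions, and so is the exact boundary handling: here the window runs from (current year - x) through (current year - x + 99), so it is 100 years long counting both ends.

```python
def sliding_window_year(yy: int, current_year: int, years_back: int) -> int:
    """Pick the 4-digit year ending in `yy` that falls inside the 100-year window."""
    if not 0 <= yy <= 99:
        raise ValueError(f"not a 2-digit year: {yy}")
    if not 0 <= years_back <= 99:
        raise ValueError(f"window must reach back 0-99 years: {years_back}")
    window_start = current_year - years_back
    # Start in the same century as the window's first year...
    candidate = window_start - window_start % 100 + yy
    # ...and move up a century if that lands before the window opens
    if candidate < window_start:
        candidate += 100
    return candidate

# Run in 1999 with x = 70, the window covers 1929 through 2028
print(sliding_window_year(99, 1999, 70))  # 1999
print(sliding_window_year(0, 1999, 70))   # 2000
print(sliding_window_year(29, 1999, 70))  # 1929

# Run in 2010 with the same x, the window has slid to 1940 through 2039
print(sliding_window_year(29, 2010, 70))  # 2029
```

The last two calls show the slide: the same stored year 29 is read as 1929 in 1999 but as 2029 in 2010, with no code change.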
During my time with the CIS department at Ohio State I have come in contact with several C/C++/Java/Perl students who will ask why an institution would continue to teach COBOL. Isn't it being replaced? Isn't it out-dated? Haven't most places already converted to something else?
No. No. No.
Of course, they look at me like I just fell off of a turnip wagon and had it back over me before I had a chance to stand back up. They're convinced, for some reason, that this is a dead language. They can offer no justification for this opinion; it just is. Pretty strong conviction for someone who's had no exposure to the industry. Maybe it's just wishful thinking.
There is a simple reason why it isn't being replaced and new development is not being done in another language. The same reason why existing COBOL systems aren't being rewritten in another language. The reason? There is no alternative to turn to.
What about C/C++? What about Java? Not likely. COBOL can do too many things that other languages cannot do, or cannot do easily. Like accessing indexed-sequential files, producing formatted numeric output, accessing hierarchical and network databases, running in a pseudo-conversational interactive environment, and processing huge files. Plus, those other languages have no concept of "records" (at least as being the natural components of a file). For business applications these are not trivial matters. They are vital.
Remember, C++ and Java weren't developed to compete with, or replace, COBOL. COBOL is a business language (that's what the "B" stands for); C/C++ is not. There is no "B" in C or Java. If you knew all these languages and were given an assignment, you wouldn't have to sit and think about which language would be best for the task. It would clearly be one and not the others. A program that should be written in C/C++ would never be written in COBOL, and vice versa. Same with Java.
I'm not slamming any of these languages. I like C++ and I like Java. They're great for what they were intended for, just like COBOL, SQL and Visual Basic. Take any language out of its element and it quickly becomes inadequate. You need to develop a web app, go with Java. You need a Windows-based GUI app or GUI front end for a client-server app, then use Visual Basic. Middle-tier for that client-server application? C++ or C#. Need to process large files for a business app? COBOL. Formatted reports for that business app? COBOL. How about an on-line interface to that system that could have several, even hundreds of concurrent users (in-house)? COBOL.
OK. Let's pretend that Java was a suitable alternative. So we're saying it either can handle all the things a business needs in its processing or that the company is willing to live without a lot of its current functionality. A big stretch either way. Let's look at the business case.
How long would it take a small company, with something like 15,000 programs, to convert to Java? Let's say a person can get two programs done in a week (20 hours each). This includes coding, testing and implementation. Realistic? Probably not. But it's a nice round number. If you figure a person works about 1,920 hours a year (the standard 2,080 hours in a year, minus 2 weeks vacation and minus another 10 days for holidays and sick time) then he/she can convert 96 programs a year. This assumes that any non-productive time (in meetings, on the phone, etc.) is made up for. At that pace this conversion would take over 150 man-years.
If that coder can get one program done a day it would still take 62.5 man-years. So at that pace an army of 63 consultants could be brought in for a year and get it all done.
How much would it cost? 63 consultants for a year? Ones that know COBOL and Java well enough to pull this off? At $75 an hour (a little low, I think, but a round number) it would be over $9 million.
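For anyone who wants to check the arithmetic, here it is worked through. The only number not stated above is the 8-hour day, which I'm assuming for the "one program a day" pace; the variable names are just for illustration.

```python
import math

PROGRAMS = 15_000
HOURS_PER_YEAR = 2_080 - 2 * 40 - 10 * 8  # 1,920 after vacation, holidays and sick time
RATE_PER_HOUR = 75                        # dollars

def man_years(hours_per_program: float) -> float:
    """Total effort to convert every program at the given pace."""
    return PROGRAMS * hours_per_program / HOURS_PER_YEAR

two_a_week = man_years(20)  # 156.25 man-years
one_a_day = man_years(8)    # 62.5 man-years, assuming an 8-hour day

consultants = math.ceil(one_a_day)                     # 63 people for one year
cost = consultants * HOURS_PER_YEAR * RATE_PER_HOUR    # 9,072,000 dollars

print(f"Two programs a week: {two_a_week} man-years")
print(f"One program a day:   {one_a_day} man-years")
print(f"{consultants} consultants for a year: ${cost:,}")
```

Change the pace or the rate and the total moves, but it takes a very optimistic set of inputs to get the bill out of the millions.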
What's the end result? If a program is rewritten correctly then it will do exactly what it did before. The same results, the same output. The end user will see no difference. Except now you have a program that's harder to read, harder to document and harder to maintain. Hardly worth the effort and definitely not worth the cost. Can you imagine trying to get approval on a budget item of nine million dollars to reproduce what you're already getting? Maybe the report will come off the printer faster but it still won't get delivered until the guy in the mail room makes his rounds.
Now consider a large company with many thousands more COBOL programs than that.
There's more new COBOL development going on now than there ever was. Y2K tied up resources, so there was a lot of work placed on the back burner that now needs to be done. Those new Y2K-compliant systems will need to be maintained. Y2K fixes will need to be corrected.
It's been estimated that over 150 billion lines of COBOL code are currently in use (not just in existence, but in use) with another 5 billion added annually. It's also been estimated that over 98% of the world's economy is processed by COBOL code. More new development in COBOL than ever. And the language is still being revised. Existing COBOL code works as expected. No one has developed a language to challenge it. The future of COBOL is safe.