In 2009 COBOL turned 50 years old. The anniversary got some publicity, along with claims that I find rather hard to believe:

The Guardian article "Cobol hits 50 and keeps counting":

According to David Stephenson, the UK manager for the software provider Micro Focus, "some 70% to 80% of UK plc business transactions are still based on Cobol".

[...]

A lot of this maintenance and development takes place on IBM products. The company's software group director of product delivery and strategy, Charles Chu, says that he doesn't think "legacy" is pejorative. "Business constantly evolves," he adds, "but there are 250bn lines of Cobol code working well worldwide. Why would companies replace systems that are working well?"

The often-quoted forum/blog post "50 years on, Cobol still as influential":

The statistics on Cobol attest to its huge influence on the business world: There are over 220 billion lines of Cobol in existence, a figure which equates to about 80 per cent of the world’s actively used code. There over a million Cobol programmers in the world. There are 200 times as many Cobol transactions that take place each day than Google searches.

Jeff Atwood took a stab at it in his blog post "COBOL: Everywhere and Nowhere". The problem is that it offers only anecdotal evidence.

The subject recently surfaced on Programmers.SE, but so far that discussion also offers only anecdotal evidence.

Is there any hard data on:

  • total number of lines of all code in use?
  • total number of lines of COBOL code in use?
  • total number of all programmers?
  • total number of COBOL programmers?
  • Here are two links that might be helpful: EE Times, Dr. Dobb's
    – Oliver_C
    Commented Jul 9, 2011 at 22:47
  • Remember, 1 million lines of COBOL can be rewritten in about 10 lines of C.
    Commented Oct 3, 2019 at 1:05

1 Answer

Nobody knows.

None of the sources for total LOC describe their means or methods.

Sources with hard numbers after 2008 were a little hard to come by; my apologies.

Total number of lines of all code in use?

  • One trillion (2001).[19]
    • C/C++: 180 billion, Assembler: 140-220 billion, Other: 280 billion.

Total number of lines of COBOL code in use?

  • 200 billion (2008).[10]
  • 180 billion (2006).[12]
  • 200 billion (2005).[14]
  • 225 billion (2001).[19]
  • 100 billion (2000).[21]

Total number of all programmers?

  • 14.6 million (2009).[8]

Total number of COBOL programmers?

  • 1.5 - 2 million (2008).[10]
  • ~2 million (2000).[20]

Consider that there are at least 2,000 banks in the U.S. alone, and that the sources below give sizes between 100 thousand LOC and 343 million LOC for individual financial systems. It adds up fast.

Once you throw in civil and military systems, it's at least one billion.
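As a rough sanity check on that back-of-envelope reasoning, here is a minimal Python sketch. It assumes that every one of the 2,000 banks runs exactly one COBOL system, and that each system's size falls somewhere within the range reported in the sources below. That uniform assumption is mine for illustration; none of the sources establish it.

```python
# Back-of-envelope bounds for COBOL LOC at U.S. banks.
# ASSUMPTION (hypothetical): one system per bank, each sized within the cited range.
US_BANKS = 2_000                    # "at least 2000 banks in the U.S." (from the answer)
MIN_LOC_PER_SYSTEM = 100_000        # smallest financial system cited (source 4)
MAX_LOC_PER_SYSTEM = 343_000_000    # largest financial system cited (source 2)


def portfolio_bounds(n_orgs: int, low: int, high: int) -> tuple[int, int]:
    """Return (lower, upper) total LOC if every organization has one system."""
    return n_orgs * low, n_orgs * high


low, high = portfolio_bounds(US_BANKS, MIN_LOC_PER_SYSTEM, MAX_LOC_PER_SYSTEM)
print(f"lower bound: {low:,} LOC")
print(f"upper bound: {high:,} LOC")
```

Under these assumptions the range runs from 200 million to 686 billion LOC, which mostly shows how sensitive any total is to the per-system figure you pick.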


Sources

2012

1. Case study: In this work we analyze a Cobol software portfolio of a large organization operating in the financial sector. The Cobol sources are a mixture of code written manually and generated with Computer-Aided Software Engineering (CASE) tools, such as TELON, COOL:Gen, CANAM, and others.

The portfolio is decades-old and large in many dimensions; for example, in terms of lines of code, number of systems, or number of modules. To give an idea, the portfolio contains more than 18.2 million physical lines of code (LOC) partitioned over 47 information systems.


2. David Brown is worried. As managing director of the IT transformation group at Bank of New York Mellon, he is responsible for the health and welfare of 112,500 Cobol programs, 343 million lines of code, that run core banking and other operations. But many of the people who built that code base, some of which goes back to the early days of Cobol in the 1960s, will be retiring over the next several years.

That's the situation faced by Jim Gwinn, chief information officer for the USDA's Farm Service Agency. "We have millions of lines of Cobol and there's a long history of it being rewritten," he says. "It has become increasingly difficult to change the code because of the complexity and the attrition of the knowledge base that wrote it."


2011

3. We prototyped and applied the proposed strategy on a set of programs from our clients execution environment. These programs were of varying lengths from a few thousand lines of code to around 80 KLOC.


4. The subject software system of this case study is an excerpt of a confidential 100k LOC COBOL system from the banking industry. It consists of approximately 1,100 sections in 150 programs and copybooks (include files).


2010

5. VisualAge PACBASE is an application generator. Billions of lines of COBOL exist all over the world, which have been produced with this environment. For historical reasons, such applications require specific execution contexts, namely old terminals (non graphical window-based screens), mainframes and CICS (Customer Information Control System).

The business scope of SCAFRUITS is broad: order management, shipping, supplier and product qualification and referencing, timely price management, product activation/inhibition

Concerning its technical facets, the size of the application is estimated to be equal to 3M of LoC, 600 programs, 400 screens, 200 batch programs, 300 potential users, 48,000 product references with only 2,000 active references at a time. There are 350,000 transactions per day and 100,000 created order lines per day.


6. The system under investigation in this paper is a large-sized (> 1 MLOC) industrial application that supports the core activities (e.g., insurances and mortgages) of a major, Belgian bank. Despite that work on the system started as recent as 2005, the bank opted to develop the back-end of this new system in COBOL in order to ease integration with existing infrastructure.


7. Project: The National Endowment for the Arts (NEA). [M]odernization of the NEA’s business systems (Financial Management – Grants Management - Automated Panel Bank)

Fully modernized the 656,000 LOC of Wang-COBOL & RMS flat files to C++ & SQL Server environment & 3,270 screens into a MS Windows environment.

Project: Northrop Grumman. [T]ransformation demonstration and subsequent modernization of the Increments 1 & 3 of the Air Force’s REMIS system.

Fully modernized over 400,000 LOC of Tandem COBOL to both C++ & Java code.


2009

8. For 2009, Evans Data originally estimated that there would be approximately 15.2 million developers worldwide. However, it has reduced that estimated by about 600,000 in the current report.

In North America, Evans Data projected in a previous report that the developer population would grow to 3.85 million in North America during 2009. In the current report, it changed that figure to 3.72 million based on current economic conditions. Evans did not disclose data on other regions.


9. The Social Security Administration is wrapping essential Cobol applications in Extensible Markup Language envelopes and publishing them as service-oriented architecture services. It will retain about 20 percent of the 36 million lines of Cobol code it uses, Hill said.


2008

10. Recent statistics quoted to Datamonitor by IBM reveal the massive scale of intellectual property accumulated:

• Around 200 billion lines of COBOL code are in live operation.
• 75% of the world’s business data, and 90% of financial transactions, are processed in COBOL.
• There are 1.5 – 2 million developers, globally, working with COBOL code.
• Around 5 billion lines of new COBOL code are added to live systems every year.


2007

11. Legacy COBOL systems usually contain millions of lines of code (LOC) spread over thousands of modules, developed by tens of people over many years, are often poorly documented and, to a large extent, knowledge about them is lost.

We used SQuAVisiT to study a large COBOL legacy system of a large insurance fund: 3 thousand modules, 1.7 million LOCs.


2006

12. Gartner has estimated that there are 180 billion lines of Cobol code in use around the world.


2005

13. The customer of the authors is a mid-size German company providing financial services. These services are based upon two large application systems, which share the same HP UNIX platform, but belong to completely different worlds of software technology.

• The total COBOL system consists of 1398 batch programs, 485 on-line programs and 7621 copy modules.
• The total number of lines (LOC) is nearly 2 million, after deducting the comments (~ 25%) the actual code reaches approximately 1.5 million lines.
• The system is maintained by a staff of 8.

Calculating with only 8 % of the net LOC means 120,000 code lines per year added, changed or deleted by eight programmers. Assuming 80MM effort a year the maintenance productivity is 1500 code lines per person month.
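The arithmetic in this excerpt can be reproduced directly. A minimal sketch, reading "80MM" as 80 person-months of effort per year (my interpretation of the abbreviation); the variable names are illustrative only.

```python
# Reproduce the maintenance-productivity figures quoted in source 13.
net_loc = 1_500_000          # net code lines after deducting comments
change_percent = 8           # share of net LOC added, changed or deleted per year
effort_person_months = 80    # "80MM", read here as person-months (assumption)

changed_per_year = net_loc * change_percent // 100
productivity = changed_per_year / effort_person_months

print(f"{changed_per_year:,} lines touched per year")
print(f"{productivity:,.0f} lines per person-month")
```

Both results match the quoted values of 120,000 lines per year and 1,500 lines per person-month.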


14. For example, Cobol remains the most widely deployed programming language in big business, accounting for 75% of all computer transactions and it is not going to go away. Cobol is pervasive in the financial sector (accounting for 90% of all financial transactions), in defense, as well as within established manufacturing and insurance sectors. We estimate that there are over 200 billion lines of Cobol in production today, and this number continues to grow by between three and five percent a year.


15. While the paper is shaped by a very tractable example project (just 90,000 LOC), we have applied the same methodology in other projects. For instance, the methodology for analyzing impact, and for estimating effort and costs was also used to provide a customer with precise information on a project where a 50 million LOC software portfolio had to be investigated for the architectural modification of existing bank account numbers to ten digits.


2004

16. With an estimated 60% to 80% of all business applications still written in COBOL, it was no surprise to find exactly this in the code base of the companies involved in ARRIBA. COBOL therefore quickly gained much of our focus.

The code is badly structured and poorly documented. The amount of code is huge (millions of LOC) and has been adapted many times for several reasons (switching platforms, year 2000 conversions, transition to the Euro currency,…). So keeping the documentation synchronized with those evolutionary changes didn't always happen.


17. During the last three decades, a considerable amount of software was developed using procedural languages. For example, Coyle estimates the size of systems written in Cobol to be more than 100 billion LOC.


18. Two case studies were carried out with our transformations on real-life industrial Cobol systems. The source code in the first case study was IBM Cobol and came from the same banking company as the code of the source base. There was one large system of 2.6 million LOC in almost 1000 programs. The program sizes ranged from about 40 to 13000 LOC. The number of statements per program ranged from two to 4000 statements. In the whole system, there were about 400000 statements.

In the second case study several systems that were written in Micro Focus Cobol were transformed. Similar to the first case study, the total size was 2.6 million LOC, but this was a coincidence. The source code consisted of almost 3000 programs and the sizes of the individual programs ranged from 25 to 8000 LOC. The number of statements per program ranged from 10 to almost 3400. In total, there were over 1.2 million statements. This significant higher number of statements compared with the first case study was because in the first case study a great deal of code was used for data declarations.


2001

19. First, there are about 300 Cobol dialects, and each compiler product has a few versions—with many patch levels. Also, Cobol often contains embedded languages such as DMS, DML, CICS, and SQL. So there is no such thing as "the Cobol language." It is a polyglot, a confusing mixture of dialects and embedded languages—a 500-Language Problem of its own. Second, according to Jones, the world's installed software is distributed by language as follows:
• Cobol: 30 percent (225 billion LOC)
• C/C++: 20 percent (180 billion LOC)
• Assembler: 10 percent (140 to 220 billion LOC)
• less common languages: 40 percent (280 billion LOC)

Because there are about a trillion lines of installed software written in myriad languages, its solution is a step forward in managing those assets.


2000

20. There are about two million COBOL programmers in the world - more than twice the number of JAVA programmers.


21. Estimated at over 100 billion lines of code, most of it Cobol, it drives the world's infrastructure. The end result is a new appreciation of legacy and a search for ways to capitalize on its potential.


1996

22. However, we have just (January 1996) embarked on a large project dealing with this subject, in collaboration with several industrial partners including the Dutch ABN-AMRO bank, and we feel that this application is too interesting to leave undiscussed in a paper on industrial applications of ASF+SDF. The problem at hand involves the analysis, cleaning-up and reconstruction of a large suite (25,000 programs, 30M LOC) of mainframe-based COBOL applications. The two main problems currently studied are conversions between COBOL dialects and identification and correction of software errors related to the "year 2000".

  • Many of these references simply speak to the existence of individual large codebases or beg the question ("Gartner has estimated that there are 180 billion lines of Cobol code in use"). Can you edit to point to some kind of primary source?
    Commented Dec 14, 2012 at 20:05
  • @LarryOBrien Sure. Are you looking for means and methods or would a consensus do?
    – Rusty
    Commented Dec 15, 2012 at 0:15
  • I'd certainly prefer something with a published methodology. Gartner, to name a major offender, has been quite unreliable over the years but is never called to task. They were one of the major promoters of the extraordinarily poorly-sourced and vastly over-hyped Y2K impact (Gartner predicted $300-$600B in direct costs, "at least 1" theft in excess of $1B, and that 50% of organizations would suffer at least 1 mission-critical failure due to Y2K).
    Commented Dec 15, 2012 at 0:31
