Nobody knows.
None of the sources for total LOC explain how their figures were derived.
Hard numbers from after 2008 were difficult to find; my apologies.
Total number of lines of all code in use?
- One trillion (2001).[19]
- By language, excluding COBOL: C/C++: 180 billion, Assembler: 140-220 billion, Other: 280 billion (2001).[19]
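As a rough sanity check on the 2001 breakdown, the per-language figures can be summed and compared against the "about a trillion" total. This is only a minimal sketch: the variable names are illustrative, and the COBOL figure is the 225 billion quoted in the same source [19].

```python
# Per-language estimates from source [19], in billions of LOC, as (low, high).
# The dictionary and variable names are illustrative; only the numbers
# come from the source.
estimates = {
    "COBOL": (225, 225),
    "C/C++": (180, 180),
    "Assembler": (140, 220),
    "Other": (280, 280),
}

low_total = sum(low for low, _ in estimates.values())
high_total = sum(high for _, high in estimates.values())

print(f"Sum of per-language estimates: {low_total}-{high_total} billion LOC")
```

The sum comes to 825-905 billion LOC, which is in the neighborhood of the quoted trillion without matching it exactly. The source does not reconcile the two.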
Total number of lines of COBOL code in use?
- 200 billion (2008).[10]
- 180 billion (2006).[12]
- 200 billion (2005).[14]
- 225 billion (2001).[19]
- 100 billion (2000).[21]
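Total number of all programmers?
- About 14.6 million developers worldwide (2009; an original estimate of 15.2 million, revised down by about 600,000).[8]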
Total number of COBOL programmers?
- 1.5 - 2 million (2008).[10]
- ~2 million (2000).[20]
Consider that there are at least 2000 banks in the U.S. alone, and that the sources below report anywhere from 100 thousand LOC to 343 million LOC for financial systems. It adds up fast.
Once you throw in civil and military systems, it's at least one billion.
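The back-of-envelope reasoning above can be sketched as follows. It assumes, purely for illustration, that each of the 2000 banks runs a single system at either the smallest (100 thousand LOC) or the largest (343 million LOC) size reported in the sources. Real portfolios will fall somewhere in between, and the constant names are hypothetical.

```python
# Illustrative constants; only the numbers come from the text and sources.
BANKS_IN_US = 2_000                # "at least 2000 banks" from the text
SMALLEST_SYSTEM_LOC = 100_000      # 100k LOC banking system, source [4]
LARGEST_SYSTEM_LOC = 343_000_000   # Bank of New York Mellon, source [2]

# Assumption: one system per bank, all at the same size.
lower_bound = BANKS_IN_US * SMALLEST_SYSTEM_LOC
upper_bound = BANKS_IN_US * LARGEST_SYSTEM_LOC

print(f"Lower bound: {lower_bound:,} LOC")
print(f"Upper bound: {upper_bound:,} LOC")
```

Under these assumptions the range runs from 200 million to 686 billion LOC for U.S. banks alone. The upper figure is a ceiling rather than an estimate, since few banks will be anywhere near the size of the largest one cited.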
Sources
2012
1. Case study. In this work we analyze a Cobol software portfolio of a
large organization operating in the financial sector. The Cobol
sources are a mixture of code written manually and generated with
Computer-Aided Software Engineering (CASE) tools, such as TELON,
COOL:Gen, CANAM, and others.
The portfolio is decades-old and large in many dimensions; for
example, in terms of lines of code, number of systems, or number of
modules. To give an idea, the portfolio contains more than 18.2
million physical lines of code (LOC) partitioned over 47 information
systems.
2. David Brown is worried. As managing director of the IT transformation
group at Bank of New York Mellon, he is responsible for the health and
welfare of 112,500 Cobol programs, 343 million lines of code, that run
core banking and other operations. But many of the people who built
that code base, some of which goes back to the early days of Cobol in
the 1960s, will be retiring over the next several years.
That's the situation faced by Jim Gwinn, chief information officer for
the USDA's Farm Service Agency. "We have millions of lines of Cobol
and there's a long history of it being rewritten," he says. "It has
become increasingly difficult to change the code because of the
complexity and the attrition of the knowledge base that wrote it."
2011
3. We prototyped and applied the proposed strategy on a set of programs
from our clients execution environment. These programs were of varying
lengths from a few thousand lines of code to around 80 KLOC.
4. The subject software system of this case study is an excerpt of a
confidential 100k LOC COBOL system from the banking industry. It
consists of approximately 1,100 sections in 150 programs and copybooks
(include files).
2010
5. VisualAge PACBASE is an application generator. Billions of lines of
COBOL exist all over the world, which have been produced with this
environment. For historical reasons, such applications require
specific execution contexts, namely old terminals (non graphical
window-based screens), mainframes and CICS (Customer Information
Control System).
The business scope of SCAFRUITS is broad: order management, shipping,
supplier and product qualification and referencing, timely price management,
product activation/inhibition…
Concerning its technical facets, the size of the application is
estimated to be equal to 3M of LoC, 600 programs, 400 screens, 200
batch programs, 300 potential users, 48,000 product references with
only 2,000 active references at a time. There are 350,000 transactions
per day and 100,000 created order lines per day.
- Barbier, F., Eveillard, S., Youbi, K., & Cariou, E. (2010). Model-driven reverse engineering of COBOL-based applications. In Information Systems Transformation: Architecture-Driven Modernization Case Studies (pp. 283-299). Morgan Kaufmann, Burlington, MA.
6. The system under investigation in this paper is a large-sized (> 1
MLOC) industrial application that supports the core activities (e.g.,
insurances and mortgages) of a major, Belgian bank. Despite that work
on the system started as recent as 2005, the bank opted to develop the
back-end of this new system in COBOL in order to ease integration with
existing infrastructure.
- Kellens, A., Noguera, C., D'Hondt, T., Jorissen, L., & Van Passel, B. (2010, September). Verifying the design of an outsourced COBOL system with IntensiVE. In 2010 IEEE International Conference on Software Maintenance (ICSM) (pp. 1-8). IEEE.
7. Project: The National Endowment for the Arts (NEA). [M]odernization of
the NEA’s business systems (Financial Management – Grants Management -
Automated Panel Bank)
Fully modernized the 656,000 LOC of Wang-COBOL & RMS flat files to C++
& SQL Server environment & 3,270 screens into a MS Windows
environment.
Project: Northrop Grumman. [T]ransformation demonstration and
subsequent modernization of the Increments 1 & 3 of the Air Force’s
REMIS system.
Fully modernized over 400,000 LOC of Tandem COBOL to both C++ & Java
code.
2009
8. For 2009, Evans Data originally estimated that there would be
approximately 15.2 million developers worldwide. However, it has
reduced that estimated by about 600,000 in the current report.
In North America, Evans Data projected in a previous report that the
developer population would grow to 3.85 million in North America
during 2009. In the current report, it changed that figure to 3.72
million based on current economic conditions. Evans did not disclose
data on other regions.
9. The Social Security Administration is wrapping essential Cobol
applications in Extensible Markup Language envelopes and publishing
them as service-oriented architecture services. It will retain about
20 percent of the 36 million lines of Cobol code it uses, Hill said.
2008
10. Recent statistics quoted to Datamonitor by IBM reveal the massive scale of intellectual property accumulated:
• Around 200 billion lines of COBOL code are in live operation.
• 75% of the world’s business data, and 90% of financial transactions, are
processed in COBOL.
• There are 1.5 – 2 million developers, globally, working with COBOL code.
• Around 5 billion lines of new COBOL code are added to live systems every year.
2007
11. Legacy COBOL systems usually contain millions of lines of code (LOC)
spread over thousands of modules, developed by tens of people over
many years, are often poorly documented and, to a large extent,
knowledge about them is lost.
We used SQuAVisiT to study a large COBOL legacy system of a large
insurance fund: 3 thousand modules, 1.7 million LOCs.
2006
12. Gartner has estimated that there are 180 billion lines of Cobol code
in use around the world.
2005
13. The customer of the authors is a mid-size German company providing
financial services. These services are based upon two large
application systems, which share the same HP UNIX platform, but
belong to completely different worlds of software technology.
• The total COBOL system consists of 1398 batch programs, 485 on-line
programs and 7621 copy modules.
• The total number of lines (LOC) is nearly 2 million, after deducting
the comments (~ 25%) the actual code reaches approximately 1.5 million
lines.
• The system is maintained by a staff of 8.
Calculating with only 8 % of the net LOC means 120,000 code lines per
year added, changed or deleted by eight programmers. Assuming 80MM
effort a year the maintenance productivity is 1500 code lines per
person month.
14. For example, Cobol remains the most widely deployed programming
language in big business, accounting for 75% of all computer
transactions and it is not going to go away. Cobol is pervasive in the
financial sector (accounting for 90% of all financial transactions),
in defense, as well as within established manufacturing and insurance
sectors. We estimate that there are over 200 billion lines of Cobol in
production today, and this number continues to grow by between three
and five percent a year.
15. While the paper is shaped by a very tractable example project (just
90,000 LOC), we have applied the same methodology in other projects.
For instance, the methodology for analyzing impact, and for estimating
effort and costs was also used to provide a customer with precise
information on a project where a 50 million LOC software portfolio had
to be investigated for the architectural modification of existing bank
account numbers to ten digits.
2004
16. With an estimated 60% to 80% of all business applications still
written in COBOL, it was no surprise to find exactly this in the code
base of the companies involved in ARRIBA. COBOL therefore quickly
gained much of our focus.
The code is badly structured and poorly documented. The amount of code
is huge (millions of LOC) and has been adapted many times for several
reasons (switching platforms, year 2000 conversions, transition to the
Euro currency,…). So keeping the documentation synchronized with those
evolutionary changes didn't always happen.
17. During the last three decades, a considerable amount of software was
developed using procedural languages. For example, Coyle estimates the
size of systems written in Cobol to be more than 100 billion LOC.
18. Two case studies were carried out with our transformations on
real-life industrial Cobol systems.
The source code in the first case study was IBM Cobol and came from
the same banking company as the code of the source base. There was one
large system of 2.6 million LOC in almost 1000 programs. The program
sizes ranged from about 40 to 13000 LOC. The number of statements per
program ranged from two to 4000 statements. In the whole system, there
were about 400000 statements.
In the second case study several systems that were written in Micro
Focus Cobol were transformed. Similar to the first case study, the
total size was 2.6 million LOC, but this was a coincidence. The source
code consisted of almost 3000 programs and the sizes of the individual
programs ranged from 25 to 8000 LOC. The number of statements per
program ranged from 10 to almost 3400. In total, there were over 1.2
million statements. This significant higher number of statements
compared with the first case study was because in the first case study
a great deal of code was used for data declarations.
2001
19. First, there are about 300 Cobol dialects, and each compiler product has a few versions—with many patch levels. Also, Cobol often
contains embedded languages such as DMS, DML, CICS, and SQL. So there
is no such thing as "the Cobol language." It is a polyglot, a
confusing mixture of dialects and embedded languages—a 500-Language
Problem of its own. Second, according to Jones, the world's installed
software is distributed by language as follows:
• Cobol: 30 percent (225 billion LOC)
• C/C++: 20 percent (180 billion LOC)
• Assembler: 10 percent (140 to 220 billion LOC)
• less common languages: 40 percent (280 billion LOC)
Because there are about a trillion lines of installed software
written in myriad languages, its solution is a step forward in
managing those assets.
2000
20. There are about two million COBOL programmers in the world - more than
twice the number of JAVA programmers.
21. Estimated at over 100 billion lines of code, most of it Cobol, it
drives the world's infrastructure. The end result is a new
appreciation of legacy and a search for ways to capitalize on its
potential.
1996
22. However, we have just (January 1996) embarked on a large project
dealing with this subject, in collaboration with several industrial
partners including the Dutch ABN-AMRO bank, and we feel that this
application is too interesting to leave undiscussed in a paper on
industrial applications of ASF+SDF. The problem at hand involves the
analysis, cleaning-up and reconstruction of a large suite (25,000
programs, 30M LOC) of mainframe-based COBOL applications. The two main
problems currently studied are conversions between COBOL dialects and
identification and correction of software errors related to the "year
2000".