92

In our company we have a small program (.exe, 500 KB in size) that performs mathematical calculations and at the end writes the result to an Excel spreadsheet that we use to continue our workflow.

I want to modify the columns, spacing format and add VBA logic etc. on the Excel spreadsheet, but since these parameters are not configurable in that program, it seems to me the only way to modify it is to break down/reverse engineer the .exe.

Nobody knows what language it was programmed in; the only things we know are:

  1. Developed 20+ years ago
  2. Developer retired 10 years ago
  3. GUI Application
  4. Runs standalone
  5. Size 500 KB

Any suggestions on what options I have for dealing with this kind of problem? Is reverse engineering the only option, or is there a better approach?

23
  • 150
    Do you know what the calculation is that it performs? If so, write a new app, push some test data through both to check the new one works the same, then throw away the old one. Then make the changes you want to make.
    – David Arno
    Commented May 27, 2016 at 14:08
  • 14
    @DavidArno 's comment would make a good answer. Reverse engineering is possible, but re-spec'ing and rewriting the app will be a lot cheaper/easier/quicker. Commented May 27, 2016 at 14:16
  • 44
    The other way to modify it would be to take the result the original program produces and filter it into whatever you want.
    – Blrfl
    Commented May 27, 2016 at 14:24
  • 9
    @Alec if you open the .exe with a hex editor, you may get clues about what it was written in. For example, the compiler name might be embedded. From there you'll know more about possible decompiling options. Commented May 27, 2016 at 17:19
  • 26
    Alternatively, you could attempt to find the gentleman who wrote the application and see if he's willing to come in for a day or two (maybe a couple of hours each day) as a consultant. If he's a retired developer, there's a moderate chance that he might appreciate a little spending money at the rate of $100-150/hr while actually enjoying the moment of doing a bit of work for just a brief period of time.
    – RLH
    Commented May 27, 2016 at 20:10

8 Answers

233

Reverse engineering can become very hard, even more so if you want not just to understand the program's logic but to change and recompile it. So the first thing I would try is to look for a different solution.

I want to modify the columns, spacing format and add VBA logic etc. on the Excel spreadsheet

If that is the only thing you want, and the calculation done by the program is fine, why not write a program in the language of your choice (maybe an Excel macro) which calls your legacy "exe", takes the output and processes it further?
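As a rough illustration of this wrapper idea, here is a minimal Python sketch. It assumes the legacy program can be started from the command line and writes its result to a known file. `run_legacy_then_postprocess`, `legacy_cmd`, `output_path` and `postprocess` are hypothetical names chosen for the example, not anything taken from the original program.

```python
import subprocess
from pathlib import Path


def run_legacy_then_postprocess(legacy_cmd, output_path, postprocess):
    """Run the legacy program, then hand its output file to `postprocess`.

    legacy_cmd  -- command line as a list, e.g. ["legacy.exe"] (placeholder)
    output_path -- file the legacy program is assumed to write
    postprocess -- callable taking a Path and returning the reworked result
    """
    # check=True raises CalledProcessError if the program exits non-zero.
    subprocess.run(legacy_cmd, check=True)

    output_path = Path(output_path)
    if not output_path.exists():
        raise FileNotFoundError(f"no output found at {output_path}")

    # All formatting changes live in `postprocess`; the legacy exe is untouched.
    return postprocess(output_path)
```

A call might look like `run_legacy_then_postprocess(["legacy.exe"], "result.xls", reformat)`, where all three arguments are placeholders. The point of the design is that the old executable stays a black box and every change goes into the post-processing step.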

13
  • 9
    Why does the new program have to call the old EXE? Why not just make the new program independent and then write a script that calls both and coordinates the output and input? My experience suggests that letting command line languages like bash, PowerShell, or command prompt handle process coordination is generally simpler than trying to code it yourself in an imperative language. Otherwise, +1.
    – jpmc26
    Commented May 27, 2016 at 23:57
  • 8
    @jpmc26: That's true right up until you have to deal with Bash's absurd quoting rules. Yes, they are (mostly) POSIX-compliant. No, they do not make any goddamned sense. $FOO should not word split, for example.
    – Kevin
    Commented May 28, 2016 at 8:30
  • 16
    @jpmc26: I've never had any trouble calling subprocess.run(), personally.
    – Kevin
    Commented May 28, 2016 at 8:39
  • 3
    @jpmc26: What piping? It's pure cookbook; if you want stdout, you pass the magic PIPE constant. Otherwise, you don't and it gets discarded. What's there to understand?
    – Kevin
    Commented May 28, 2016 at 8:43
  • 3
    ... I should add that I did use Excel with VBA in the past as a frontend to command line utilities very successfully more than once. The structure is always the same: a sheet for entering the parameters as a "poor man's UI", a "Start" button on that sheet. In the VBA code, one needs a Shell call in Excel VBA like this one: stackoverflow.com/questions/8902022/…, one can pipe the stdout/stderr from the cmd utility into separate files and then apply the output formatting.
    – Doc Brown
    Commented May 30, 2016 at 8:50
114

In addition to the already given answers by Doc Brown and Telastyn, I would like to suggest an alternative approach (under the assumption it's mission critical).

If you do not know the computations it performs and the calculations are (somewhat) mission-critical: Deduce the original logic in the .exe file by any means necessary. Decode it using a decompiler/disassembler like IDA if necessary. Hire a consultant (or a batch of consultants) if necessary.

Sure, work around it for now using their solution, but do not let it be.

The reason I suggest this is as follows: You have admitted that the calculations are very complex (according to an engineer you spoke to). It's also mission-critical. So if somehow the original .exe stops working due to changes in the platforms you have (maybe 16-bit support gets dropped?), you have just lost a mission-critical piece of knowledge.

Now, I'm not concerned about losing the .exe, but about losing the knowledge it encodes. That knowledge must be recovered.

As before: if that knowledge is already available, make sure to write it down in a form that is not going to be lost anytime soon. Otherwise, recover it and write it down.

13
  • 14
    Modern decompilers actually produce code that's usually quite legible, especially if the original source was in plain C or assembler, and not a higher level language.
    – phyrfox
    Commented May 27, 2016 at 16:33
  • 4
    Very good point. Also: Just patching it up so that it works again will only work until the next fix needs to be implemented. Commented May 27, 2016 at 16:56
  • 33
    @phyrfox 20 years old... developer retired 10 years ago... only output is an Excel spreadsheet... I'd put money on it being a VB6 application.
    – J...
    Commented May 28, 2016 at 1:38
  • 10
    @micaho: or the company still exists and the person with the know-how to verify the results and hidden assumptions has just been hit by a truck. Of course, it's a business risk so ultimately the stakeholders should decide. I just wanted to emphasise that the "wrapper" will work now, but only adds to the technical debt. Commented May 28, 2016 at 11:56
  • 22
    @J...: If it is VB6 then the original poster is in luck. You can recover the source code from a VB6 compilation pretty easily. Commented May 28, 2016 at 12:44
74

Ask the original programmer, if possible.

A few weeks ago I was contacted by a firm I used to work for 10 years ago with the very same question about an mdb file developed in the mid-90s.

6
  • 52
    This is the real low hanging fruit. Everyone (including myself) romanticizes the use of hard programming skills like reverse engineering, reimplementing the program's functionality or adding layers to the data processing. In reality, the best place to start is a friendly email which might come back in an hour with the location of the source code or some other ideal solution. Commented May 27, 2016 at 18:36
  • 2
    When I'm at home with a 10-year-old application, I too fire up a disassembler, but during work hours the goal is different ^^
    – Paolo
    Commented May 27, 2016 at 19:40
  • 2
    Did you remember anything about it? :)
    – Ángel
    Commented May 27, 2016 at 23:31
  • 2
    Of course! Unfortunately the company underwent 3 acquisitions and mergers, so lots of information got lost, and part of the backups was in the lost bag... The development was done on site on their machines, so I have no copy of the source, and that's it.
    – Paolo
    Commented May 28, 2016 at 12:19
  • 1
    Scan the EXE for embedded strings that might include a developer's name or something. That's easier than a full disassembly!
    – JDługosz
    Commented Jun 1, 2016 at 9:28
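The string scan suggested in the last comment can be sketched in a few lines of Python. This is only an illustration: `printable_strings` and the minimum length of 6 are arbitrary choices for the example, and the two patterns (plain ASCII and UTF-16LE, which Windows programs often use) are an assumption about how the text is stored in the binary.

```python
import re


def printable_strings(data, min_length=6):
    """Return runs of printable characters found in raw bytes.

    Looks for plain ASCII runs and for UTF-16LE runs (each printable
    byte followed by a zero byte).
    """
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)

    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return found
```

Reading the file with something like `Path("legacy.exe").read_bytes()` (the file name is a placeholder) and passing the bytes in gives a list to skim for compiler banners, runtime DLL names or a developer's name. A reference to `MSVBVM60.DLL`, for instance, would point towards VB6.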
55

Any suggestions on what options I have for dealing with this kind of problem?

If all you're looking to do is modify the output, then why not simply use composition?

Instead of modifying the black box you can't easily access, you create a new program that takes the Excel output, and does your formatting/column changes too. Then you could make a new exe/script that calls the two programs in order, so it appears to the end user that there is just one program that does all of the work - even though it's two distinct steps under the hood.
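To make the second step concrete, here is a small Python sketch of a column rearrangement. It assumes, purely for illustration, that the spreadsheet has been saved as CSV with a header row. A real .xls file would need a spreadsheet library or VBA, which is not shown here. `reorder_columns` and `reformat_csv` are hypothetical names.

```python
import csv
import io


def reorder_columns(rows, wanted):
    """Keep only the `wanted` header names, in that order.

    rows   -- list of rows, the first row being the header
    wanted -- header names to keep, in the desired output order
    """
    header, *body = rows
    index = [header.index(name) for name in wanted]  # ValueError if missing
    return [list(wanted)] + [[row[i] for i in index] for row in body]


def reformat_csv(text, wanted):
    """Parse CSV text, reorder its columns and return new CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(reorder_columns(rows, wanted))
    return out.getvalue()
```

For example, `reformat_csv("a,b,c\n1,2,3\n", ["c", "a"])` keeps columns `c` and `a` in that order and drops `b`. A small launcher script can then call the old program and this step one after the other.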

10
  • 2
    @Alec Whether Java is a suitable language or not mainly depends on the amount of data you need to handle / the amount of computation that you need to do. If both are low, Java is fine. If either one is critical, you'd better drop down to C or C++. But since you seem only to be using an amount of data that fits into an Excel spreadsheet anyway, I don't think there's enough data involved to make Java a bad choice (Excel would likely explode before your app does). Commented May 27, 2016 at 15:26
  • 18
    @cmaster the idea that Java is prohibitive for heavy computation is an outdated notion. The worst benchmark listed here isn't even 4x (most are 2x or less) and if a single digit scalar is your breaking point, the savings in safety (which translate directly to developer dollars) are more than likely going to offset the performance hit.
    – corsiKa
    Commented May 27, 2016 at 17:47
  • 8
    @Alec any language will work. VBA seems a good choice because it already integrates with Excel so well. Commented May 27, 2016 at 18:18
  • 4
    @corsiKa That depends entirely on the scale of your application. If a single run consumes several tens of thousands of CPU-hours, a factor of 2 or 4 becomes prohibitive: It translates directly into the amount of results that you can get out of a multi-million machine. Also, such applications typically work in lockstep, so garbage collection is pure poison for their performance, the small interruptions would multiply by the number of processes. I tell you, such applications exist, and they are most certainly not written in Java. They are just not used by the average internet business. Commented May 27, 2016 at 18:23
  • 7
    @cmaster We're talking about some simple calculations, not a full blown AAA game engine with realtime global illumination, physically based rendering, animated sparse voxel octrees, universal physics field simulation and the like. No offense, but inserting any argument RE performance here is bad. Ease of use should be #1, and as someone who's been using C++ for a few years it's the last language I would recommend in this case.
    – user22018
    Commented May 27, 2016 at 19:23
4

Write a simple wrapper around the program, capturing its output. It is not complex to do, as many languages (Java, C++, Python, .NET, for instance) have means for this. Parse the output and generate another, in the desired form. The user will call your new program. The old executable will stay next to it, or can even be automatically extracted from a resource before invoking it.

This solution, of course, works well only when the output is well structured and therefore easy to parse.

The fact that it is a GUI application is not a blocking problem. You can launch it, generate the output, and then automatically post-process it when the GUI terminates.
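One way to handle the GUI case is to wait for the program to close and then check whether it actually wrote fresh output, since the user may close the window without running a calculation. The Python sketch below assumes the GUI runs as a single process that exits when its window is closed. `mtime_or_none` and `run_gui_and_check_output` are hypothetical names.

```python
import subprocess
from pathlib import Path


def mtime_or_none(path):
    """Modification time in nanoseconds, or None if the file is absent."""
    path = Path(path)
    return path.stat().st_mtime_ns if path.exists() else None


def run_gui_and_check_output(gui_cmd, output_path):
    """Launch the GUI, wait for it to exit, report whether output changed.

    Returns True only if the output file exists afterwards and its
    modification time differs from what it was before the launch.
    """
    before = mtime_or_none(output_path)
    subprocess.run(gui_cmd)  # blocks until the launched process exits
    after = mtime_or_none(output_path)
    return after is not None and after != before
```

Post-processing would then run only when the function returns `True`, which avoids reformatting a stale result from an earlier session.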

8
  • 3
    How is this different from Doc Brown's top-voted answer?
    – Laf
    Commented May 30, 2016 at 20:31
  • I disagree with the assumption of Doc's answer being badly written. It's clear and succinct.
    – Mast
    Commented May 31, 2016 at 8:32
  • 2
    If you look at the text of this answer, you will see that the only informative part is the end of the last sentence: "which calls your legacy "exe", takes the output and processes it further."
    – h22
    Commented May 31, 2016 at 9:54
  • 2
    Not a downvoter, and don't see why this got -3... is Meta at it again? but separately, I would advise against lambasting someone else's answer for "contains lots of brain-diluting blah" when (A) that's a subjective judgement and (B) in my subjective opinion, yours contains just that! Commented May 31, 2016 at 12:06
  • This can also be rewritten as "contains uninformative generic talk that just distracts from the topic, wasting the reader's time", if that way looks more helpful. It provides a hint to the right approach in the second half of the last sentence. This had no intention to be insulting. Comment removed.
    – h22
    Commented May 31, 2016 at 12:23
3

There are companies that specialise in exactly this kind of problem. They use proprietary code to decompile native code into a high level language, then apply human expertise to make it useful (e.g. giving variables appropriate names).

Some years ago my employer used this to migrate some native S/390 mainframe code onto Linux servers. We gave them a binary, they gave us source code in C.

Whether this is necessary in your case is up to you. If you only care about the format of the output, you can simply massage the output after it's been produced. However, as others have pointed out, having business logic hidden in a binary blob could be an ongoing risk.

2

Write some tests that exercise as many cases as possible on the old code. Find corner cases, test wrong input, and test correct input.

Pin down what the correct output is for the various cases, and then try to write an implementation that satisfies the same tests.
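A minimal Python sketch of such a comparison is shown below. It assumes the old program's results have been recorded by hand as pairs of inputs and numeric output, and that the new implementation is an ordinary function. `find_mismatches`, `recorded_cases` and `candidate` are hypothetical names, and the tolerance is an arbitrary default.

```python
import math


def find_mismatches(recorded_cases, candidate, rel_tol=1e-9):
    """Compare a new implementation against recorded legacy results.

    recorded_cases -- list of (inputs, expected) pairs, where `inputs`
                      is a tuple of arguments and `expected` is the
                      number the old program produced for them
    candidate      -- the new implementation, called as candidate(*inputs)
    Returns a list of (inputs, expected, actual) for every disagreement.
    """
    mismatches = []
    for inputs, expected in recorded_cases:
        actual = candidate(*inputs)
        # Floating-point results rarely match bit for bit, so compare
        # within a relative tolerance instead of using ==.
        if not math.isclose(expected, actual, rel_tol=rel_tol):
            mismatches.append((inputs, expected, actual))
    return mismatches
```

An empty result means the new code reproduces every recorded case. Anything else lists exactly which inputs still differ, which is useful for the corner cases mentioned above.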

I wouldn't go down the reverse engineering route. It's incredibly complicated to reverse machine code, and you should already know what the purpose of the exe is. Reverse engineering is a little too much work for what you're after.

If the software was developed by one guy 20 years ago, it's probably not something that takes a lot of modern power. A GUI program that stretched the machine 20 years ago will barely register on a modern machine, so you're probably looking at something that's relatively simple to reproduce.

0

Try to reverse engineer the exe, but only for the purpose of finding the computation logic, or at least to get a fair hint of what it actually does. If your reverse engineering gets you to that point, you can write a new application based on that computation logic. Apart from that, I don't see any other way out.

Easier said than done: reverse engineering an exe created 20 years ago is a real challenge.

2
  • 12
    The dating of the exe shouldn't really matter
    – Ángel
    Commented May 27, 2016 at 23:32
  • 1
    In fact, with optimizers getting smarter every year, reverse-engineering only becomes harder.
    – MSalters
    Commented May 31, 2016 at 11:39
