I am working with MATLAB on a model reduction algorithm. It is basically a data processing pipeline.

ckt = generate_ckt(ckt_properties);
freq = generate_fpoints(fconfig);
result = freq_dom_sim(ckt,freq);
red_ckt = run_PRIMA(ckt, red_order);

Each of these steps is potentially time consuming, because the data I work with is pretty big (10000 × 10000 matrices). So in a previous implementation I had all of these as separate scripts that I had to execute one by one (manually, or by running a master script). Each script stored its data in .mat files. The next script would read from these files and write its own result to another directory, and so on.

What I would like to use is a framework that can store the dependencies between various pieces of data, such that at any point of time I can just ask it to generate the output.

It should:

  1. Check if the variable is present in the workspace.
  2. If it is, check if it's consistent with the expected properties (check against the config data).
  3. If not, load from file (the exact path to the file will be pre-specified).
  4. Check if it's consistent with the expected properties.
  5. If not, compute it from the (pre-specified) command associated with it.

I would like this to be recursive, so that effectively I run the last module and it automatically runs checks and actually computes only those pieces of data that are not already available and consistent.

Can you give some suggestions on how to design this? If this already has a name (I assume it must), please point me to it.

1 Answer


What you are describing in your ideal solution is very similar to what is provided by the make program and makefiles. A makefile essentially expresses a dependency graph from a set of output files, through a set of intermediate files, to a set of input files, along with commands to transform a file at one step to the next.

Inferring file names from the functions you mention above, you might get something like this:

ckt.mat : ckt_properties.mat
    matlab -r generate_ckt.m ckt_properties.mat

freq.mat : fconfig.mat
    matlab -r generate_fpoints.m fconfig.mat

result.mat : ckt.mat freq.mat
    matlab -r freq_dom_sim.m ckt.mat freq.mat

red_ckt.mat : ckt.mat red_order.mat
    matlab -r run_PRIMA.m ckt.mat red_order.mat

This says that ckt.mat depends on ckt_properties.mat, and that you can regenerate ckt.mat when you need to by running the recipe matlab -r generate_ckt.m ckt_properties.mat on the command line. "When you need to" means when the target (ckt.mat) does not exist yet, or when the modification time of the source (ckt_properties.mat) is newer than that of the target.
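The recipe lines in the makefile are closer to pseudocode than to a working invocation: matlab -r expects a MATLAB statement to evaluate, not a script file followed by arguments. One way to bridge the gap is a small wrapper that takes file names, loads the inputs, calls the pipeline function, and saves the result. The sketch below is only an illustration: run_step is a hypothetical name, and it assumes that each .mat file holds exactly one variable and that each pipeline function returns a single output.

```matlab
function run_step(funcName, outFile, varargin)
% RUN_STEP  Hypothetical wrapper (not part of MATLAB): load inputs from
% .mat files, call a pipeline function on them, and save the result.
%   funcName : name of the function to call, e.g. 'generate_ckt'
%   outFile  : .mat file to write, e.g. 'ckt.mat'
%   varargin : input .mat files, in the order the function expects them
% Assumption: every .mat file holds exactly one variable.
    args = cell(1, numel(varargin));
    for k = 1:numel(varargin)
        s = load(varargin{k});            % struct with one field per saved variable
        names = fieldnames(s);
        assert(numel(names) == 1, 'Expected exactly one variable in %s', varargin{k});
        args{k} = s.(names{1});
    end
    out = feval(funcName, args{:});       % e.g. generate_ckt(ckt_properties)
    [~, varName] = fileparts(outFile);    % 'ckt.mat' -> 'ckt'
    result.(varName) = out;
    save(outFile, '-struct', 'result');   % saved under the variable name varName
end
```

With such a wrapper on the MATLAB path, the first recipe could read matlab -r "run_step('generate_ckt', 'ckt.mat', 'ckt_properties.mat'); exit". The trailing exit matters, because otherwise MATLAB stays open and make never sees the step finish. On R2019a or later, matlab -batch "run_step(...)" does the same without the explicit exit.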

Now maybe you can do everything with files and makefiles, but this keeps you largely outside of Matlab's IDE. You could also do something purely within Matlab by creating a structure that mimics the aspects of the filesystem that make relies upon, namely file names, modification times, and contents. In other words, create structures that bind a matrix and a modification time (perhaps held as a simple scalar) under a name. Then you would need another structure that encodes the dependency relationships, which is essentially a list of tuples containing a target structure, a list of source structures, and a transformation function. All this is doable (and might even have been done, I don't know), but it might be easier to just use makefiles.
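To make the pure-MATLAB variant more concrete, here is a minimal sketch of such a structure. Everything in it is an assumption made for illustration: the class name DepGraph, the method names put, addRule and build, and the use of a simple counter as the "modification time" instead of a real timestamp. It does not implement the consistency checks against config data that the question asks for; it only shows the make-style rule of rebuilding a target when it is missing or older than one of its sources.

```matlab
classdef DepGraph < handle
    % DepGraph  Hypothetical in-memory analogue of make (illustrative sketch).
    properties
        values      % containers.Map: name -> current value
        mtimes      % containers.Map: name -> logical modification time
        rules       % containers.Map: name -> struct with 'sources' and 'fn'
        clock = 0   % counter that increases on every write
    end
    methods
        function obj = DepGraph()
            % Create the maps here so each instance gets its own.
            obj.values = containers.Map();
            obj.mtimes = containers.Map();
            obj.rules  = containers.Map();
        end
        function put(obj, name, value)
            % Store a value and stamp it as the most recently modified item.
            obj.clock = obj.clock + 1;
            obj.values(name) = value;
            obj.mtimes(name) = obj.clock;
        end
        function addRule(obj, target, sources, fn)
            % sources: cell array of names; fn: handle taking the source
            % values in that order and returning the target value.
            obj.rules(target) = struct('sources', {sources}, 'fn', fn);
        end
        function value = build(obj, name)
            % Return an up-to-date value for NAME, recomputing only stale items.
            if isKey(obj.rules, name)
                rule = obj.rules(name);
                n = numel(rule.sources);
                args = cell(1, n);
                newest = 0;
                for k = 1:n
                    % Bring each source up to date before looking at its time.
                    args{k} = obj.build(rule.sources{k});
                    newest = max(newest, obj.mtimes(rule.sources{k}));
                end
                % Same test make applies: target missing or older than a source.
                if ~isKey(obj.values, name) || obj.mtimes(name) < newest
                    obj.put(name, rule.fn(args{:}));
                end
            elseif ~isKey(obj.values, name)
                error('DepGraph:missing', 'No value and no rule for "%s".', name);
            end
            value = obj.values(name);
        end
    end
end
```

For the pipeline in the question, usage might look like g = DepGraph(); g.put('ckt_properties', ckt_properties); g.addRule('ckt', {'ckt_properties'}, @generate_ckt); g.addRule('red_ckt', {'ckt', 'red_order'}, @run_PRIMA); and so on, after which g.build('red_ckt') recomputes only what is out of date. A counter is used here rather than the wall clock so that two writes in quick succession never receive the same timestamp.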

  • Awesome answer, thanks! I knew about makefiles, but my friend told me it's not really the same thing, so I didn't mention it. Just one more small issue: would it be easier/better to compare the sizes of the matrices rather than the modification dates of the .mat files?
    – Milind R
    Commented Feb 27, 2013 at 3:10
  • Also, would it be possible to invoke functions using makefiles?
    – Milind R
    Commented Feb 27, 2013 at 3:11
  • OK, I found out what I needed to. But I don't understand how you pass fconfig.mat as a parameter to generate_fpoints.m. I am unable to find a reference for that usage.
    – Milind R
    Commented Feb 27, 2013 at 5:32
  • You basically need to recompute derived objects when the source objects they depend on change. Looking for changes in size is one approach, but it can miss things. For example, changing the value of a scalar does not change its size. Looking at the modification time is more reliable, provided it is updated on every change. Comparing a checksum of the object is probably the most reliable of all, but a checksum can be awkward to compute for structured data like matrices; it works best on binary blobs of data.
    Commented Feb 27, 2013 at 18:41
  • Passing command-line arguments to MATLAB scripts is a little tricky. I resorted to pseudocode when writing my answer. I did some digging and found a couple of pages with some suggestions: quora.com/… and stackoverflow.com/questions/7958320/…
    Commented Feb 27, 2013 at 18:44
