I'm running into a design problem. My code is in C# but the concepts could apply to any OO language.

I'm designing a framework to run experiments, and these experiments have several variables which we will manipulate, and several variables that we will calculate after manipulation (like columns in an excel sheet). Each variable can have a certain number of possible values. I need to store these variables somehow in a particular order. Then, I will need to create a row of data (one row of the hypothetical excel sheet) using the defined order, where I choose a single value of each variable, and calculate some of the variable values.

EDIT: Based on comments it sounds like I need to explain the purpose. Users should be able to input their experimental design, meaning which variables they want, and what the levels of each variable are. Then my program makes every permutation of those (trials), randomizes the trials, and then iterates through them for the experiment. The output is all just a .csv string. So after each trial it outputs the value of each input variable, and then calculates and outputs the output variables.

I have it all working right now. But at the moment, I need to define the variables manually for the users, and then they can put in a list of values for each on their own. I also have to manually write conversions to strings for each variable, and then manually put them into the correct order and alignment for the csv file. My end goal is to allow them to create their own variables and add them to the experiment without my intervention, and to force an implementation of string output.

What I have:

List<float> levels = new List<float>();
levels.Add(1f);
levels.Add(2f);
levels.Add(3f);
Experiment.variable1.levels = levels;

What I want:

Experiment.AddVariable(new Variable<float>(levels));
// note that Variable should conceivably be any type
// and all variable types should have a method to output their value as a
// string depending on what type it is.

So I'll have something like: inputvariable1 named "distance" has int values:

1, 2

inputvariable2 named "time given" has float values:

1.1, 1.2, 1.3

inputvariable3 named "some unknown class" has some user-defined values:

CustomClass.OnOffEnum.ON, CustomClass.OnOffEnum.OFF

outputvariable1 named "response time" has float values:

So I have a class Variable

public class Variable<T> {
    public string name;
    public T value;
}

Now, I need to somehow store these variables, so I created a Row class:

public class Row {
    // Does not compile as written: Variable<T> needs a type argument here.
    List<Variable> variables;

    public Row(List<Variable> variables) {
        this.variables = variables;
    }
}

Now I have a couple problems:

1. How can I define a structure for Rows for each experiment, so that all Rows in a particular experiment will have the same variables in the same order? (same names, different values).

To tackle this problem I considered modifying Row to require a RowSchema object as input, forcing it to follow a certain schema for its variables. But I have no clue what the RowSchema class should look like.

2. How can I ensure that each Variable implements a method StringOutput() that I can use to write to a text file?

I considered forcing Variable's generic type T to implement an interface with this method, but I want Variable to be able to be a primitive type too.

something like

public class Variable<T> where T : StringOutputter {

I considered also just using ToString(), but then for custom types, I can't ensure it will be a defined format.

3. It would be nice in other parts of my program to be able to refer to the value of a particular variable (by name) in a particular row. Something like this:

row2.<<variablename>>.value

I've pondered this for several days, but can't figure out the correct architecture.


My current implementation is to manually define each Row with fields representing each variable I want. But this is not extendable since I need to redefine the Row class for each experiment.

  • I have a feeling that you are trying to overgeneralize and that as a result you are coming up with the wrong abstractions that hold you back. Design is all about tradeoffs - you lose flexibility in one way so that you can gain it in another. Can you do something simpler (even ostensibly non-OO) to start with and get the system working, and introduce abstraction and other OO design tools later, once you have a better idea where in your code you actually need these things? Commented Nov 20, 2018 at 20:33
  • Also, could you edit the question to explain in a bit more detail what your software is supposed to do (so, not so much how you are trying to design it, but what it needs to accomplish in terms of functionality)? It will probably help us give you better comments and suggestions. Commented Nov 20, 2018 at 20:33
  • If different experiments can have different inputs and outputs, why are you forcing them all into a generic Row class? It will never make sense to display them in the same table.
    – 17 of 26
    Commented Nov 21, 2018 at 13:30
  • Thanks for the feedback. I edited my question accordingly. Yes I already have a "hard coded" functioning version. But the idea is to make this framework reusable for all experiments. I've added more about what I have currently and what I need moving forward. In terms of having the same outputs, all experiments output the Row's variable values as a string in a line of a CSV file. so "variable1value, variable2value, variable3value,....".
    – Adam B
    Commented Nov 21, 2018 at 16:06

1 Answer

There are several ways to do this. Which one is best is somewhat up to the language you use, the libraries you have, and what makes sense to the clients of this data-structure.

In essence you are asking for a table. Might I suggest trying one of the several hundred database engines out there: anything that works with ADO, DAO, or System.Data, or an embedded (in-application) database such as SQLite. Each has different support for C# data types, and there may be one that meets your requirements.

If you do not wish to rely on a third-party library for whatever reason, or nothing meets your exact needs, two approaches come to mind.

  1. Use a Schema
  2. Use a Prototype

A schema, roughly speaking, is an object that encodes the "type" information for your row/field, usually with one schema per object being created (so one row schema holding N field schemas). Essentially each schema is a row/field factory: on request it creates a new row/field object, with everything instantiated in the correct order. It might also provide validation, etc. The rows/fields may hold a back-reference to their specific schema (useful for many purposes). The table would permit only one schema, and you could support table-level assignment from a table with a superset schema to a table with a subset schema (a schema with the same or fewer fields, with the same or more general types, etc.).
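As a rough illustration of the schema idea applied to the question, here is a minimal C# sketch. All of the names (`IFieldSchema`, `FieldSchema<T>`, `RowSchema`, `Row`, `SchemaDemo`) are hypothetical and not taken from any library. It assumes each field is declared with a formatter delegate, which is one possible way to force a string output for primitives and custom types alike, and it does no CSV escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Non-generic view of a field, so fields of different types fit in one list.
public interface IFieldSchema
{
    string Name { get; }
    Type ValueType { get; }
    string Format(object value);   // value -> CSV cell text
}

// Typed field definition. The formatter is mandatory, so every variable
// (primitive or user-defined) has a defined string output.
public sealed class FieldSchema<T> : IFieldSchema
{
    private readonly Func<T, string> _format;

    public FieldSchema(string name, Func<T, string> format)
    {
        Name = name ?? throw new ArgumentNullException(nameof(name));
        _format = format ?? throw new ArgumentNullException(nameof(format));
    }

    public string Name { get; }
    public Type ValueType => typeof(T);
    public string Format(object value) => _format((T)value);
}

// Ordered set of fields; acts as the factory (and validator) for rows.
public sealed class RowSchema
{
    private readonly List<IFieldSchema> _fields = new List<IFieldSchema>();
    private readonly Dictionary<string, int> _index = new Dictionary<string, int>();

    public IReadOnlyList<IFieldSchema> Fields => _fields;

    public RowSchema Add<T>(string name, Func<T, string> format)
    {
        var field = new FieldSchema<T>(name, format);
        _index.Add(name, _fields.Count);   // throws on a duplicate name
        _fields.Add(field);
        return this;
    }

    public int IndexOf(string name) => _index[name];

    public Row CreateRow(params object[] values)
    {
        if (values.Length != _fields.Count)
            throw new ArgumentException("Wrong number of values for this schema.");
        for (int i = 0; i < values.Length; i++)
            if (!_fields[i].ValueType.IsInstanceOfType(values[i]))
                throw new ArgumentException($"Value for '{_fields[i].Name}' has the wrong type.");
        return new Row(this, (object[])values.Clone());
    }

    public string CsvHeader() => string.Join(",", _fields.Select(f => f.Name));
}

// A row keeps a back-reference to its schema and supports lookup by name.
public sealed class Row
{
    private readonly object[] _values;

    internal Row(RowSchema schema, object[] values) { Schema = schema; _values = values; }

    public RowSchema Schema { get; }
    public T Get<T>(string name) => (T)_values[Schema.IndexOf(name)];
    public string ToCsv() =>
        string.Join(",", _values.Select((v, i) => Schema.Fields[i].Format(v)));
}

public static class SchemaDemo
{
    // Builds a two-field schema and returns the header plus one data line.
    public static string Run()
    {
        var schema = new RowSchema()
            .Add<int>("distance", v => v.ToString(CultureInfo.InvariantCulture))
            .Add<float>("time given", v => v.ToString("0.0", CultureInfo.InvariantCulture));
        Row row = schema.CreateRow(2, 1.5f);
        return schema.CsvHeader() + "\n" + row.ToCsv();
    }
}
```

Because every row is created through the schema, all rows of one experiment share the same fields in the same order, and `row.Get<int>("distance")` gives access by name. The cost is that values are boxed and type-checked at run time rather than at compile time.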

A Prototype approach makes use of the concept of cloning. You create a row which contains all the type information and default values. Each time you need a new row, you call clone() on this object. A table would be composed of numerous rows, and each row might have a different number of fields and types from any other row in the table. The table probably has an isHomogenus() function which checks that every row has the same number of fields, with each field's value sharing the same type as every other field in the same column.
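A minimal sketch of the prototype idea in C#, under stated assumptions: `ProtoRow` and `ProtoTable` are hypothetical names, default values are assumed to be non-null, and the homogeneity check is spelled `IsHomogeneous()` here. `Clone()` copies the field list, which behaves like a full copy only as long as the stored values are immutable (numbers, strings, enums).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class ProtoRow
{
    // A list (not a dictionary) so the field order stays stable.
    private readonly List<KeyValuePair<string, object>> _fields;

    public ProtoRow() { _fields = new List<KeyValuePair<string, object>>(); }

    private ProtoRow(IEnumerable<KeyValuePair<string, object>> fields)
    {
        _fields = fields.ToList();
    }

    // Declares a field on the prototype together with its default value.
    public ProtoRow With(string name, object defaultValue)
    {
        _fields.Add(new KeyValuePair<string, object>(name, defaultValue));
        return this;
    }

    public object Get(string name) => _fields[IndexOf(name)].Value;

    public void Set(string name, object value)
    {
        _fields[IndexOf(name)] = new KeyValuePair<string, object>(name, value);
    }

    // New row with the same fields, order, and current values.
    public ProtoRow Clone() => new ProtoRow(_fields);

    // Field names and runtime value types, in order.
    public IEnumerable<(string Name, Type ValueType)> Shape() =>
        _fields.Select(f => (f.Key, f.Value.GetType()));

    private int IndexOf(string name)
    {
        int i = _fields.FindIndex(f => f.Key == name);
        if (i < 0) throw new KeyNotFoundException(name);
        return i;
    }
}

public sealed class ProtoTable
{
    public List<ProtoRow> Rows { get; } = new List<ProtoRow>();

    // True when every row has the same field names and value types, in the same order.
    public bool IsHomogeneous() =>
        Rows.Count == 0 || Rows.All(r => r.Shape().SequenceEqual(Rows[0].Shape()));
}
```

Nothing stops a caller from setting a field to a value of a different type, which is exactly why the table-level check is needed in this approach.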

There are two ways to do the prototype cloning, and they affect how the system evolves: deep cloning and shallow cloning.

A deep clone creates a full duplicate of the row and fields: every value is copied, along with all the type information, etc. If you edit either row, the other remains unaffected.

A shallow clone still creates a duplicate of the row and fields, but does not copy the values directly. Instead it maintains a reference to the original row/field, and a delta (the differences between it and the prototype). When it is queried for a value or some other information, it looks for it in the delta and returns that answer if found. If not found, it asks the original. When the clone is updated, it changes the delta. This also means changes to the original row will be reflected in every cloned row that has not been updated to something different. This is roughly the Property Pattern.
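The delta lookup described above might be sketched as follows. `DeltaRow` is a hypothetical name, and the sketch assumes values are looked up by field name only, leaving out field order and type information for brevity.

```csharp
using System;
using System.Collections.Generic;

public sealed class DeltaRow
{
    private readonly DeltaRow _parent;   // null for the root prototype
    private readonly Dictionary<string, object> _delta = new Dictionary<string, object>();

    public DeltaRow() : this(null) { }

    private DeltaRow(DeltaRow parent) { _parent = parent; }

    // The clone starts with an empty delta and a reference to this row.
    public DeltaRow ShallowClone() => new DeltaRow(this);

    // Updates only ever touch this row's own delta.
    public void Set(string name, object value) => _delta[name] = value;

    // Look in the local delta first, then walk up the chain of originals.
    public object Get(string name)
    {
        for (DeltaRow row = this; row != null; row = row._parent)
            if (row._delta.TryGetValue(name, out object value))
                return value;
        throw new KeyNotFoundException(name);
    }
}
```

With this shape, setting a value on the prototype after cloning is visible through every clone that has not overridden that field, while a clone that has set its own value keeps it.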

As is pretty obvious both types of cloning have advantages, and disadvantages based on how you want this table to behave. Shallow cloning allows for chains of updates to be performed as a tree, deep cloning disallows chained updates while still making it easy to add similar rows.

The schema approach and the prototype approach overlap somewhat in how they are implemented: a row/field schema acting as a factory is functionally similar to the clone method of a prototype. The difference is that a prototype is no different from any other row, while a schema is a fundamentally different kind of object.

For a strict implementation, prefer the schema approach with both a factory and validation methods. For a lazier approach prefer prototyping.

If you wish to have the best of everything, it is possible to use both schemas and prototyping; however, be aware that such a system will have compromises in schema conformity, prototype flexibility, and/or bookkeeping effort. It is the age-old question of static typing vs dynamic typing vs cost.

  • Thanks a lot for the detailed answer. It sounds like schemas might be the way to go for my application. Do you know of any good resources you can recommend to learn more about this design pattern?
    – Adam B
    Commented Nov 22, 2018 at 3:11
  • Unfortunately my quick trawl through Google revealed nothing specific. Schemas are a form of runtime typing implemented by the programmer, so take a look at Type Theory. Also look at prototyping and property bags even if they are not your implementation choice; the article does a good job of showing you how self-managing typing can work. I would also look at the metadata any relational database stores about its tables: names, value types, constraint checking...
    – Kain0_0
    Commented Nov 22, 2018 at 6:08
