$\begingroup$

In this previous post (Data Table Manipulation in Mathematica), Leonid Shifrin proposed a brilliant function that selects specific rows of a table by referring to columns by their header names.

I'm now looking for a way to also select one or more specific columns from the subset of data produced by Leonid's select function.

Suppose we have this data:

table = List[
{"ID", "Variable 1", "Variable 2"}, 
{"Alpha", 1, 0}, 
{"Beta", 1, 1}, 
{"Alpha", 0, 0}]

which when formatted as a grid looks like this:

ID     Variable 1 Variable 2
Alpha       1          0
Beta        1          1
Alpha       0          0

Using Leonid's function we can obtain this:

subset = select[table, where["Variable 1" == 1]]

(* {{"Alpha", 1, 0}, {"Beta", 1, 1}} *)

Suppose that we now want to sum all the values that are in the column named 'Variable 2'. We could just do this:

Total[subset[[All,3]]]

However, this way we lose the advantage of Leonid's function, which works directly with header names. Ideally, I would like to get those data by using the header name. Something like:

Total[subset[[All,"Variable 2"]]]

or, even better:

select[table, where["Variable 1" == 1], columns["Variable 2"]]

Any hints would be highly appreciated.

$\endgroup$

2 Answers

$\begingroup$

This modification of select should do what you ask, while keeping the previous functionality:

ClearAll[select, where];
SetAttributes[where, HoldAll];
select[table : {colNames_List, rows__List}, where[condition_], 
    cols : (columns[varNames__] | All) : All] :=
   With[{namingRules = 
      Dispatch[Thread[colNames -> Thread[Slot[Range[Length[colNames]]]]]]},
      With[{selF = Apply[Function, Hold[condition] /. namingRules], 
         cl = {varNames} /. namingRules /. Verbatim[Slot][n_] :> n},
          If[cols ===  All, #, #[[All, cl ]]] &@Select[{rows}, selF @@ # &]]];

For example,

select[table,where["Variable 1"==1],columns["Variable 1"]]
(* {{1},{1}} *)

select[table,where["Variable 1"==1],columns["Variable 2","Variable 1"]]
(* {{0,1},{1,1}} *)

select[table,where["Variable 1"==1]]
(* {{Alpha,1,0},{Beta,1,1}} *)

You can also use the plain column numbers in columns:

select[table,where["Variable 1"==1],columns[3,2]]
(* {{0,1},{1,1}} *)

or a mix:

select[table,where["Variable 1"==1],columns[3,"Variable 1"]]
(* {{0,1},{1,1}} *)
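
To see why both header names and plain numbers work, it may help to look at the intermediate expressions that select builds. The following is only a minimal sketch for the sample header: rules, selF and cl are top-level stand-ins for the local variables in the definition above, and Dispatch is left out because it only speeds up the replacement.

```mathematica
colNames = {"ID", "Variable 1", "Variable 2"};

(* each header name is mapped to a slot: "Variable 1" becomes #2, etc. *)
rules = Thread[colNames -> Thread[Slot[Range[Length[colNames]]]]]
(* {"ID" -> #1, "Variable 1" -> #2, "Variable 2" -> #3} *)

(* Hold keeps the condition unevaluated until the names are replaced;
   Apply then swaps the head Hold for Function *)
selF = Apply[Function, Hold["Variable 1" == 1] /. rules]
(* #2 == 1 & *)

(* applying the function to a row fills the slots with the row's entries *)
selF @@ {"Alpha", 1, 0}
(* True *)

(* names become slots and then bare indices; numbers pass through unchanged *)
cl = {3, "Variable 1"} /. rules /. Verbatim[Slot][n_] :> n
(* {3, 2} *)
```

The resulting index list is what ends up in the Part specification #[[All, cl]].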

EDIT

At @Viktor's request in the comments, here is a generalization of select in which the where clause is optional as well:

ClearAll[select, where];
SetAttributes[where, HoldAll];
select[table : {colNames_List, rows__List}, w : (where[condition_] | None) : None,
     cols : (columns[varNames__] | All) : All] :=
  With[{namingRules = Dispatch[Thread[colNames -> Thread[Slot[Range[Length[colNames]]]]]]}, 
    With[{cl = {varNames} /. namingRules /. Verbatim[Slot][n_] :> n},
      If[cols === All, #, #[[All, cl]]] &@
        If[w === None,
           {rows},
           (* else *)
           With[{selF = Apply[Function, Hold[condition] /. namingRules]},
              Select[{rows}, selF @@ # &]
           ]
        ]
    ]
  ]; 

For example, in addition to the previous inputs, we may enter:

select[table,columns["Variable 1"]]
(* {{1},{1},{0}} *)
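
The sum asked for in the question can then be obtained by flattening the single-column result before totalling. A short sketch, assuming table from the question and select as defined above are already loaded:

```mathematica
(* select returns one-element rows, {{0}, {1}}, so flatten before summing *)
Total[Flatten[select[table, where["Variable 1" == 1], columns["Variable 2"]]]]
(* 1 *)
```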
$\endgroup$
  • $\begingroup$ +1 Perhaps you should mention that this version also allows the use of raw column numbers (the Slot part of your code). $\endgroup$ Commented Nov 23, 2011 at 12:28
  • $\begingroup$ @Sjoerd True, I meant this, but then had a second thought that this might not be a good style, so did not mention that. Perhaps better to mention. Will edit. $\endgroup$ Commented Nov 23, 2011 at 12:47
  • $\begingroup$ Perfect. Thank you a lot. $\endgroup$
    – vikkor
    Commented Nov 23, 2011 at 13:34
  • $\begingroup$ @Viktor Glad I could help. I actually liked these questions because these are good and small examples of putting meta-programming (construction of a function programmatically and at run-time) to do some real non-trivial work that makes our life easier. $\endgroup$ Commented Nov 23, 2011 at 13:38
  • $\begingroup$ @Viktor Please see my edit $\endgroup$ Commented Nov 26, 2011 at 2:33
$\begingroup$

I had to leave just when you posted the question, so I'm a little late, but here's my implementation anyway.

When I read your question, I saw

Cases[table, {id_, var1_, var2_} /; var1 == 1 -> var2]
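
As a quick check on the sample data, here is a sketch of that one-liner. It differs from the line above in two small ways that are my own choices: it uses RuleDelayed (:>) so that a global value of var2 cannot interfere, and it drops the header row with Rest.

```mathematica
table = {{"ID", "Variable 1", "Variable 2"},
   {"Alpha", 1, 0}, {"Beta", 1, 1}, {"Alpha", 0, 0}};

(* keep rows whose second entry equals 1 and return their third entry *)
Cases[Rest[table], {id_, var1_, var2_} /; var1 == 1 :> var2]
(* {0, 1} *)

Total[%]
(* 1 *)
```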

What you want is to avoid writing out the full row pattern and instead refer to row items by name. Here's my implementation of that. It should be easy to understand and to modify if you wish.

SetAttributes[namedCases, HoldRest]

namedCases[{head_, rows___}, condition_, transform_] := 
  With[
  {headPatt = Pattern[#, _] & /@ head, 
   patternNames = Table[Unique[h], {Length[head]}]
  }, 
  SetAttributes[patternNames, Temporary]; (* avoid name pollution *)
  Module[{r = {rows}}, 
    Unevaluated@Cases[r, headPatt /; condition :> transform] /. 
        Thread[head -> patternNames]
  ]
]

Note: I used Module so that the replacement of header names acts only on the pattern, condition and transform, not on the table rows themselves.

My aim was to make this as easily modifiable as possible, in case you want to extend this way of working with tables to other operations. You can do a lot by changing only the Cases[] line to something else.

Here I am assuming that your header consists of strings only, not symbols or numbers. Otherwise this function may need to be strengthened to avoid unexpected breakage.

Usage:

namedCases[table, "Variable 1" == 1, 2*"Variable 2"]
$\endgroup$
  • $\begingroup$ Very nice - +1. $\endgroup$ Commented Nov 23, 2011 at 15:20
  • $\begingroup$ Very nice indeed. $\endgroup$
    – vikkor
    Commented Nov 23, 2011 at 17:29
