8

I am a PhD student in social science who got interested in programming via data analysis in R and various utility tools in Python, SQL, Java (for web scraping, data querying, etc.).

I am considering whether I should take several undergrad CS classes at my university, including 1) Data Structure & Algorithm, 2) Software Development (both in Java), and 3) Database.

My concern is that I've seen various posters on this site claiming that what's taught in CS programs has very little to do with the craft of programming itself. My main goal is to better my data analysis skills (for a data science career) and perhaps learn some machine learning and agent-based modeling. Given that goal, should I take those classes? If not, what's the best way to learn programming the applied way?

(A common answer to "how to learn programming by yourself" is to do a project. However, engaging in a project without guidance and quick feedback is a very easy way to get sidetracked, especially since I'm in social science where programming skill is nothing but a tool and not valuable in and of itself. So, if your answer is "to do a project", please provide more details on how to get guidance, e.g. data analysis books)

10
  • 1
    what's taught in CS program has very little to do with the craft of programming itself — This is exactly correct. What is taught in CS programs is more closely related to the underlying science of programming.
    – JeffE
    Commented Aug 10, 2014 at 21:40
  • 5
    In my opinion, the best way to become a better programmer (or a better whatever) is to both learn the science and practice the craft. Neither is sufficient on its own.
    – JeffE
    Commented Aug 10, 2014 at 21:53
  • 2
    Everyone here -- starting with the OP -- is saying something I find strange: that programming skill is "nothing but a tool and not valuable in and of itself". For whom is programming skill anything other than a tool? (For whom is any skill anything other than a tool?) Why is developing a skill / acquiring a tool not "valuable in and of itself"? Most of the point of a PhD is to develop skills for future use: this is more "valuable" than whatever product was created or project completed in the PhD work itself. Commented Aug 11, 2014 at 1:18
  • 1
    Millions of people have developed valuable programming skills: isn't the question rather whether the coursework is a good way of developing these skills? Commented Aug 11, 2014 at 1:18
  • 1
    @Anh: If you are doing programming in your social science work, then having awesome programming skills will improve your work. If awesome programming skills are not standard in your area of social science, then that opens the door even wider for someone with these skills. I also work in a field (pure mathematics) in which programming skills are neither taught nor traditionally valued...and the ones who have them can very often put them to excellent use. One of my PhD students has superior programming skills: I honestly think these will help him get a good academic job. Commented Aug 11, 2014 at 1:51

6 Answers

5

If you are considering such classes, then go for it!

You are right that the basics of computer science have little to do with direct application (it's unlikely that you will need to invent a new sorting algorithm). But:

  • algorithmic thinking is invaluable,
  • getting it at the level of computer science students (rather than "computer science for liberal arts majors") may be a worthwhile challenge,
  • getting contacts and interaction with CS students is invaluable (you may learn a lot about applied programming by immersion),
  • learning advanced CS will pay off (regardless of whether you stay in academia or not).

Dedicating a year, or an entire degree, to something is a commitment people sometimes regret. Taking one or two courses that interest you is rarely regretted.

@EnergyNumbers warns you against lack of focus. But hey, you are not taking a random subject! It is a thing that will boost skills you need for the things you are doing.

Source: I did take one such class (while majoring in phys/math) and it was great (my only regret is I didn't take more). Now I am doing data science.

Disclaimer: I do like to side-track and I have little respect towards entrenching oneself in one, arbitrarily defined field.

4
  • Could you explain, with some examples, how algorithmic thinking is "invaluable"? Also, not to be ad hominem, but could you tell me which discipline you are from so I can gauge the level of relevance between CS and yours vs mine?
    – Heisenberg
    Commented Aug 9, 2014 at 15:39
  • @Anh As I said, physics and mathematics (and now I am moving to data analysis... often of social data). But my background is less relevant than quality and level of the course you are considering. Algorithmic thinking - each time you write code to solve a problem it is an algorithm. There are some nice ideas and insights from classical algorithms that may be good for programming in general (e.g. computational complexity, recursion, divide and conquer, dynamic programming, functional programming, encapsulation, dealing with edge cases, proving that the code solves the problem etc). Commented Aug 9, 2014 at 16:11
  • But hey, you are not taking "Arabic Language" — I think this is unnecessarily dismissive of the study of Arabic languages, especially with respect to social sciences (and data science!).
    – JeffE
    Commented Aug 10, 2014 at 21:44
  • @JeffE It is in the context of the question. If the question started with "I am a PhD student in social science who got interested in issues of the Middle East..." then "Arabic Language" would certainly be on topic. (And I used it to argue with @EnergyNumbers.) In general, I think that it is not bad even if something is very off-topic as long as one is interested in it (to the point that I started organizing an unconference series, offtopicarium.wikidot.com ;)). In any case, I am changing the problematic sentence to something less inflammatory. Commented Aug 10, 2014 at 22:34
4

I see a couple of places where CS classes would help towards data analysis in the social sciences.

  • thinking about issues like complexity can help when you're writing scripts to analyze large amounts of data

  • the "craft of programming" is also useful to learn, because you come to realize the risks of your code being wrong or faulty in some way (and learn techniques to guard against that); in a social science study, faulty code can simply lead you to draw completely wrong conclusions. Such skills are more likely to be taught in computer/software engineering than in CS, if that distinction exists wherever you are.
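
To make the complexity point concrete, here is a minimal sketch in Python. The function names and the respondent IDs are made up for illustration. Both functions find IDs that appear more than once; the first compares every pair, so its work grows roughly with the square of the number of entries, while the second makes a single pass with a set.

```python
def duplicates_quadratic(ids):
    """Compare every pair of entries: roughly n*n/2 comparisons."""
    found = set()
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if ids[i] == ids[j]:
                found.add(ids[i])
    return found


def duplicates_linear(ids):
    """Single pass, remembering what has been seen: roughly n set lookups."""
    seen = set()
    found = set()
    for x in ids:
        if x in seen:
            found.add(x)
        seen.add(x)
    return found


# Hypothetical respondent IDs, just for illustration.
respondent_ids = [101, 205, 101, 330, 205, 999]
print(duplicates_quadratic(respondent_ids) == duplicates_linear(respondent_ids))  # True
```

On a few hundred rows the difference is invisible; on a few million, the pairwise version may not finish in any reasonable time.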

One of my friends, a glaciologist, lost an enormous amount of time during his PhD because of limited programming skills and the inability to verify his code, which led him to doubt his experimental results. The programmer they hired just didn't understand the science and was useless...
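
One inexpensive guard against this kind of doubt is a small set of sanity checks: run the analysis on an input whose answer you can work out by hand, and confirm that different inputs actually give different outputs. A minimal sketch in Python follows; `weighted_mean` and its checks are hypothetical stand-ins for whatever your own analysis computes.

```python
import math


def weighted_mean(values, weights):
    """Weighted average of values; raises on mismatched or empty input."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def run_sanity_checks():
    # Known-answer check, worked out by hand: (1*1 + 3*3) / 4 = 2.5.
    assert math.isclose(weighted_mean([1, 3], [1, 3]), 2.5)
    # Equal weights must reduce to the ordinary mean.
    assert math.isclose(weighted_mean([2, 4, 6], [1, 1, 1]), 4.0)
    # Sensitivity check: changing the input must change the output.
    assert weighted_mean([1, 3], [1, 3]) != weighted_mean([1, 3], [3, 1])
    return True


run_sanity_checks()
```

Checks like these do not prove the code correct, but they catch the embarrassing cases cheaply and can be rerun after every change.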

Finally about machine learning and agent-based modeling, you won't learn much (if any) of that in undergrad courses. Those are usually graduate courses.

Perhaps an interesting avenue would be to collaborate with a computer scientist on a research project related to your studies, rather than embarking solo on a toy programming project.

1
  • Strongly seconding the second bullet point here -- understanding what your code actually shows is really hard. Actual math-research hard, which is what CS departments often do. Getting a good-enough-to-launch-a-rocket trust in your code is the engineering approach. Look for classes in testing and verification or validation. N.b.; I know of a hotshot professor whose first big paper was demonstrating that famous modeled result "A -> B" was meaningless because all input to the program in question produced result B.
    – cphlewis
    Commented Apr 16, 2015 at 18:03
3

No, you shouldn't take undergraduate courses just because the subject interests you.

You are doing a PhD in Social Sciences. That means that all of your academic effort goes into your PhD. A PhD is a full-time occupation. Not in some weak, 35-hours-a-week sense of "full-time", but in the sense of consuming as much brain work as you are able to put in.

Programming and statistics are merely some of the means to an end for you: to provide you with the tools you need to complete your PhD. So focus. Focus on that. Every time you feel the urge, when you feel distracted by machine learning or whatever, ask yourself: how does this directly contribute to me finishing my PhD? Catch yourself as soon as the siren call of "it will help me better understand the bigger picture of machine learning" or whatever starts weaseling its way into your mind. Those are nasty little tricks the mind plays on itself, that will drag you off your true course, and onto the rocks of endless distraction. And then you'll never finish your PhD. Just keep looking at the post-it-note you've attached to the side of your monitor that contains a terse expression of your PhD's research question. That is your beacon: aim at it relentlessly, and don't steer away from your true course.

1
  • 6
    Strongly disagree. The PhD is not the goal of a PhD program. The goal of a PhD program is to become a productive independent researcher. A PhD is not something you finish; it's merely the first step (or one of the first steps) of a research career. Focusing too narrowly on your thesis is just as dangerous as not focusing at all.
    – JeffE
    Commented Aug 10, 2014 at 21:47
3

I recommend learning about those topics. While EnergyNumbers is correct that programming is a means to an end for you, I strongly disagree with the implication that becoming a better programmer is a waste of time. Knowing more about computer science will let you complete your analyses faster and more easily and, more importantly, will give you more confidence in your code and the results you produce. A real-life example of social science programming gone wrong: the coding errors in the Reinhart-Rogoff spreadsheet.

I also feel this way because I'm in the opposite of your situation. I'm a software engineer, but I'm interested in how developers work with their tools and each other, so there's a social science flavour to my research. I've found it extremely helpful to dip into social science textbooks and courses to learn more about doing this kind of research. If I hadn't taken the time to learn more about grounded theory, conducting interviews, designing surveys, and so on, I'm sure my work would have serious methodological problems.

That said, I don't think you necessarily have to take a course to learn the material, or at least not an on-campus course. I took Coursera courses on algorithms and machine learning for my own interest and can strongly recommend both of them. EdX and Udacity are two other MOOC providers with good offerings. Doing online work is more flexible than attending undergraduate lectures, and you can pick and choose the material that's most relevant to you. Reading textbooks on your own time is an even more flexible approach. On the other hand, if the structure of an on-campus course helps you learn, then by all means, take the course.

1

Most of the time, in practical programming you can get away with knowing the craft without really needing the science, except for a few details (why is it faster to iterate the matrix by rows than by columns? why can I not run this in parallel?). But a stronger background in the science can help you expand your limits. For example, imagine you have a "weird" dataset, where the standard algorithms don't work, but maybe you can think of your own!
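
The row-versus-column question comes down to memory layout. The sketch below assumes a row-major layout (as in C or NumPy's default; R and Fortran store matrices column by column, so the conclusion flips there). It only computes the memory offsets each traversal would touch and does not measure time. The function names are made up for illustration.

```python
def flat_index(row, col, n_cols):
    """Offset of element (row, col) in a row-major flattened matrix."""
    return row * n_cols + col


def visit_order(n_rows, n_cols, by_rows=True):
    """Offsets touched, in order, when looping by rows or by columns."""
    if by_rows:
        return [flat_index(r, c, n_cols) for r in range(n_rows) for c in range(n_cols)]
    return [flat_index(r, c, n_cols) for c in range(n_cols) for r in range(n_rows)]


print(visit_order(2, 3, by_rows=True))   # [0, 1, 2, 3, 4, 5]
print(visit_order(2, 3, by_rows=False))  # [0, 3, 1, 4, 2, 5]
```

Looping by rows walks through consecutive offsets, which suits the CPU cache; looping by columns jumps `n_cols` elements at a time, and on a large matrix those jumps tend to cause many more cache misses.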

Also, you will be able to understand papers from other disciplines. I personally have "leeched" knowledge from papers in Sociology, Electrical Engineering, Medicine, Mathematics... all of them oriented toward processing their own data, which was, by chance, somehow similar to mine.

Yet another reason for it is that it can boost your academic value. I don't think there are many people with a CS background in Social Sciences, and probably even fewer with a strong knowledge of the science behind your research, which means it can help you stand out and be a valuable asset for your lab.

If this were not enough, never underestimate the value of networking. At some point, you may hit a problem that is too hard for your programming skills. If you have connections in CS, you can propose a collaboration.

3
  • 2
    +1 for the third paragraph, especially as a counterweight to the weird idea that doing a PhD in X means pointedly refraining from thinking about related subject Y. In the contemporary market we want to hire people who stand out in some way. Of course being simply the world's best at subject X is a good way to get hired in Dep't X: everyone is trying to do that. But being as good at X as the other A-list candidates and also having skills in Y is a really good way to get some jobs in the current market. (I don't think that most social scientists work in "labs"; no biggie.) Commented Aug 11, 2014 at 1:26
  • @PeteL.Clark regarding labs, I think it is common terminology for research groups in many fields (not all), regardless of the actual "labness". I vaguely remember a TED talk by a social scientist saying "we brought people to our lab".
    – Davidmh
    Commented Aug 11, 2014 at 1:38
  • 1
    In my field (pure mathematics), "lab" means only the room where you have to go if you don't have a computer in your office. For that matter, mathematics departments are usually not organized into research groups in the way I think you mean either. Anyway, "social sciences" is immensely broad: psychologists certainly have labs, but their labs probably have one-way mirrors. Sometimes I get a little sensitive to the fact that people on this site tend to use scientific terminology as a synecdoche for academic terminology. But it's not a big deal. Commented Aug 11, 2014 at 2:01
0

The three courses you mentioned deserve different answers: one yes, one maybe, and one not yet.

Do take Data Structures and Algorithms. It covers the principal means of practically addressing real classes of problems. Even if you never use most of the techniques you see there, getting a sense of the scope of approaches to deterministic problems is key to designing new programs. If you're going to do some programming you want to know that this stuff exists.

You may choose to take Software Development. I'm assuming it's like Software Engineering, which is actually about the craft: team formation and communication strategies, productivity tools used by professional programmers, optimization for quality and maintainability. These practices are niceties essential to working in large groups or on long-lived projects, frequently ignored by solo programmers.

Don't rush to study relational Databases. They're just a famous case of a matched query language and data structure (it's an arbitrary number of mixed-type 2D matrices). You don't need to study them unless you're going to use them.
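
For a sense of what a "matched query language and data structure" looks like in practice, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table names, column names, and rows are invented for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two tables with mixed-type columns, linked by the respondent's id.
conn.execute("CREATE TABLE respondents (id INTEGER PRIMARY KEY, country TEXT)")
conn.execute("CREATE TABLE answers (respondent_id INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO respondents VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "A")],
)
conn.executemany(
    "INSERT INTO answers VALUES (?, ?)",
    [(1, 4.0), (2, 2.0), (3, 6.0)],
)

# The query language does the matching: join the tables, then average by country.
rows = conn.execute(
    """
    SELECT r.country, AVG(a.score)
    FROM respondents AS r
    JOIN answers AS a ON a.respondent_id = r.id
    GROUP BY r.country
    ORDER BY r.country
    """
).fetchall()
print(rows)  # [('A', 5.0), ('B', 2.0)]
conn.close()
```

If this already looks like a merge followed by a grouped mean in R or pandas, that supports the point above: the ideas carry over, and the details can wait until you need them.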
