-16

Older title:

The wording "time-series" is understood as trend analysis, although only a mix of monthly, weekly, yearly, and timeless columns as input is meant

A question of how to build an ML model from columns that are stored monthly in a database and other columns that are overwritten or timeless should be clearly needed on Stack Exchange if you set up the features of an ML model.

I asked this at "Does the training+testing set have to be different from the predicting set (so that you need to apply a time-shift to ALL columns)? (no time-series!)" [closed].

Attempts to collect reopen votes for this question might fail, since most readers seem to read it as asking for a time series analysis that predicts a trend, like a classical time series. But this question is not about a time series that you would decompose for a seasonal or trend analysis. It is only about columns from some x months back, columns from some y months back, and some database columns that have no history at all, all of which you treat as normal features, as if you did not even know that some of them were monthly columns. Even then, you will get good predictions. I know this because I ran the model myself, so if anyone now tells me that I have to use a time series analysis instead, they are wrong, proven by example.

Here is what I wrote against the downvotes (-1 at the time of writing):

Since I know from the working model that the time-series reading is a misunderstanding, I am now trying to convince readers to reopen the question.

Since this got downvoted and since the first answer seems to see this as an ML time series question: This question is not about a classical time series analysis, but seeks to deal with monthly columns as features. There is no aim to check for a trend or any other decomposition! I have shared this question since I had exactly this challenge at work, and in the end, the model worked fine with this setup, mixing non-monthly (timeless) features with monthly features. Therefore, this is just a question of using monthly data columns as features. The model does not care whether it is December or June; it only cares about how many months these features lie in the past, so that it learns from the pattern of data over the time of some x months before. The features are not named after the months, but after how many months back they lie, like wealth_month_1 and wealth_month_2 for the wealth of 1 or 2 months back in time.
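To make the naming idea concrete, here is a minimal sketch in plain Python. The monthly history `wealth_by_month`, the timeless attribute `country`, and the helpers `months_back` and `build_features` are hypothetical and not taken from the original question; only the column names wealth_month_1 and wealth_month_2 come from the text above. The sketch only shows that the monthly columns are keyed by how many months back they lie from a reference month, not by calendar month, and that they sit next to timeless columns in one feature row.

```python
from typing import Dict, Iterable, Optional, Tuple

Month = Tuple[int, int]  # (year, month), e.g. (2020, 2) for February 2020


def months_back(ref: Month, k: int) -> Month:
    """Return the (year, month) that lies k months before ref."""
    year, month = ref
    index = year * 12 + (month - 1) - k
    return index // 12, index % 12 + 1


def build_features(
    history: Dict[Month, float],
    timeless: Dict[str, object],
    ref: Month,
    lags: Iterable[int],
) -> Dict[str, object]:
    """Combine timeless columns with monthly columns named by their lag."""
    row: Dict[str, object] = dict(timeless)
    for k in lags:
        # The column name encodes the distance in months, not the calendar month.
        value: Optional[float] = history.get(months_back(ref, k))
        row[f"wealth_month_{k}"] = value
    return row


# Hypothetical data: one customer with a monthly wealth history and one timeless column.
wealth_by_month = {(2019, 11): 100.0, (2019, 12): 110.0, (2020, 1): 120.0}
timeless = {"country": "DE"}

row = build_features(wealth_by_month, timeless, ref=(2020, 2), lags=[1, 2])
print(row)
```

With the reference month February 2020, wealth_month_1 is the January 2020 value and wealth_month_2 is the December 2019 value. Moving the reference month changes which stored values fill the columns, while the column names stay the same.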

But it got closed since it is said not to be a programming question. Then I ask the reader: how would you set up a model without knowing how to treat the monthly saved columns of your database as features?

Example:

Not every dataset covers just the height and colour of a plant. Some might want to take the height and colour three months after planting as further input features; others might want to take the height and colour one month before the final measurement as further input features.

Even if this question is very general, it needs to be answered so that you can program a (non-?)time series model on attributes that may or may not change over time. I lack the wording to define this. I guess you would still call this a time series, even though I now stress "(no time-series!)" in the title and only make this clearer in the body.

I get remarks that even my own answer shows a picture of a "time series". Yes, this answer shows the monthly shift in the dataset that you need if you shift the training of the model by a month. But that does not mean that the whole dataset needs to be treated like a "time series": the question does not ask for any decomposition like a seasonal or trend analysis. For example, I could also decide to take only the data column of six months before, of two months before, and of the last month, and the model would still learn from these input columns as features. Such a model does not need a gapless time series dataset. It is about checking for patterns in the feature input that lead to a given output in training, so that I can predict this output in the end.
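The monthly shift and the non-gapless choice of columns described above can be sketched as follows. This is an illustration under assumptions, not the setup of the original model: the names `lag_features`, `history`, and `known_outcome`, the numbers, and the class label are all hypothetical; only the lags 6, 2, and 1 come from the paragraph above.

```python
from typing import Dict, Sequence

LAGS = (6, 2, 1)  # six months before, two months before, last month: not gapless


def lag_features(history: Sequence[float], ref: int,
                 lags: Sequence[int] = LAGS) -> Dict[str, float]:
    """Pick the values that lie k months before the reference month index."""
    row: Dict[str, float] = {}
    for k in lags:
        if ref - k < 0:
            # Guard: a negative index would silently wrap around in Python.
            raise ValueError(f"no data {k} months before month {ref}")
        row[f"value_month_{k}"] = history[ref - k]
    return row


# Hypothetical monthly measurements; the list index is the month number (0..7).
history = [10.0, 11.0, 13.0, 12.0, 15.0, 16.0, 18.0, 17.0]
# Hypothetical class label that is already known for reference month 7.
known_outcome = {7: 1}

# Training row: reference month 7, whose outcome is known.
train_row = lag_features(history, ref=7)
train_label = known_outcome[7]

# Prediction row: the same columns, shifted by one month (reference month 8).
predict_row = lag_features(history, ref=8)

print(train_row, train_label)
print(predict_row)
```

Both rows have the same three columns, so a classifier trained on rows like `train_row` can be applied to `predict_row`. The only difference is the reference month, which is the time shift applied to every monthly column.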

Working model, needed general question without any code, proven by work experience, still closed.

How do I get it reopened?

Should it be on Cross Validated SE instead?

Any other hints?

The term "time series" seems to be misread: most readers and voters take it as a classical ML time series question about a dataset with gapless columns over time, even though it is only about monthly saved database columns and other timeless columns in one basket of features. What is a better word for this?

18
  • 8
    Why do you assume that the question is closed because readers misunderstand your time-series-thingy? Commented Dec 5, 2023 at 16:43
  • @samcarter_is_at_topanswers.xyz One of the answerers misunderstood this. I take this as a mirror for many more. This question is about how to set up the model's many many features that can be spread over time in many ways, up to being timeless as well. You need to know an answer to this if you want to begin with the model at all, which is a programming question without code. Commented Dec 5, 2023 at 16:48
  • 12
    Your meta question would probably be much better received if you'd ask why your question was closed without making assumptions. Commented Dec 5, 2023 at 16:52
  • 8
    On a side note (adding to what's been said about making too many assumptions about what other people know or understand), you really need to stop complaining about the downvotes and asking downvoters to leave feedback. It is not constructive. Voting is anonymous and people don't have to explain their downvotes for well established reasons. meta.stackoverflow.com/questions/357436/…
    – E_net4
    Commented Dec 5, 2023 at 17:09
  • 9
    I don't get your reasoning why the need to understand this makes it programming related. As far as I can tell, you argue that model building is programming related because it is needed to write a program based on such a model. The aged coffee machine comparison comes to mind: Just because programming a coffee machine software requires domain knowledge of coffee brewing, this does not make coffee brewing itself programming related. Commented Dec 5, 2023 at 17:11
  • 3
    @questionto42 - As long as commentary remains in the question body, it will remain in a state in which I cannot justify voting to reopen the question. The question is answered, and you accepted your own answer, so what does the community gain by allowing the question to receive additional answers in its current state? Commented Dec 5, 2023 at 17:14
  • @MisterMiyagi Saying that this is a sort of programming from scratch is just my own experience. I shared this question in a team, nobody could answer it. The answer to this question did not help us. I needed to get to know this, and I could not find this anywhere on the net. Examples only showed features that did not change over time. Any other examples used a dataset that was only about changes over time. I then just ran the model that mixed both kinds and found that it worked, but I could not find a good answer to this for a longer time and found it later over the years by mere chance. Commented Dec 5, 2023 at 17:18
  • @SecurityHound A closed question will be deleted after some time. That is why I try to get it reopened. Other answers are welcome. I only answered this after quite some effort and by mere chance. Since I am not a professional, I might make a mistake here. I have asked to put it on Data Science SE now. I changed the comment at the beginning a bit now. Commented Dec 5, 2023 at 17:22
  • 3
    @questionto42 - A closed question with 2 answers? I don't believe that is correct. If you want the question to be migrated then you still have to remove the commentary within the body of the question (IMO). Commented Dec 5, 2023 at 17:36
  • @SecurityHound If you mean the commentary at the beginning, I now made it a TLDR instead, without any meta noise. That should also help understanding the question. If a question with two answers does not get closed so far, I am fine with this. Did not know that. Commented Dec 5, 2023 at 17:39
  • 6
    "A closed question will be deleted after some time." - yes, if you don't fix it such that it meets standards, so that people are convinced and cast reopen votes. If you think the question is a programming question, it is your responsibility to a) read the guidelines in order to understand what we consider on topic; b) make sure you can make an argument that the question is on topic according to those standards; c) edit the question so that others can clearly see it's on topic, without arguing or using meta commentary in the edit. Commented Dec 5, 2023 at 23:29
  • 3
    On SE/SO sites posts have titles. Other things in other contexts might have headers, it is irrelevant.
    – philipxy
    Commented Dec 6, 2023 at 11:10
  • "A question of how to build an ML model from columns that are stored monthly in a database and other columns that are overwritten or timeless should be clearly needed on Stack Exchange if you set up the features of an ML model." - maybe so. Asking difficult questions is a major responsibility though. Maybe today you are ready to take on that responsibility, but 2019-you kind of sabotaged your efforts by creating a question which was not ready. So can 2023-you rewrite it?
    – Gimby
    Commented Dec 6, 2023 at 12:45
  • 3
    It's a title not a header. I'm done.
    – philipxy
    Commented Dec 6, 2023 at 14:29
  • 3
    Re "avoid Latin- or Greek-rooted words": That is a noble goal, but it is much more important to avoid: 1) Asking questions in broken English. 2) Run-on sentences. 3) Sentence fragments. Commented Dec 6, 2023 at 17:08

1 Answer

18

Here is a quote from the question that helped set its scope:

But now comes the question: When I have an already trained and tested classifier ready, can I apply it to the same dataset that was the base of the training and testing set? Or do I have to apply it to a new predicting set that is different from the training+testing set?

This question is not about a classical time series analysis, but seeks to deal with monthly columns as features. [...] so that it learns from the pattern of data over the time of some x months before.

The way I see it, and why I voted to close the question, is that it is still about methodologies in machine learning. It does not matter that we are not doing any trend analysis, or building a regression or classification model. The reasoning required to take time series data and feed it to whichever model in a way which does not lead to incoherent or biased outcomes is still a question more suitable for the data science domain. While not objective evidence, there were also some other cues which further reinforce this idea, namely that there is no code involved and no objective problem statement (as in "given this data, obtain this precise sequence of features", instead of "how should I process these features"). The answers also show no code, but instead describe some key ideas and steps to be done by data scientists in broad terms.

As such, this is not a case of everyone misunderstanding the question. Rather, we witnessed at least one part of the community determining that this question in particular deviates too much from the scope of programming. You may disagree, but unless you or someone else comes up with a better argument, it should stay closed. From what I know about Data Science SE, I would say that it would be on-topic there.
