$\begingroup$

(The math problem here just serves as an example; my question is about this type of problem in general.)

Given two Schur polynomials $s_\mu$ and $s_\nu$, we know that we can decompose their product into a linear combination of other Schur polynomials:

$$s_\mu s_\nu = \sum_\lambda c_{\mu,\nu}^\lambda s_\lambda$$

and we call $c_{\mu,\nu}^\lambda$ the Littlewood–Richardson (LR) coefficient (always a non-negative integer).

Hence, a natural supervised learning problem is to predict, given the tuple $(\mu, \nu, \lambda)$, whether the LR coefficient takes a certain value. This is not difficult.
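To make the setup concrete, here is a minimal sketch of how one could generate ground-truth labels for such a dataset. It computes Schur polynomials symbolically via the bialternant (ratio-of-determinants) formula in a fixed number of variables; the `schur` helper is my own illustration, not a library function, and a real pipeline would use a combinatorics package instead.

```python
import sympy as sp

def schur(lam, xs):
    """Schur polynomial via the bialternant formula:
    s_lam = det(x_i^(lam_j + n - j)) / det(x_i^(n - j))."""
    n = len(xs)
    lam = list(lam) + [0] * (n - len(lam))  # pad the partition to n parts
    num = sp.Matrix(n, n, lambda i, j: xs[i] ** (lam[j] + n - 1 - j))
    den = sp.Matrix(n, n, lambda i, j: xs[i] ** (n - 1 - j))  # Vandermonde
    return sp.cancel(num.det() / den.det())

x = sp.symbols('x0 x1 x2')
s1, s2, s11 = schur((1,), x), schur((2,), x), schur((1, 1), x)

# Pieri's rule gives s_(1) * s_(1) = s_(2) + s_(1,1), i.e.
# c^(2)_{(1),(1)} = c^(1,1)_{(1),(1)} = 1, and all other LR coefficients are 0.
assert sp.expand(s1 * s1 - (s2 + s11)) == 0
```

Enumerating the partitions $\lambda$ of $|\mu| + |\nu|$ and checking which Schur polynomials appear in the product would then yield the $(\mu, \nu, \lambda) \mapsto c_{\mu,\nu}^\lambda$ labels.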

My question is: can we use ML/RL to do anything other than prediction in this situation, or extract anything from the prediction results? In other words, does a statement like "I am 98% confident that this LR coefficient is 0" imply anything mathematically interesting?

$\endgroup$
  • $\begingroup$ I've edited this post in order to try to clarify what you're asking. I'm still not sure what you were asking and the given answer below just confirms that your question was a bit open to interpretation, which is bad. Please, now that you probably have a clearer idea of what you had in mind, review this post (including the title that I've added) and try to clarify what you were really asking. What does it mean "do anything else other than predicting"? Are you asking whether an ML model can do something else other than what it was trained to do? $\endgroup$
    – nbro
    Commented Jan 17, 2021 at 16:55

1 Answer

$\begingroup$

There are quite a few papers where researchers try to 'teach' neural networks to 'learn' how to solve math problems. Most of the time, sadly, it comes down to training on a large dataset, after which the network can 'solve' basic problems of the sort it was trained on, but is unable to generalize to larger ones. That is, if you train a neural network to solve addition, it will be inherently constrained by the dataset. It might be able to semi-sufficiently solve addition with 3 or even 4 digits, depending on how big your dataset is, but give it an addition problem with two 10-digit numbers, and it will almost always fail.
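The addition example above can be sketched in a few lines. This is a toy illustration using scikit-learn (the digit encoding, network size, and number ranges are my own arbitrary choices): a small MLP is fit on sums of numbers below 10^4 and then evaluated on 10-digit numbers, where its error blows up because the targets lie far outside anything it was trained on.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def digits(n, width=10):
    # fixed-width base-10 digit encoding, most significant digit first
    return [int(d) for d in str(n).zfill(width)]

def make_dataset(lo, hi, size):
    a = rng.integers(lo, hi, size)
    b = rng.integers(lo, hi, size)
    X = np.array([digits(p) + digits(q) for p, q in zip(a, b)])
    return X, (a + b).astype(float)

# train on sums of numbers below 10^4 ...
X_tr, y_tr = make_dataset(0, 10_000, 3_000)
# ... evaluate on 10-digit numbers the model has never seen
X_te, y_te = make_dataset(10**9, 10**10, 500)

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=300, random_state=0)
model.fit(X_tr, y_tr)

in_dist_mae = np.abs(model.predict(X_tr) - y_tr).mean()
out_dist_mae = np.abs(model.predict(X_te) - y_te).mean()
# out-of-distribution error is orders of magnitude larger than in-distribution error
```

A linear model on this encoding would actually extrapolate perfectly (the sum is linear in the digit features), which is part of the point: whether a model generalizes depends heavily on how well its inductive bias matches the problem, not just on the data.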

The latest example I can remember where this was tried is the language model GPT-3, which was not made to solve equations per se, but does 'a decent job' on the sort of problems that were in its dataset. Facebook AI made an 'advanced math solver' with a specific architecture that I have not looked into, which might disprove my point, but you can look into that.

In the end, this comes down to 'what is learning' and 'what do you want to accomplish'. Most agree that these networks are not able to generalize beyond their datasets. Some might say that not being able to generalize does not mean that they are not learning; they might just be learning more slowly. I believe that these models are inherently limited to what is presented in the dataset. Given a good dataset, a model might be able to generalize to cases 'near and in-between', but I have yet to see a case where this sort of thing generalizes to cases 'far outside' the dataset.

$\endgroup$
  • $\begingroup$ Are all known types of neural networks equal in terms of this? In other words, regardless of their validation accuracy, are there some types of neural networks that get closer to conclusive results than others? $\endgroup$
    – SmoothKen
    Commented Dec 18, 2020 at 22:25
  • $\begingroup$ So specific architectures will always outperform more 'general' architectures on specific problems, right. This is also what FacebookAI did with their advanced math solver. Though this will most likely not be the best approach for the specific problem once we get more data and more computational power. For a nice thought experiment, make sure to look up 'The Bitter Lesson' by Rich Sutton who also talks about this. $\endgroup$ Commented Dec 21, 2020 at 9:59
  • $\begingroup$ Crux of it is: yes, if we put 'our own knowledge' into structures/processes/techniques they will perform better right now, but in 10 years someone will come along and demolish these results with more computational power in a generic structure. $\endgroup$ Commented Dec 21, 2020 at 9:59
  • $\begingroup$ So, not all known types of neural networks perform equally in terms of this, but in my opinion this is not the way you should want to look at this. There is no 'proven' benefit of some specific structure for this kind of learning, like CNNs have over plain ANNs for images. $\endgroup$ Commented Dec 21, 2020 at 10:01
  • $\begingroup$ How then did they manage to teach it to program? $\endgroup$
    – Anixx
    Commented Aug 11, 2023 at 10:13
