This is an interesting question. I can find no case law on point, and the answer seems to depend on the jurisdiction involved.
The CC-BY-NC 4.0 license grants, in section 2 paragraph a.1.A, the right to
... reproduce and Share the Licensed Material, in whole or in part, for NonCommercial purposes only;
and in section 2 paragraph a.1.B, the right to:
produce, reproduce, and Share Adapted Material for NonCommercial purposes only.
In addition, under Section 4 paragraph a, in those countries that recognize "Sui Generis Database Rights", the rights granted under 2.a.1 also include the right to:
... extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database for NonCommercial purposes only;
These are the "Licensed Rights" under section 1 paragraph g, and the license restrictions (including the restriction to noncommercial use) apply only to the Licensed Rights.
Note that the Sui Generis Database Rights are recognized only in the EU, and in particular are not recognized under US copyright law. Protection of that kind was rejected by the Supreme Court in Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991), which held that facts are not copyrightable and that the effort spent compiling them ("sweat of the brow") earns no protection.
According to the US Copyright Office's "Statement of David O. Carson, General Counsel, United States Copyright Office, before the Subcommittee on Courts, the Internet, and Intellectual Property, Committee on the Judiciary, 2003":
What remains is a thin layer of copyright protection for qualifying databases. In order to qualify, they must exhibit some modicum of creativity in the selection, arrangement, or coordination of the data. The protection is thin in that only the creative elements (selection, arrangement, or coordination of data) are protected by copyright. Explanatory materials such as introductions or footnotes to databases may also be copyrightable. But in no case is the data itself (as distinguished from its selection, coordination or arrangement) copyrightable.
If I have understood the question correctly, the data used to train the model is "used", but it is not "reproduced" or "shared", as the question says:
The original data cannot be reproduced ... The data is simply used for training purposes.
Thus, in the US, it seems that the non-commercial restriction would not apply to training data that is used but not reproduced or shared. In the EU, however, the non-commercial restriction would apply, and selling a model created in part by the use of the data, without an additional grant of permission, would seem to violate the license and thus infringe the copyright-holder's rights.
The CC Data FAQ says:
Under version 4.0, if an NC license has been applied then any use of the licensed database or its contents that is restricted by copyright law or sui generis database rights requires compliance with the NC term, even if the database is not publicly shared.
(emphasis added)
This seems to confirm the conclusion that protection of the data against commercial use by the CC-BY-NC 4.0 license is afforded only if the copyright holder is in a jurisdiction that grants the sui generis database rights, that is, in the EU.