Post Closed as "Needs details or clarity" by ohwilleke, user35069, Jen, Brian, IKnowNothing

occurred Jul 14, 2023 at 16:49

deleted 2 characters in body

edited Jul 6, 2023 at 2:41

We are trying to use a HuggingFace embedding model, multi-qa-mpnet-base-cos-v1, for an internal Large Language Model-powered application. The model's documentation says (not verbatim):

Some of the datasets used to train the multi-qa mpnet model are not for commercial use.

For instance, GOOAQ and Yahoo! Answers.

For GOOAQ, it states:

NOTE: This dataset should not be used for any commercial purposes.

The readme.txt file for one of the Yahoo! datasets used to train multi-qa-mpnet-base-cos-v1 states:

The original Yahoo! Answers corpus can be obtained through the Yahoo! Research Alliance Webscope program. The dataset is to be used for approved non-commercial research purposes by recipients who have signed a Data Sharing Agreement with Yahoo!.

multi-qa-mpnet-base-cos-v1 was also trained on MS MARCO, which has the same licence issues.

Does this automatically mean that the model itself is "tainted" and therefore we cannot use it for embeddings?

added 17 characters in body

edited Jul 5, 2023 at 10:51

cannot_mutably_borrow

We are trying to use a HuggingFace embedding model, multi-qa-mpnet-base-cos-v1, for an internal Large Language Model-powered application. The model's documentation says (not verbatim):

Some of the datasets used to train the multi-qa mpnet model are not for commercial use.

For instance, GOOAQ and Yahoo! Answers.

For GOOAQ, it states: "NOTE: This dataset should not be used for any commercial purposes."

The readme.txt file for one of the Yahoo! datasets used to train multi-qa-mpnet-base-cos-v1 states:

"The original Yahoo! Answers corpus can be obtained through the Yahoo! Research Alliance Webscope program. The dataset is to be used for approved non-commercial research purposes by recipients who have signed a Data Sharing Agreement with Yahoo!."

multi-qa-mpnet-base-cos-v1 was also trained on MS MARCO, which has the same licence issues.

Does this automatically mean that the model itself is "tainted" and therefore we cannot use it for embeddings?

asked Jul 5, 2023 at 10:34

cannot_mutably_borrow

Commercial usage of sentence-transformers/multi-qa-mpnet-base-cos-v1

We are trying to use a HuggingFace embedding model, multi-qa-mpnet-base-cos-v1, for an internal LLM-powered application. The model's documentation says (not verbatim):

Some of the datasets used to train the multi-qa mpnet model are not for commercial use.

For instance, GOOAQ and Yahoo! Answers.

For GOOAQ, it states: "NOTE: This dataset should not be used for any commercial purposes."

The readme.txt file for one of the Yahoo! datasets used to train multi-qa-mpnet-base-cos-v1 states:

"The original Yahoo! Answers corpus can be obtained through the Yahoo! Research Alliance Webscope program. The dataset is to be used for approved non-commercial research purposes by recipients who have signed a Data Sharing Agreement with Yahoo!."

multi-qa-mpnet-base-cos-v1 was also trained on MS MARCO, which has the same licence issues.

Does this automatically mean that the model itself is "tainted" and therefore we cannot use it for embeddings?
