Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load translation_of into solr work title + allow work.* field in search #8770

Merged

Conversation

benbdeitch
Copy link
Collaborator

@benbdeitch benbdeitch commented Jan 29, 2024

Closes: #8700

This PR enables Solr support for searching by 'work.title', or other related fields, in addition to searching for edition titles.

Technical

This PR the Solr indexes of works to include all 'translated_from' fields of their editions under 'alternative titles', as well as including the 'translation_of' property in the SolrEditionBuilder class. In addition, it currently alters the 'is_search_field' function to remove 'work.' prefixes from strings, before checking them against the all_fields list.

Once the query passes as a valid field, it is ignored from conversion to an edition query, and added back on to the query afterwards, with 'title' removed, in order to achieve proper behavior.

Testing

Simply search for queries involving 'work.:' on the search bar. This should show all editions that are translations of the work.

Screenshot

It currently does not behave correctly.

Stakeholders

@cdrini

@mekarpeles mekarpeles marked this pull request as draft January 29, 2024 20:39
@benbdeitch benbdeitch changed the title DRAFT PR: 8700/feature/search for translation Feb 2, 2024
@benbdeitch benbdeitch marked this pull request as ready for review February 2, 2024 16:36
Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good thank you @benbdeitch ! A few fixes to put in place :)

openlibrary/plugins/worksearch/schemes/works.py Outdated Show resolved Hide resolved
openlibrary/plugins/worksearch/schemes/works.py Outdated Show resolved Hide resolved
openlibrary/solr/updater/edition.py Outdated Show resolved Hide resolved
openlibrary/solr/updater/work.py Outdated Show resolved Hide resolved
openlibrary/plugins/worksearch/schemes/works.py Outdated Show resolved Hide resolved
@cdrini cdrini added the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Feb 7, 2024
@benbdeitch
Copy link
Collaborator Author

I fixed most of the issues; I'll get to the last one tomorrow. Thanks for catching them!

Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this all looks correct! One suggestion. Putting on testing.openlibrary.org now to test it out!

Comment on lines +98 to +99
assert fn('work.title:Bob') == 'title:Bob'
assert fn('title:Joe') == 'title:Joe'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use some complex queries here? These tests would also pass this function ;)

def luqum_replace_field(query, replacer):
    return replacer(query)
@cdrini cdrini changed the title 8700/feature/search for translation Feb 12, 2024
Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests:

All good! I won't be able to test the solr updater on testing.openlibrary.org , so will wait for that to be on prod. Either way it'll require a full reindex to affect the full catalog, although it will work on edits as they happen!

Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! Will merge after that last final code review suggestion is fixed up 😊

@cdrini cdrini merged commit 7232128 into internetarchive:master Feb 16, 2024
3 checks passed
Achorn pushed a commit to Achorn/openlibrary that referenced this pull request Mar 14, 2024
…ch (internetarchive#8770)

* Solr Updater now keeps track of all 'translation_of' fields of the editions of a work in alternative titles.

* Some logging for development, and registering 'work.' fields as valid.

* Fixed typo in 'startswith' method

* Prevented convert work_field to edition_field from processing fields prefaced with 'work.'

* Revert "Prevented convert work_field to edition_field from processing fields prefaced with 'work.'", as it does not follow the intended solution.

This reverts commit 1bc82d9.

* Marked additional logging messages with [TODO]. so that they can easily be located and deleted for the final version.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed variable name, prevented work.title from being passed to editions, and removed prefix from work.title in solr params.

* Fixed the remove_work_prefix_from_query function to properly crawl across the entire list of parameters, rather than just the first index.

* Removed the logging added for testing purposes.

* Removed unnecessary set comprehension

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Adding 'translation of' property to Solr edition objects, to eliminate bug.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removed 'book in' for 'book.translation_of', to prevent attempted iteration over a non-iterable object.

* Fixed set() declared as {}.  The empty set and the empty dict are not interchangeable in Python

* misc fixes, as suggested in PR review

* Other misc fixes, as suggested in PR review.

* removed unnecessary try/except clause.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added support for new 'luqum_replace_fields' futil, and fixed incorrect traversal of the luqum tree when removing the 'work.' prefix.

* Fixed whitespace errors

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added unit test for luqum_replace_fields function.

* Added newline to end of file

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed testing

* Removed unneeded argument.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added additional testing

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@cdrini cdrini removed the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants