I'm trying to add a new endpoint that does full-text search with AND, OR, and NOT operators and also tolerates typos using TrigramSimilarity.

I came across the question "Combine trigram with ranked searching in django 1.10" and tried to use that approach, but SearchRank is not behaving as I'd expect, and I'm confused about how it works.

When my code looks like the basic implementation of full-text search, the negative filter works fine:

    @action(detail=False, methods=["get"])
    def search(self, request, *args, **kwargs):
        search_query = request.query_params.get("search")
        vector = SearchVector("name", weight="A")
        query = SearchQuery(search_query, search_type="websearch")
        qs = Project.objects.annotate(
            search=vector,
        ).filter(
            search=query,
        )
        return Response({
            "results": qs.values()
        })

(image: the returned documents)

But I need to implement this using SearchRank so I can later do some logic with the rank score and the similarity score.

This is what my code looks like annotating for rank instead of using the tsvector annotation:

    @action(detail=False, methods=["get"])
    def search(self, request, *args, **kwargs):
        search_query = request.query_params.get("search")
        vector = SearchVector("name", weight="A")
        query = SearchQuery(search_query, search_type="websearch")
        rank = SearchRank(vector, query, cover_density=True)

        qs = Project.objects.annotate(
            rank=rank,
        ).order_by("-rank")
        return Response({
            "results": qs.values()
        })

And the response looks like this: (image: the documents I got back)

The rank given to the document named "APT29 Attack Graph" is 1. I'd expect the - operator to rank it lower, ideally at 0.

Does SearchRank not take into consideration any search operators?

This is what the PostgreSQL query plan looks like for the queryset:

    Sort (cost=37.78..37.93 rows=62 width=655)
      Sort Key: (ts_rank_cd(setweight(to_tsvector(COALESCE(name, ''::text)), 'A'::"char"), websearch_to_tsquery('apt29 -graph'::text))) DESC
      ->  Seq Scan on firedrill_project (cost=0.00..35.93 rows=62 width=655)

Also, if there is a better way to do this kind of search without introducing new dependencies (Elasticsearch, Haystack, etc.), please reference it.

I tried different search operators and looked for alternative ways to do this, but I've had no success so far.

2 Answers

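Django's SearchRank does not apply the search operators as a filter: it only computes a score for how well the query terms match each document. A row that a - term would exclude can therefore still receive a high rank, so the exclusion has to come from filtering with the SearchQuery.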
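To make the difference between scoring and filtering concrete, here is a toy, pure-Python model. It is not PostgreSQL's ranking algorithm; the functions `parse_websearch`, `toy_matches`, and `toy_rank` are hypothetical names made up for illustration, and the parser only understands bare words and a leading `-`.

```python
def parse_websearch(raw):
    """Split a websearch-style string into required and excluded terms.

    Toy parser: handles only bare words and a leading '-'.
    """
    required, excluded = [], []
    for token in raw.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            required.append(token)
    return required, excluded


def toy_matches(name, raw_query):
    """Filtering step: all required terms present, no excluded term present."""
    words = set(name.lower().split())
    required, excluded = parse_websearch(raw_query)
    has_required = all(term in words for term in required)
    has_excluded = any(term in words for term in excluded)
    return has_required and not has_excluded


def toy_rank(name, raw_query):
    """Scoring step: fraction of required terms found; excluded terms are ignored."""
    words = set(name.lower().split())
    required, _ = parse_websearch(raw_query)
    if not required:
        return 0.0
    return sum(term in words for term in required) / len(required)


name = "APT29 Attack Graph"
print(toy_rank(name, "apt29 -graph"))     # the score stays positive
print(toy_matches(name, "apt29 -graph"))  # the filter rejects the row
```

In this model, ordering by the score alone keeps "APT29 Attack Graph" at the top, which mirrors what the question observed; only the filtering step removes it.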

Let's use SearchQuery to filter the results based on the search operators and TrigramSimilarity to calculate the similarity score.

Edit: this now takes into account both full-text search and trigram similarity.

    from django.contrib.postgres.search import (
        SearchQuery, SearchRank, SearchVector, TrigramSimilarity,
    )
    from django.db.models import F
    from rest_framework import viewsets
    from rest_framework.decorators import action
    from rest_framework.response import Response

    # Project and ProjectSerializer come from your own app.


    class ProjectViewSet(viewsets.ModelViewSet):
        queryset = Project.objects.all()
        serializer_class = ProjectSerializer

        @action(detail=False, methods=["get"])
        def search(self, request, *args, **kwargs):
            search_query = request.query_params.get("search")
            vector = SearchVector("name", weight="A")
            query = SearchQuery(search_query, search_type="websearch")

            # Annotate every row with both scores (no filtering happens here).
            projects = Project.objects.annotate(
                rank=SearchRank(vector, query),
                similarity=TrigramSimilarity("name", search_query),
            )

            # Order by the product of the two scores, best first.
            projects = projects.annotate(
                combined_score=F("rank") * F("similarity"),
            ).order_by("-combined_score")

            return Response({
                "results": projects.values()
            })
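One thing worth checking with a product is what happens when either score is zero. The sketch below uses made-up (rank, similarity) pairs, not real query output, and the helper names `by_product`, `by_sum`, and `ordered` are hypothetical; it only illustrates the arithmetic of the two ways to combine scores.

```python
# Made-up (rank, similarity) pairs, for illustration only.
scores = {
    "exact match": (1.0, 1.0),
    "typo in query, no full-text hit": (0.0, 0.6),
    "weak full-text hit": (0.1, 0.2),
}


def by_product(rank, similarity):
    # A zero in either score forces the combined score to zero.
    return rank * similarity


def by_sum(rank, similarity):
    # A zero in one score still lets the other one contribute.
    return rank + similarity


def ordered(combine):
    """Return the labels sorted by combined score, best first."""
    return sorted(scores, key=lambda label: combine(*scores[label]), reverse=True)


print(ordered(by_product))
print(ordered(by_sum))
```

With these numbers the product pushes the typo row to the bottom because its rank is 0, while the sum keeps it above the weak full-text hit. Which behaviour is preferable depends on how much weight typo tolerance should get.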
  • Thanks for the clarification on SearchRank. Unfortunately, I don't think the proposed code would work for my use case, because by the time you annotate the trigram similarity, any typos will already have been filtered out. I'm looking to combine the full-text search features (stemming, stop words) with trigrams. Do you think that is possible? Commented Apr 29, 2023 at 23:00
  • I'm basically trying to get my results as close as possible to what Elasticsearch would return. We are using ES for its search accuracy on a collection of fewer than 5k documents, and I'm trying to remove that dependency. Commented Apr 30, 2023 at 0:13
  • I modified my answer
    – Saxtheowl
    Commented Apr 30, 2023 at 1:03
  • Thanks, but now we are back to square one, aren't we? I can't use -graph to remove search results, since it would not set the rank lower. Commented Apr 30, 2023 at 1:19
  • Try to modify the search query to exclude the records that contain the term you'd like to remove
    – Saxtheowl
    Commented Apr 30, 2023 at 1:45
I've already used the search rank with trigram similarity in the past.

To filter the search results you can use the "websearch" syntax of the search query matching the search vector.

Finally, you can sort the already filtered search results using a combination (e.g. sum) of the search rank and trigram similarity.

    from django.contrib.postgres.search import (
        SearchQuery, SearchRank, SearchVector, TrigramSimilarity,
    )
    from rest_framework.decorators import action
    from rest_framework.response import Response


    @action(detail=False, methods=["get"])
    def search(self, request, *args, **kwargs):
        search_query = request.query_params.get("search")
        vector = SearchVector("name", weight="A")
        query = SearchQuery(search_query, search_type="websearch")
        rank = SearchRank(vector, query, cover_density=True)
        similarity = TrigramSimilarity("name", search_query)
        qs = Project.objects.annotate(
            search=vector,
            order=rank + similarity,
        ).filter(
            search=query,  # applies the websearch operators, including "-term"
        ).order_by("-order")
        return Response({"results": qs.values()})
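For intuition about what the trigram similarity term contributes, here is a rough pure-Python approximation of how pg_trgm scores two strings. It is a sketch under assumptions, not the extension's actual code: it treats words as runs of ASCII letters and digits, lowercases them, pads each word with two spaces in front and one behind, and divides shared trigrams by all distinct trigrams. The function names `trigrams` and `similarity` are chosen here for illustration.

```python
import re


def trigrams(text):
    """Return the set of trigrams for text, roughly as pg_trgm would.

    Assumption: words are runs of ASCII letters/digits, lowercased, and
    padded with two spaces in front and one space behind.
    """
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(similarity("graph", "grahp"))  # a typo still shares some trigrams
print(similarity("graph", "graph"))
```

A misspelled term such as "grahp" still overlaps with "graph" on its leading trigrams, which is why the similarity term can reward near misses in the ordering even though the full-text filter itself is exact.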

You can find another example of use in this old article of mine:

Another example of this combination that you might be interested in is the Django site documentation search (which I wrote):
