
Let us say that I want to search (on some Stack Exchange site) for posts linking to some website, and the URL contains a tilde (~). For the sake of example, let us try python.net/~goodger. (I just picked a random site of this form that appears on SO.)

I have already realized that, in addition to searching for url:"*python.net/~goodger*" (Stack Overflow, the whole network), I also need to try url:"*python.net/%7Egoodger*" (Stack Overflow, Stack Exchange). Even for the posts returned by the second search, the source shows python.net/~goodger. But in SEDE, Posts.Body seems to actually contain %7E. Here is a query on GIS. (I chose a smaller site; I expect such a query would time out on SO.)
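A small Python sketch of the variant enumeration described above. It assumes the literal `~` and its percent-encoded forms are the only spellings worth trying; the lowercase `%7e` form is an extra assumption (it was not observed in this question), and the helper names `tilde_variants` and `search_queries` are hypothetical. The query strings simply mirror the searches used in the question.

```python
from urllib.parse import unquote


def tilde_variants(fragment: str) -> list[str]:
    """Spellings of a URL fragment that differ only in how '~' is written.

    Assumption: '~' appears either literally or percent-encoded as %7E/%7e.
    """
    if "~" not in fragment:
        return [fragment]
    return [fragment] + [fragment.replace("~", enc) for enc in ("%7E", "%7e")]


def search_queries(fragment: str) -> list[str]:
    # One url: search per spelling, in the same form as the searches above.
    return [f'url:"*{v}*"' for v in tilde_variants(fragment)]


# Every spelling decodes to the same URL, which is why the post source
# looks identical no matter which form is stored in Posts.Body.
assert len({unquote(v) for v in tilde_variants("python.net/~goodger")}) == 1

for query in search_queries("python.net/~goodger"):
    print(query)
```

Each printed query has to be run separately and the result sets merged by hand, since this sketch does nothing to combine them.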

Are there other variations I need to try to be sure I find all such posts? Is this expected behavior, or should it be considered a bug?

(I stumbled upon this after some bulk replacements that were recently done on some sites, such as Physics and Mathematics. But I suppose people might sometimes use this type of search for entirely different reasons.)

  • It looks like something changed in the renderer (I mean the part that converts Markdown, i.e. PostHistory.Text, into HTML, i.e. Posts.Body) around 2020. I encountered similar peculiarities while repairing broken links, and I think I still account for it in some of the unit tests in that project. Of course, both styles are equally valid, though the percent-escape isn't necessary.
    – Glorfindel Mod
    Commented Jun 29, 2022 at 17:00
  • I gave this several tries (long ago), and without digging much further, the best bet is a partial/exact Google search per domain. That gives the most hits, but the results aren't consistent. There's another problem: SE does some trimming in search (I have the relevant links bookmarked, but don't ask me to dig them up), so results for liberte and liberté will coincide, and only Google may or may not keep them separate. And if you use find-in-page, your browser also conflates the results. Besides, partial vs. full-text search is still an unsolved problem.
    – bad_coder
    Commented Jun 29, 2022 at 23:19
  • Now, your best chance is likely SEDE, but even then I don't know how far SQL Server supports partial-text matching... If you don't have an exact string that needs no chopping or edit distance, you might be in trouble.
    – bad_coder
    Commented Jun 29, 2022 at 23:21
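If you go the SEDE route, one pitfall is that `%` is itself a wildcard in SQL Server's `LIKE`, so an unescaped pattern such as `'%python.net/%7Egoodger%'` matches more than intended (anything between `python.net/` and `7Egoodger`). Below is a hypothetical Python helper that builds a `WHERE` clause over `Posts.Body` for each spelling, bracketing `%`, `_` and `[` so they match literally (T-SQL treats `[%]` as a literal percent sign). It assumes the same two spellings as the question plus lowercase `%7e`; whether SEDE's collation already makes that last clause redundant is not checked here.

```python
def tilde_variants(fragment: str) -> list[str]:
    # Literal '~' plus its percent-encoded spellings (assumed to be the only ones).
    if "~" not in fragment:
        return [fragment]
    return [fragment] + [fragment.replace("~", enc) for enc in ("%7E", "%7e")]


def like_contains(fragment: str) -> str:
    """T-SQL LIKE pattern matching `fragment` anywhere in a column.

    %, _ and [ are LIKE metacharacters in SQL Server; wrapping each one in
    brackets makes it literal. Single quotes are doubled so the pattern can
    be embedded in a string literal.
    """
    literal = "".join(f"[{ch}]" if ch in "%_[" else ch for ch in fragment)
    return "%" + literal.replace("'", "''") + "%"


def body_where_clause(fragment: str, column: str = "Body") -> str:
    # OR together one LIKE test per spelling of the URL fragment.
    return " OR ".join(
        f"{column} LIKE '{like_contains(v)}'" for v in tilde_variants(fragment)
    )


print(body_where_clause("python.net/~goodger"))
```

The printed clause is meant to be pasted after `WHERE` in a query such as `SELECT Id FROM Posts WHERE ...`; on a large site it may still time out, as the question expects.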

1 Answer


I don't have a perfect answer, but I do have an answer that often works. Just split up the URL:

url:python.net url:goodger

This could create false positives. How likely that is depends on the exact URL, but it's probably fewer false positives than you would expect. There are 0 false positives in the link above, for example.
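Because the split search can return false positives, one way to weed them out is to check each candidate's links after percent-decoding, so that `~` and `%7E` compare equal. This is a minimal sketch, assuming you already have the rendered HTML of each candidate post (for example `Posts.Body` from SEDE). It only inspects `<a href>` attributes, so bare URLs inside code blocks are not counted. The function names and the sample post IDs and bodies are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag in a fragment of HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to(body_html: str, fragment: str) -> bool:
    """True if any <a href> contains `fragment` once both are percent-decoded."""
    parser = HrefCollector()
    parser.feed(body_html)
    target = unquote(fragment)
    # Substring match, so archived copies (e.g. web.archive.org/...) also count.
    return any(target in unquote(href) for href in parser.hrefs)


# Hypothetical candidates returned by url:python.net url:goodger
bodies = {
    1: '<p><a href="http://python.net/%7Egoodger/projects/">docs</a></p>',
    2: '<p><a href="https://example.com/goodger">other</a> python.net</p>',
}
hits = [pid for pid, body in bodies.items() if links_to(body, "python.net/~goodger")]
print(hits)  # [1]
```

The check is case-sensitive on purpose, since URL paths can be; relax it if you would rather over-match.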

Note: you don't need asterisks or quotes when searching URLs. They don't do anything. Just leave them out.

  • I get 156 results, among them this one, which doesn't have the ~ tilde.
    – bad_coder
    Commented Jun 29, 2022 at 23:07
  • @bad_coder 156 is the total you get when you add together the result counts from the two searches in the question. For that answer, what about the third and final link? web.archive.org/web/20180411011411/http://python.net/~goodger/…
    – Laurel
    Commented Jun 29, 2022 at 23:10
  • That one does have the tilde ~ in the title; good catch! Can you edit the post to include the link?
    – bad_coder
    Commented Jun 29, 2022 at 23:13

