81

I have a snippet of code using the pygoogle Python module that lets me programmatically search for a term on Google succinctly:

 g = pygoogle(search_term)
 g.pages = 1
 results = g.get_urls()[0:10]

I just found out that this has been discontinued unfortunately and replaced by something called the google custom search. I looked at the other related questions on SO but didn't find anything I could use. I have two questions:

1) Does google custom search allow me to do exactly what I am doing in the three lines above?

2) If yes - where can I find example code to do exactly what I am doing above? If no then what is the alternative to do what I did using pygoogle?

1
  • It looks as if Custom Search returns cached results which are out of sync with latest Google finds?
    – Alex
    Commented Jan 11, 2017 at 18:47

3 Answers

139

It is possible to do this. The setup is... not very straightforward, but the end result is that you can search the entire web from Python with a few lines of code.

There are 3 main steps in total.

1st step: get Google API key

The pygoogle page states:

Unfortunately, Google no longer supports the SOAP API for search, nor do they provide new license keys. In a nutshell, PyGoogle is pretty much dead at this point.

You can use their AJAX API instead. Take a look here for sample code: http://dcortesi.com/2008/05/28/google-ajax-search-api-example-python-code/

... but you actually can't use the AJAX API either. You have to get a Google API key: https://developers.google.com/api-client-library/python/guide/aaa_apikeys For simple experimental use I suggest a "server key".

2nd step: set up a Custom Search Engine so that you can search the entire web

Indeed, the old API is not available. The best new API that is available is Custom Search. At first glance it seems to support only searching within specific domains. However, after following this SO answer you can search the whole web:

  1. From the Google Custom Search homepage ( http://www.google.com/cse/ ), click Create a Custom Search Engine.
  2. Type a name and description for your search engine.
  3. Under Define your search engine, in the Sites to Search box, enter at least one valid URL (for now, just put www.anyurl.com to get past this screen; more on this later).
  4. Select the CSE edition you want and accept the Terms of Service, then click Next. Select the layout option you want, and then click Next.
  5. Click any of the links under the Next steps section to navigate to your Control panel.
  6. In the left-hand menu, under Control Panel, click Basics.
  7. In the Search Preferences section, select Search the entire web but emphasize included sites.
  8. Click Save Changes.
  9. In the left-hand menu, under Control Panel, click Sites.
  10. Delete the site you entered during the initial setup process.

This approach is also recommended by Google: https://support.google.com/customsearch/answer/2631040

3rd step: install Google API client for Python

pip install google-api-python-client, more info here:

4th step (bonus): do the search

So, after setting this up, you can follow the code samples from a few places:

and end up with this:

from googleapiclient.discovery import build
import pprint

my_api_key = "Google API key"
my_cse_id = "Custom Search Engine ID"

def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res.get('items', [])  # 'items' is missing from the response when nothing matches

results = google_search(
    'stackoverflow site:en.wikipedia.org', my_api_key, my_cse_id, num=10)
for result in results:
    pprint.pprint(result)

After some tweaking you could write some functions that behave exactly like your snippet, but I'll skip this step here.
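
As a rough sketch of that tweaking: each entry in res['items'] is a dict whose 'link' key holds the result URL, so a pygoogle-like get_urls mostly amounts to collecting those. The helper name extract_urls and the sample data below are made up for illustration; in real use you would pass it the list returned by google_search.

```python
def extract_urls(items, limit=10):
    """Return up to `limit` result URLs from Custom Search result items.

    `items` is expected to be a list of dicts like those in res['items'];
    entries without a 'link' key are skipped.
    """
    urls = [item["link"] for item in items if "link" in item]
    return urls[:limit]


# Made-up items standing in for a real API response.
sample_items = [
    {"title": "Stack Overflow - Wikipedia",
     "link": "https://en.wikipedia.org/wiki/Stack_Overflow"},
    {"title": "Entry without a link"},
    {"title": "Stack overflow - Wikipedia",
     "link": "https://en.wikipedia.org/wiki/Stack_overflow"},
]

print(extract_urls(sample_items))
```

With the real API, extract_urls(google_search(search_term, my_api_key, my_cse_id, num=10)) would play the role of g.get_urls()[0:10] from the question.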

9
  • 5
    my_cse_id can be found from https://cse.google.com/cse/setup/basic?cx=<my_cse_id> and replace %3A with :
    – Hugo
    Commented Feb 18, 2017 at 6:56
  • Custom Search API v1 is long deprecated (but still works). Any idea how to do a search with API v2? Just replacing "v1" by "v2" makes that line crash (UnknownApiNameOrVersion).
    – mimo
    Commented Jan 21, 2018 at 6:47
  • @mimo Deprecated, but... Works :) Good find, I didn't know that. Do you have any data on when it is expected to be decommissioned? I guess I'll have to adapt my code before that date.
    – mbdevpl
    Commented Jan 24, 2018 at 12:00
  • @mbdevpl All I have is an email from Google dated March, 2017, telling me that API 1.0 was deprecated in 2012, is not maintained anymore and "may experience outages and failures, and possibly stop working entirely".
    – mimo
    Commented Jan 26, 2018 at 10:02
  • @mbdevpl Works perfectly, thanks a lot :) Great post. Do you know if there is any way to monitor usage for this and make sure I am within the free limit? And have you worked out how to use the non-deprecated library? Cheers
    Commented Mar 31, 2018 at 15:46

26

@mbdevpl's response helped me a lot, so all credit goes to them. But there have been a few changes in the UI, so here is an update:

A. Install google-api-python-client

  1. If you don't already have a Google account, sign up.
  2. If you have never created a Google APIs Console project, read the Managing Projects page and create a project in the Google API Console.
  3. Install the library.

B. To create an API key:

  1. Navigate to the APIs & Services→Credentials panel in Cloud Console.
  2. Select Create credentials, then select API key from the drop-down menu.
  3. The API key created dialog box displays your newly created key.
  4. You now have an API_KEY

C. Set up a Custom Search Engine so you can search the entire web

  1. Create a custom search engine at this link.
  2. In Sites to search, add any valid URL (e.g. www.stackoverflow.com).
  3. That’s all you have to fill in; the rest doesn’t matter. In the left-side menu, click Edit search engine → {your search engine name} → Setup.
  4. Set Search the entire web to ON.
  5. Remove the URL you added from the list of Sites to search.
  6. Under Search engine ID you’ll find the search-engine-ID.
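
Once you have both values, one option is to keep them out of the source file and read them from environment variables instead. This is only a sketch: the variable names GOOGLE_API_KEY and GOOGLE_CSE_ID and the helper load_credentials are my own choices, not anything Google's tooling defines.

```python
import os


def load_credentials(env=os.environ):
    """Read the API key and search-engine ID from environment variables.

    GOOGLE_API_KEY and GOOGLE_CSE_ID are names chosen for this example,
    not something Google's tooling defines.
    """
    try:
        return env["GOOGLE_API_KEY"], env["GOOGLE_CSE_ID"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


# Stand-in mapping so the example runs without touching the real environment.
fake_env = {"GOOGLE_API_KEY": "your-api-key", "GOOGLE_CSE_ID": "your-cse-id"}
my_api_key, my_cse_id = load_credentials(fake_env)
print(my_cse_id)
```

Calling load_credentials() with no argument reads the real environment, which keeps the key out of anything you might paste or commit.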

Search example

from googleapiclient.discovery import build

my_api_key = "AIbaSyAEY6egFSPeadgK7oS/54iQ_ejl24s4Ggc" #The API_KEY you acquired
my_cse_id = "012345678910111213141:abcdef10g2h" #The search-engine-ID you created


def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res.get('items', [])  # 'items' is missing from the response when nothing matches


results = google_search('"god is a woman" "thank you next" "7 rings"', my_api_key, my_cse_id, num=10)
for result in results:
    print(result)

Important! On the first run, you might have to enable the API in your account. The error message should contain the link for enabling it. It will be something like: https://console.developers.google.com/apis/api/customsearch.googleapis.com/overview?project={your project name}.

You’ll be asked to create a service name (It doesn’t matter what it is), and give it Roles. I gave it Role Viewer and Service Usage Admin and it works.
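
One more thing to be aware of: num accepts values from 1 to 10 only, so asking for more in a single request fails with a 400 error. To get further results you page through them with the 1-based start parameter. Below is a minimal sketch of that paging logic. The names page_requests, search_paged and fetch_page are hypothetical, and fetch_page stands for any callable that performs one request and returns a list of items.

```python
def page_requests(total, page_size=10):
    """Yield (start, num) pairs covering the first `total` results.

    `start` is 1-based, matching the API's `start` parameter, and `num`
    never exceeds `page_size` (10 is the API's per-request maximum).
    """
    for offset in range(0, total, page_size):
        yield offset + 1, min(page_size, total - offset)


def search_paged(fetch_page, total):
    """Collect up to `total` items by calling fetch_page(start=..., num=...).

    Iteration stops early when a page comes back short, which is taken
    to mean there are no more results.
    """
    items = []
    for start, num in page_requests(total):
        page = fetch_page(start=start, num=num)
        items.extend(page)
        if len(page) < num:
            break
    return items


print(list(page_requests(25)))  # [(1, 10), (11, 10), (21, 5)]
```

With the google_search function above, fetch_page could be a small wrapper that forwards start and num as keyword arguments. Each page is a separate request, so it counts against your daily quota, and as far as I know the API will not go past the first 100 results for a query.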

5
  • This should be the new accepted answer. Thank you
    Commented Apr 10, 2021 at 12:44
  • The Python library that is linked to here states: "This library is officially supported by Google. However, the maintainers of this repository recommend using Cloud Client Libraries for Python, where possible, for new code development. For more information, please visit Client Libraries Explained."
    Commented Jun 8, 2022 at 17:13
  • Does anyone else have a problem with num greater than 10? I couldn't even use num=11. I got an error: "googleapiclient.errors.HttpError: <HttpError 400 when requesting .... returned "Request contains an invalid argument.". Details: "[{'message': 'Request contains an invalid argument.', 'domain': 'global', 'reason': 'badRequest'}]"
    – hln
    Commented Nov 20, 2022 at 17:52
  • When I try to activate it, the browser logs in with the wrong account and tells me I have no access, and it is not possible to change the account. Is there another way to activate the engine without using the link?
    – Soerendip
    Commented Jul 26, 2023 at 17:33
  • Hi, I followed your answer, and everything worked except that it gave me an HTTP error, which was fixed by going to developers.google.com/custom-search/v1/introduction, clicking the Create API button, and selecting the same project as the one used in the answer above. It's working now. Hopefully that helps as of Sep 4, 2023.
    Commented Sep 3, 2023 at 22:14

10

Answer from 2020

Google aren't providing any API anymore for some reason, but https://github.com/bisoncorps/search-engine-parser is developing a python package for scraping Google.

Installation

pip install search-engine-parser

Usage

from search_engine_parser import GoogleSearch

def google(query):
    search_args = (query, 1)
    gsearch = GoogleSearch()
    gresults = gsearch.search(*search_args)
    return gresults['links']

google('Is it illegal to scrape google results')

I don't know how legal this is, but as long as you aren't commercializing your product I think you can get away with it. Besides, Google haven't really sued anyone for using their product this way; they have just banned the IP address.
For more information, see: Is it ok to scrape data from Google results?

3
  • 1
    >"Google aren't providing any API anymore for some reason" --Some link to these news? Because you are wrong.
    – Rutrus
    Commented Mar 8, 2021 at 19:39
  • @Rutrus, how about you show me where to access that search API?
    Commented Apr 10, 2021 at 19:25
  • Search available APIs
    – Rutrus
    Commented Apr 11, 2021 at 22:57
