
I am in a situation where new content is created when I scroll down. The new content has a specific class name.

How can I keep scrolling down until all the elements have loaded?

In other words, I want to reach the stage where if I keep scrolling down, nothing new will load.

I was using code to scroll down, coupled with:

await page.waitForSelector('.class_name');

The problem with this approach is that after all the elements have loaded, the code keeps scrolling down. No new elements are created, and eventually I get a timeout error.

This is the code:

await page.evaluate( () => {
  window.scrollBy(0, window.innerHeight);
});
await page.waitForSelector('.class_name');
4
  • 1
    It sounds like there might be an issue with the code you use to scroll down. Can you please add that to your question? Commented Jul 26, 2018 at 3:23
  • "If I keep scrolling down, nothing new will load": define "nothing new will load" and check for that in your code. Also, timeouts can be redefined. But yes, Grant Miller is right: please provide your code and, ideally, the target site URL.
    – Vaviloff
    Commented Jul 26, 2018 at 8:28
  • Thanks a lot! I updated the code. Since it is a local site, I cannot post a URL though... 'Nothing new will load' means the website has loaded all the available elements, so when I keep scrolling down and using page.waitForSelector(), no new elements appear, and my code waits indefinitely, until it throws a timeout error. Commented Jul 26, 2018 at 9:57
  • 5
    You could try this: await page.evaluate('window.scrollTo(0, document.body.scrollHeight)') Commented Oct 16, 2018 at 17:53

14 Answers

177

Give this a shot:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: false
    });
    const page = await browser.newPage();
    await page.goto('https://www.yoursite.com');
    await page.setViewport({
        width: 1200,
        height: 800
    });

    await autoScroll(page);

    await page.screenshot({
        path: 'yoursite.png',
        fullPage: true
    });

    await browser.close();
})();

async function autoScroll(page){
    await page.evaluate(async () => {
        await new Promise((resolve) => {
            var totalHeight = 0;
            var distance = 100;
            var timer = setInterval(() => {
                var scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance;

                if(totalHeight >= scrollHeight - window.innerHeight){
                    clearInterval(timer);
                    resolve();
                }
            }, 100);
        });
    });
}

Source: https://github.com/chenxiaochun/blog/issues/38

EDIT

Added window.innerHeight to the calculation because the available scrolling distance is the body height minus the viewport height, not the entire body height.
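To make that arithmetic concrete, here is a small sketch. The helper names maxScrollTop and reachedBottom and the example numbers are illustrative assumptions, not part of the original answer.

```javascript
// Hypothetical helper: the largest scroll offset a page can reach.
// scrollHeight is the full document height, innerHeight the viewport height.
function maxScrollTop(scrollHeight, innerHeight) {
    // A page shorter than the viewport cannot scroll at all.
    return Math.max(0, scrollHeight - innerHeight);
}

// Same stop condition as the autoScroll loop: the distance scrolled so far
// has covered everything that can actually be scrolled.
function reachedBottom(totalHeight, scrollHeight, innerHeight) {
    return totalHeight >= maxScrollTop(scrollHeight, innerHeight);
}

console.log(maxScrollTop(3000, 800));        // 2200
console.log(reachedBottom(2100, 3000, 800)); // false
console.log(reachedBottom(2200, 3000, 800)); // true
```

With a 3000px document and an 800px viewport, scrolling stops after 2200px rather than 3000px, which is why the subtraction matters.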

EDIT 2

Sure, Dan (from the comments). To add a counter that stops the scrolling, introduce a variable that is incremented on each iteration. When it reaches a certain value (say, 50 scrolls), clear the interval and resolve the promise.

Here's the modified code with the scrolling limit set to 50:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: false
    });
    const page = await browser.newPage();
    await page.goto('https://www.yoursite.com');
    await page.setViewport({
        width: 1200,
        height: 800
    });

    await autoScroll(page, 50);  // set limit to 50 scrolls

    await page.screenshot({
        path: 'yoursite.png',
        fullPage: true
    });

    await browser.close();
})();

async function autoScroll(page, maxScrolls){
    await page.evaluate(async (maxScrolls) => {
        await new Promise((resolve) => {
            var totalHeight = 0;
            var distance = 100;
            var scrolls = 0;  // scrolls counter
            var timer = setInterval(() => {
                var scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance;
                scrolls++;  // increment counter

                // stop scrolling if reached the end or the maximum number of scrolls
                if(totalHeight >= scrollHeight - window.innerHeight || scrolls >= maxScrolls){
                    clearInterval(timer);
                    resolve();
                }
            }, 100);
        });
    }, maxScrolls);  // pass maxScrolls to the function
}

15
  • 7
    A delay of 100 is too fast; it would just skip the whole autoscroll, so I had to use 400. Is there any way to detect a class or element appearing before stopping the autoscroll?
    – CodeGuru
    Commented Jan 12, 2019 at 15:56
  • 1
    When you're inside evaluate, you have a reference to the document context, so you would just use a standard selector and check its position using getBoundingClientRect.
    – Cory
    Commented Jan 14, 2019 at 22:02
  • 1
    lqbal: It could be related to your xvfb. Try changing headless: false to headless: true
    – Cory
    Commented Oct 3, 2019 at 0:53
  • 1
    @JannisIoannou, take a look at this MDN. window is a global browser object, representing the window in which the script is running. If you're referencing window in Node, you'll get an error.
    – Cory
    Commented Apr 6, 2021 at 17:27
  • 2
    @JannisIoannou: To execute JavaScript code on your puppeteer instance, you use the evaluate method. Think of code running inside evaluate as if you are running it in a browser console. In this case window is automatically created when evaluate is called. Please take a look at the evaluate method for additional context.
    – Cory
    Commented Apr 22, 2021 at 17:52
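Following up on the getBoundingClientRect suggestion in the comments above, here is a minimal sketch of scrolling until a given element enters the viewport. The function name scrollUntilVisible and its option names and defaults are assumptions made for illustration; it only relies on page.evaluate.

```javascript
// Sketch: scroll in steps until `selector` is inside the viewport,
// or give up after `maxScrolls` steps. Names and defaults are assumptions.
async function scrollUntilVisible(page, selector, { distance = 100, delay = 400, maxScrolls = 200 } = {}) {
    for (let i = 0; i < maxScrolls; i++) {
        // This callback runs in the browser context, so document and window exist there.
        const visible = await page.evaluate((sel) => {
            const el = document.querySelector(sel);
            if (!el) return false;
            const rect = el.getBoundingClientRect();
            // Visible if any part of the element overlaps the viewport vertically.
            return rect.top < window.innerHeight && rect.bottom > 0;
        }, selector);
        if (visible) return true;

        await page.evaluate((y) => window.scrollBy(0, y), distance);
        // Give lazily loaded content a moment to arrive before checking again.
        await new Promise((resolve) => setTimeout(resolve, delay));
    }
    return false; // the element never appeared within the scroll budget
}
```

A call such as await scrollUntilVisible(page, '.class_name') returns true once the element is on screen and false if the scroll budget runs out first.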
46

Scrolling down to the bottom of the page can be accomplished in 2 ways:

  1. use scrollIntoView (to scroll to the part of the page that can create more content at the bottom) and selectors (i.e., document.querySelectorAll('.class_name').length to check whether more content has been generated)
  2. use scrollBy (to incrementally scroll down the page) and either setTimeout or setInterval (to incrementally check whether we are at the bottom of the page)

Here is an implementation using scrollIntoView and selector (assuming .class_name is the selector that we scroll into for more content) in plain JavaScript that we can run in the browser:

Method 1: use scrollIntoView and selectors

const delay = 3000;
const wait = (ms) => new Promise(res => setTimeout(res, ms));
const count = async () => document.querySelectorAll('.class_name').length;
const scrollDown = async () => {
  document.querySelector('.class_name:last-child')
    .scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
}

let preCount = 0;
let postCount = 0;
do {
  preCount = await count();
  await scrollDown();
  await wait(delay);
  postCount = await count();
} while (postCount > preCount);
await wait(delay);

In this method, we compare the number of elements matching .class_name before scrolling (preCount) with the number after scrolling (postCount) to check whether we are at the bottom of the page:

if (postCount > preCount) {
  // NOT bottom of page
} else {
  // bottom of page
}

And here are 2 possible implementations using either setTimeout or setInterval with scrollBy in plain JavaScript that we can run in the browser console:

Method 2a: use setTimeout with scrollBy

const distance = 100;
const delay = 100;
while (document.scrollingElement.scrollTop + window.innerHeight < document.scrollingElement.scrollHeight) {
  document.scrollingElement.scrollBy(0, distance);
  await new Promise(resolve => { setTimeout(resolve, delay); });
}

Method 2b: use setInterval with scrollBy

const distance = 100;
const delay = 100;
const timer = setInterval(() => {
  document.scrollingElement.scrollBy(0, distance);
  if (document.scrollingElement.scrollTop + window.innerHeight >= document.scrollingElement.scrollHeight) {
    clearInterval(timer);
  }
}, delay);

In this method, we are comparing document.scrollingElement.scrollTop + window.innerHeight with document.scrollingElement.scrollHeight to check whether we are at the bottom of the page:

if (document.scrollingElement.scrollTop + window.innerHeight < document.scrollingElement.scrollHeight) {
  // NOT bottom of page
} else {
  // bottom of page
}

If either of the JavaScript snippets above scrolls the page all the way to the bottom, we know it works and can automate it with Puppeteer.

Here are the sample Puppeteer Node.js scripts that will scroll down to the bottom of the page and wait a few seconds before closing the browser.

Puppeteer Method 1: use scrollIntoView with selector (.class_name)

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    args: ['--window-size=800,600']
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');

  const delay = 3000;
  let preCount = 0;
  let postCount = 0;
  do {
    preCount = await getCount(page);
    await scrollDown(page);
    await new Promise(resolve => setTimeout(resolve, delay));
    postCount = await getCount(page);
  } while (postCount > preCount);
  await new Promise(resolve => setTimeout(resolve, delay));

  await browser.close();
})();

async function getCount(page) {
  return await page.$$eval('.class_name', a => a.length);
}

async function scrollDown(page) {
  await page.$eval('.class_name:last-child', e => {
    e.scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
  });
}

Puppeteer Method 2a: use setTimeout with scrollBy

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    args: ['--window-size=800,600']
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');

  await scrollToBottom(page);
  await new Promise(resolve => setTimeout(resolve, 3000));

  await browser.close();
})();

async function scrollToBottom(page) {
  const distance = 100; // should be less than or equal to window.innerHeight
  const delay = 100;
  while (await page.evaluate(() => document.scrollingElement.scrollTop + window.innerHeight < document.scrollingElement.scrollHeight)) {
    await page.evaluate((y) => { document.scrollingElement.scrollBy(0, y); }, distance);
    await new Promise(resolve => setTimeout(resolve, delay));
  }
}

Puppeteer Method 2b: use setInterval with scrollBy

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    args: ['--window-size=800,600']
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');

  await page.evaluate(scrollToBottom);
  await new Promise(resolve => setTimeout(resolve, 3000));

  await browser.close();
})();

async function scrollToBottom() {
  await new Promise(resolve => {
    const distance = 100; // should be less than or equal to window.innerHeight
    const delay = 100;
    const timer = setInterval(() => {
      document.scrollingElement.scrollBy(0, distance);
      if (document.scrollingElement.scrollTop + window.innerHeight >= document.scrollingElement.scrollHeight) {
        clearInterval(timer);
        resolve();
      }
    }, delay);
  });
}
14

Based on the answer from this URL:

await page.evaluate(() => {
  window.scrollTo(0, window.document.body.scrollHeight);
});
1
  • 10
    window.innerHeight doesn't scroll all the way to the bottom, but with window.scrollTo(0,window.document.body.scrollHeight) it does.
    – K. Frank
    Commented Nov 16, 2020 at 3:43
11

Much easier:

    await page.evaluate(async () => {
      let scrollPosition = 0
      let documentHeight = document.body.scrollHeight

      while (documentHeight > scrollPosition) {
        window.scrollBy(0, documentHeight)
        await new Promise(resolve => {
          setTimeout(resolve, 1000)
        })
        scrollPosition = documentHeight
        documentHeight = document.body.scrollHeight
      }
    })
8

Many solutions here assume that the page height is constant. This implementation works even if the page height changes (e.g. when new content loads as the user scrolls down).

await page.evaluate(() => new Promise((resolve) => {
  var scrollTop = -1;
  const interval = setInterval(() => {
    window.scrollBy(0, 100);
    if(document.documentElement.scrollTop !== scrollTop) {
      scrollTop = document.documentElement.scrollTop;
      return;
    }
    clearInterval(interval);
    resolve();
  }, 10);
}));
1
  • For pages with height changes, this function resolves quicker...
    – Raunaqss
    Commented Jun 11, 2021 at 13:52
8

Pretty simple solution

let lastHeight = await page.evaluate('document.body.scrollHeight');

while (true) {
    await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
    // sleep a bit so that newly triggered content can load
    await new Promise(resolve => setTimeout(resolve, 2000));
    let newHeight = await page.evaluate('document.body.scrollHeight');
    if (newHeight === lastHeight) {
        break;
    }
    lastHeight = newHeight;
}
7

A similar solution to @EdvinTr's; it's giving me great results. It scrolls and compares against the page's Y offset, which is very simple.

let originalOffset = 0;
while (true) {
    await page.evaluate('window.scrollBy(0, document.body.scrollHeight)');
    await new Promise(resolve => setTimeout(resolve, 200));
    let newOffset = await page.evaluate('window.pageYOffset');
    if (originalOffset === newOffset) {
        break;
    }
    originalOffset = newOffset;
}
4

You can also use the page.keyboard object:

await page.keyboard.press('ArrowDown');
await delay(2000); // wait for 2 seconds
await page.keyboard.press('ArrowUp');

// resolves after the given number of milliseconds
function delay(milliseconds) {
    return new Promise(resolve => {
        setTimeout(resolve, milliseconds);
    });
}
2
  • 1
    Only when we have these up and down buttons. Commented May 28, 2020 at 8:00
  • It doesn't work on mobile? Commented Dec 10, 2022 at 1:47
4

Rather than using setTimeout or setInterval, it's probably safer to wait for any network calls to finish. Scrolling might cause extra content to load, which you will want to wait for.

import type { Page } from 'puppeteer';

const scrollToBottom = async (page: Page) => {
    await new Promise<void>((resolve, reject) => {
        // keep track of distance scrolled
        let totalHeight = 0;
        // amount to scroll each time
        const scrollAmount = 300;
        const scrollDownAndCheck = async (
            promise: Promise<void>,
        ): Promise<void> => {
            return promise.then(async () => {
                // determine if we have reached the bottom or not
                const shouldReturn = await page.evaluate((totalHeight) => {
                    return (
                        totalHeight >=
                        document.body.scrollHeight - window.innerHeight
                    );
                }, totalHeight);
                // if we reached the bottom, don't add any more .then() calls
                if (shouldReturn) {
                    return promise;
                }
                // scroll down by a chunk
                await page.evaluate((scrollAmount) => {
                    window.scrollBy(0, scrollAmount);
                }, scrollAmount);
                // keep track of how much has been scrolled
                totalHeight += scrollAmount;
                // wait for any network loads that may have been triggered by the scroll
                await page.waitForNetworkIdle();
                // do this loop over again
                return scrollDownAndCheck(promise);
            });
        };
        scrollDownAndCheck(Promise.resolve())
            .then(() => {
                resolve();
            })
            .catch(reject);
    });
};

This will be slower because it's waiting for the network to be idle, but if the content loaded by scrolling takes a long time, you may not get good results with setInterval or setTimeout.
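As a shorter variation on the same idea, the sketch below jumps to the bottom, waits for the network to go idle, and stops once the document height no longer grows. The function name scrollUntilHeightStable and the default values are assumptions; it uses page.evaluate and page.waitForNetworkIdle with its idleTime and timeout options.

```javascript
// Sketch: scroll to the bottom, let the network settle, and stop once the
// document height stops growing. Name and defaults are assumptions.
async function scrollUntilHeightStable(page, { idleTime = 500, timeout = 10000, maxRounds = 100 } = {}) {
    let lastHeight = await page.evaluate(() => document.body.scrollHeight);
    for (let round = 0; round < maxRounds; round++) {
        // Jump to the current bottom, which is what triggers lazy loading.
        await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
        // Wait for requests triggered by the scroll; this rejects if the timeout is hit.
        await page.waitForNetworkIdle({ idleTime, timeout });
        const newHeight = await page.evaluate(() => document.body.scrollHeight);
        if (newHeight === lastHeight) {
            break; // nothing was appended, so this is treated as the end
        }
        lastHeight = newHeight;
    }
    return lastHeight; // final document height in pixels
}
```

Pages that poll in the background may never become idle, in which case waitForNetworkIdle rejects and the caller has to decide whether to retry or stop. The maxRounds cap keeps a truly infinite feed from looping forever.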

1
  • 1
    thanks works with puppeteer latest 2024 february Commented Feb 18 at 9:31
3

Why not just:

await page.keyboard.press("PageDown");
1
  • For me it didn't work, because I had a "sticky" content at the bottom. The sticky part stayed at the same position without moving all the way down, like it would normally when opening a page. What worked for me was the "scrollTo" solution. Just FYI
    – mariodev
    Commented Aug 18, 2023 at 19:39
3
await page.keyboard.down('End')

Basically, when this executes, Playwright holds down the End key on the keyboard. If you prefer, you can use press inside a loop, which has the same effect.
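The press-in-a-loop variant could look like the sketch below, which keeps pressing End until the document height stops changing. The function name pressEndUntilStable and the defaults are assumptions; page.keyboard.press and page.evaluate are available in both Puppeteer and Playwright.

```javascript
// Sketch: press End repeatedly until the page stops growing.
// Returns the number of key presses made. Name and defaults are assumptions.
async function pressEndUntilStable(page, { delay = 1000, maxPresses = 50 } = {}) {
    let lastHeight = await page.evaluate(() => document.body.scrollHeight);
    for (let presses = 1; presses <= maxPresses; presses++) {
        await page.keyboard.press('End');
        // Leave time for content triggered by the jump to load.
        await new Promise((resolve) => setTimeout(resolve, delay));
        const newHeight = await page.evaluate(() => document.body.scrollHeight);
        if (newHeight === lastHeight) {
            return presses; // the last press loaded nothing new
        }
        lastHeight = newHeight;
    }
    return maxPresses; // gave up: the page was still growing
}
```

The key press only scrolls whatever currently has focus, so this assumes focus is on the page body rather than on an input or a nested scroll container.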

4
  • 1
    Please explain your code. Commented Nov 30, 2022 at 11:35
  • 1
    basically when executing it, the playwright will hold the End key on the keyboard, if you want you can use press and add in a loop that will have the same effect.
    – PAS
    Commented Nov 30, 2022 at 16:33
  • It doesn't work on mobile? Commented Dec 10, 2022 at 1:48
  • if it is the puppeteer or playwright simulation it should work, because he only uses the browser in mobile mode.
    – PAS
    Commented Dec 11, 2022 at 20:45
2

I've seen the accepted answer and wanted to propose what I think is a much simpler solution. Puppeteer has support for simulating a mouse wheel.

It would work like this:

  • Find the bounding box for the element you would like to scroll.
  • Move the mouse to the center of the element.
  • Scroll the mouse wheel down.

This solution works even if the page has multiple scrollable components.

I have an example snippet in which setInterval from timers/promises (unavailable on Node 14 and below, but not important to the example) is used to wait for fresh data from an AJAX call. It has TypeScript type annotations, but works in JS as well:

import { setInterval } from "timers/promises";

import * as puppeteer from 'puppeteer';

/**
 * @param page 
 */
async function getScrollContent(page: puppeteer.Page): Promise<boolean> {
    logger.trace("Running scroll down function");
    const section = await page.$('.biab_body.contentWrap'); // find containing body of the content. In this case it's a <div class="biab_body contentWrap">
    if (section !== null) {
        logger.trace("Found section");

        /**
         * Using a set number of scrolls to fetch new content.
         * Chose this method for simplicity, but a more advanced method 
         * would check for no changes in the dimensions of the bounding 
         * box to determine that no new content is available.
         */
        const numScrolls = 10;
        let counter = 1;
        const delayBetweenScrollsMills = 2000; // give time for the page to make AJAX call for new content.

        for await (const value of setInterval(delayBetweenScrollsMills, numScrolls)) {
            if (counter > value) {
                break; // stop scrolling for new data
            } else {
                const boundingBox = await getBoundingBox(section);
                await scrollDown(page, boundingBox);
                counter = counter + 1;
            }
        }
        return true;
    } else {
        logger.trace("Failed to find section.");
        return false;
    }
}

/**
 * Get the bounding box for the element to be scrolled.
 * @param elementHandle
 * @returns 
 */
async function getBoundingBox(elementHandle: puppeteer.ElementHandle): Promise<puppeteer.BoundingBox> {
    const boundingBox = await elementHandle.boundingBox();
    if (boundingBox !== null) {
        logger.trace(boundingBox);
        return boundingBox;
    } else {
        throw new Error("Failed to find bounding box for provided element");
    }
}

async function scrollDown(page: puppeteer.Page, boundingBox: puppeteer.BoundingBox): Promise<void> {
    // move mouse to the center of the element to be scrolled
    await page.mouse.move(
        boundingBox.x + boundingBox.width / 2,
        boundingBox.y + boundingBox.height / 2
    );

    // use the mouse scroll wheel to scroll. Change the scroll-down delta according to your needs.
    await page.mouse.wheel({deltaY: 300});
}

Edit: The bounding box might stay the same even with more content, so it might be better to check against the entire page's bounding box. For my use case, I could just use a large number of scroll iterations, after which I was sure no more content was available.
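Instead of a fixed number of scrolls, one possible stop signal is the scrolled element's own scrollTop and scrollHeight: if neither changes after a wheel step, the element is at its bottom and nothing new was appended. This is a swapped-in signal rather than the bounding-box comparison mentioned above, and the name wheelUntilStable and its defaults are assumptions. A minimal sketch in plain JavaScript:

```javascript
// Sketch: wheel-scroll a container until neither its scroll position nor its
// content height changes any more. Name and defaults are assumptions.
async function wheelUntilStable(page, selector, { deltaY = 300, delay = 2000, maxScrolls = 50 } = {}) {
    const handle = await page.$(selector);
    if (handle === null) return false;
    const box = await handle.boundingBox();
    if (box === null) return false;

    // Hover over the center so the wheel events target this container.
    await page.mouse.move(box.x + box.width / 2, box.y + box.height / 2);

    // Runs in the browser: snapshot of position and content height.
    const readState = (el) => ({ top: el.scrollTop, height: el.scrollHeight });
    let last = await handle.evaluate(readState);

    for (let i = 0; i < maxScrolls; i++) {
        await page.mouse.wheel({ deltaY });
        // Give the page time to fetch and render new content.
        await new Promise((resolve) => setTimeout(resolve, delay));
        const next = await handle.evaluate(readState);
        if (next.top === last.top && next.height === last.height) {
            return true; // could not scroll further and nothing was added
        }
        last = next;
    }
    return false; // still changing when the scroll budget ran out
}
```

This assumes the selected element is itself the scroll container. If the whole page scrolls instead, the same comparison would need to read document.scrollingElement.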

1
  • Type 'Timeout' must have a '[Symbol.asyncIterator]()' method that returns an async iterator Commented Feb 18 at 9:08
1

I handle scrolling with CodeceptJS (the information herein is relevant for pure Puppeteer too) and the Puppeteer web driver via I.pressKey(). To support macOS, use ['Command', 'DownArrow'] and for other operating systems, use 'End'. Therefore, add two calls to I.pressKey(). As mentioned previously, this might not work in a mobile browser.

This will scroll the focused area to the bottom. Focusing the correct area first is paramount. One way is to click on an element in the desired area, such as a div.

To tell whether the area has actually scrolled, either:

  1. Look for a selector if you are able to compute a selector for new elements

  2. Diff the result of await I.grabPageScrollPosition() before and after the key presses.

  3. If the frontend team is able to help you by adding an element representing “the end,” that’s your most reliable option. However, if infinite truly means infinite, this will not be possible.

What about the network I/O that’s needed to retrieve new items? How do you know when to look at the page for new items? Unfortunately, unless your test knows how many items are available (eg by calling a REST API) and how many have been downloaded, it can only make a good happy path guess. Network failures and unexpected latency will always thwart optimistic guesses.

(A) Loop three or so times with a brief wait to guarantee there are no more items.

(B) You might be able to wait for a spinner to disappear.
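Option (A) can be sketched as a small helper that re-reads some value, such as the item count or the scroll position, and only reports "done" after several consecutive identical reads. The name waitUntilStable, the shape of its return value, and the defaults are assumptions; the helper is framework-agnostic and takes any async read function.

```javascript
// Sketch: poll `read` until it returns the same value `checks` times in a row.
// `read` is any (possibly async) function; names and defaults are assumptions.
async function waitUntilStable(read, { checks = 3, intervalMs = 500, maxReads = 100 } = {}) {
    let previous = await read();
    let stable = 0; // consecutive reads that matched the one before
    for (let i = 0; i < maxReads && stable < checks; i++) {
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
        const current = await read();
        stable = current === previous ? stable + 1 : 0;
        previous = current;
    }
    // settled is false when the value was still changing at the read limit.
    return { value: previous, settled: stable >= checks };
}
```

With Puppeteer, the read function could be something like () => page.$$eval('.class_name', els => els.length), called after each scroll. For option (B), page.waitForSelector accepts a hidden: true option that resolves once a spinner element is gone, assuming the spinner has a stable selector.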

0

Elaborating on Poker Player's answer, in my case, the website had a search input that was focused-in once the site loaded, so await page.keyboard.press("End") didn't work.

I managed to scroll down and get all the content using

await page.mouse.click(1, 1);
await page.keyboard.press("End");
await timeout(2000);

export async function timeout(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
