💻
Cheatsheets
  • Most Useful Command Line Tools: 50 Cool Tools to Improve Your Workflow, Boost Productivity, and More
  • 7_tips_to_reverse_engineer_javascript
  • Configuring a Repl
  • How to create your command-line program (CLI) with NodeJS and Commander.js | by Duc N. | JavaScript
  • replit Node.JS 24/7 Project Hoster
  • cheatsheets
  • Alacritty, Tmux, and Vim
  • amethyst
  • Android
  • Installing Arch Linux
  • Arch Linux
  • aria2
  • bin
  • bspwm
  • Chocolately Notes
  • command_line_pipes
  • CSS Grid
  • curl
  • The curl guide to HTTP requests
  • Docker
  • Easymotion
  • Emmet
  • Favorite figlet fonts
  • FFMPEG
  • figlet
  • File Serve
  • File Transfer
  • fish shell
  • Front End Dev Links
  • How to use Git.io to shorten GitHub URLs and create vanity URLs
  • Git
  • Downloading a Tarball from GitHub
  • Make Infinite Gmail Addresses For One Inbox
  • How To Use GPG on the Command Line
  • guide_to_fish_completions
  • Homebrew
  • How to clean Arch Linux
  • HTML5 Boilerplate
  • Install
  • All the keyboard shortcuts you’ll ever need for Safari on iPad
  • iosevka
  • iPhone
  • ish (iOS)
  • Javascript Notes
  • jq
  • Jupyter Notebooks
  • Lettering
  • lf-wiki
  • lf
  • Command Line
  • Adding a swapfile after a clean installation without swap partition
  • mac_bluetooth_issues
  • Mac Terminal
  • maim
  • markdown-sample
  • Markdown Notes
  • Images in README.md Markdown Files
  • Organizing information with tables
  • md_cheatsheet
  • NiftyWindows Help
  • nix
  • Justin Restivo - A Portable Text Editor: Nix <3 Neovim
  • NPM
  • neovim configuration
  • Pastery
  • Powershell
  • Table of Basic PowerShell Commands | Scripting Blog
  • Powershell Modules
  • Puppeteer
  • Python
  • rclone-colab
  • replit
  • Hi there, I'm Raju Ghorai - a.k.a. [coderj001]
  • Scriptable
  • Servor
  • Replacing Postlight’s Mercury scraping service with your self-hosted copy
  • Shell Scripts
  • skhd
  • Spicetify
  • SSH
  • SurfingKeys
  • tar
  • Terminal Web Browser Docker
  • Text Generators
  • tmux shortcuts & cheatsheet
  • unicode
  • VIM
  • VIM Diff
  • vi Complete Key Binding List
  • 8 Essential Vim Editor Navigation Fundamentals
  • Vim Shortcut Keys
  • Vite
  • VNC
  • web-servers
  • Web Server
  • Windows Command Line
  • Writeguard
  • WSL Cheatsheet
  • youtube-dl
  • zsh Plugins
  • zspotify
Powered by GitBook
On this page

Was this helpful?

Puppeteer

PreviousPowershell ModulesNextPython

Last updated 2 years ago

Was this helpful?

Getting React / Vue pages

It's probably because the contents of rest of those elements are loaded dynamically with a Javascript framework like React or Vue. This means that it only gets loaded when those elements enter the viewport of the browser.

To fix this you will need to write a function that auto scrolls the page so that those elements can get into the viewport and then you have to wait for that function to finish before you collect the data.

The scrolling function:

const autoScroll = async(page) => {
    await page.evaluate(async () => {
        await new Promise((resolve, reject) => {
            var totalHeight = 0;
            var distance = 100;
            var timer = setInterval(() => {
                var scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance;

                if(totalHeight >= scrollHeight){
                    clearInterval(timer);
                    resolve();
                }
            }, 30);
        });
    });
}

Then call this function after page.goto() and before you grab the content with page.content(). I also set the viewport width and height then the scrolling goes a little faster:

await page.goto(url, {waitUntil: 'load'});
await page.setViewport({
    width: 1200,
    height: 800
});
await autoScroll(page); // The scroll function
const html = await page.content()
node.js - puppeteer await page.$$('.className'), but I get only the first 11 element with that class, why? - Stack Overflow