💻
Cheatsheets
  • Most Useful Command Line Tools: 50 Cool Tools to Improve Your Workflow, Boost Productivity, and More
  • 7_tips_to_reverse_engineer_javascript
  • Configuring a Repl
  • How to create your command-line program (CLI) with NodeJS and Commander.js | by Duc N. | JavaScript
  • replit Node.JS 24/7 Project Hoster
  • cheatsheets
  • Alacritty, Tmux, and Vim
  • amethyst
  • Android
  • Installing Arch Linux
  • Arch Linux
  • aria2
  • bin
  • bspwm
  • Chocolately Notes
  • command_line_pipes
  • CSS Grid
  • curl
  • The curl guide to HTTP requests
  • Docker
  • Easymotion
  • Emmet
  • Favorite figlet fonts
  • FFMPEG
  • figlet
  • File Serve
  • File Transfer
  • fish shell
  • Front End Dev Links
  • How to use Git.io to shorten GitHub URLs and create vanity URLs
  • Git
  • Downloading a Tarball from GitHub
  • Make Infinite Gmail Addresses For One Inbox
  • How To Use GPG on the Command Line
  • guide_to_fish_completions
  • Homebrew
  • How to clean Arch Linux
  • HTML5 Boilerplate
  • Install
  • All the keyboard shortcuts you’ll ever need for Safari on iPad
  • iosevka
  • iPhone
  • ish (iOS)
  • Javascript Notes
  • jq
  • Jupyter Notebooks
  • Lettering
  • lf-wiki
  • lf
  • Command Line
  • Adding a swapfile after a clean installation without swap partition
  • mac_bluetooth_issues
  • Mac Terminal
  • maim
  • markdown-sample
  • Markdown Notes
  • Images in README.md Markdown Files
  • Organizing information with tables
  • md_cheatsheet
  • NiftyWindows Help
  • nix
  • Justin Restivo - A Portable Text Editor: Nix <3 Neovim
  • NPM
  • neovim configuration
  • Pastery
  • Powershell
  • Table of Basic PowerShell Commands | Scripting Blog
  • Powershell Modules
  • Puppeteer
  • Python
  • rclone-colab
  • replit
  • Hi there, I'm Raju Ghorai - a.k.a. [coderj001]
  • Scriptable
  • Servor
  • Replacing Postlight’s Mercury scraping service with your self-hosted copy
  • Shell Scripts
  • skhd
  • Spicetify
  • SSH
  • SurfingKeys
  • tar
  • Terminal Web Browser Docker
  • Text Generators
  • tmux shortcuts & cheatsheet
  • unicode
  • VIM
  • VIM Diff
  • vi Complete Key Binding List
  • 8 Essential Vim Editor Navigation Fundamentals
  • Vim Shortcut Keys
  • Vite
  • VNC
  • web-servers
  • Web Server
  • Windows Command Line
  • Writeguard
  • WSL Cheatsheet
  • youtube-dl
  • zsh Plugins
  • zspotify
Powered by GitBook
On this page

Was this helpful?

Replacing Postlight’s Mercury scraping service with your self-hosted copy

PreviousServorNextShell Scripts

Last updated 3 years ago

Was this helpful?

Replacing Postlight’s Mercury scraping service with your self-hosted copy

by is a tool for scraping web pages. It “transforms web pages into clean text. Publishers and programmers use it to make the web make sense, and readers use it to read any web article comfortably.”

This was a fast, free web-based service, accessible via an API which worked rather beautifully, without a fuss.

Then, in February 2019, Postlight announced they were discontinuing the hosted service and open-sourcing . A mixed blessing, as, to keep using the service, this now required users of the service to set up their own version of Mercury, on their own, or Amazon’s servers.

Here’s what I did to get Mercury running on my own server. Your mileage may vary.

  1. curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.34.0/install.sh | bash

  2. After running that script, a message claims that you only have to close and open the terminal to have nvm running. That did not work for me and I had to run this (in the terminal):

    export NVM_DIR="$HOME/.nvm"<br/> [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm<br/> [ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion

  3. nvm install 8.10

  4. This did not result in the installed version of node sticking between sessions. I had to edit my bash profile and add a line to it.

    To edit my bash profile:

    nano ~/.bash_profile

    Adding the following line:

    source ~/.bashrc

  5. npm install @postlight/mercury-parser

  6. Create a javascript file, for example with myfile.js:

    const Mercury = require('@postlight/mercury-parser');<br/> const url = 'https://en.wikipedia.org/wiki/John_von_Neumann';<br/> Mercury.parse(url).then(result => { console.log(result); } );

  7. Execute the file from the command line with:

    node myfile.js

    But, this does not make the result available through the web.

  8. Create /myapp/app.js with this:

    const express = require('express')
;<br/> const app = express()
;<br/> const port = 8888

;<br/> app.get('/myapp/', function (req, res) {<br/> const Mercury = require('@postlight/mercury-parser');<br/> const url = req.query.url;<br/> Mercury.parse(url).then(result => {
 res.send(result);
 } );

})<br/> app.listen(port)

  9. Run the app from the command line:

    node app.js

  10. Visit the page in the browser:

    http://mysite.com:8888/myapp/?url=https://en.wikipedia.org/wiki/John_von_Neumann

  11. npm install forever -g

  12. Start forever:

    forever start app.js

Now, I can replace this:

$postlightUrl = "https://mercury.postlight.com/parser?url=".$url;

With this:

$postlightUrl = "https://mysite.com:8888/myapp/?url=".$url;

Somewhat broken

This worked, but, a few weeks later, it no longer did. The version of node, nvm, did not persist between sessions. This was resolved by adding what is now step 5, above.

I also threw together a quick PHP-based solution that essentially does the same as the Mercury parser: extracting basic details from a webpage.

was provided, but few of those posting in the discussion group were (easily?) able to resolve the issues surrounding setting up their own instance. For me, Mercury using technology I’m not very familiar with, this was a bit of a hit-and-miss process that, eventually, did pay off.

Stephen Bradley wrote up , but I like to stay clear from Amazon.

I’m using. When setting up a (sub-)domain, I can select the option ‘Passenger’, which is for running Ruby, NodeJS and Python apps.

SSH to the new domain and install NVM as described. 
Specifically:

Update nvm to the minimum required version, in the terminal (based on):

Install the mercury parser in the terminal (based on):

Install expressjs as detailed.

But, you want this app to run forever.:

Some guidance
a guide on how to get Mercury running on an Amazon server
DreamHost
here
this post
this
here
Install forever
Mercury
Postlight
the code