Package 'postlightmercury'

Title: Parses Web Pages using Postlight Mercury
Description: This is a wrapper for the Mercury Parser API. The Mercury Parser is a single API endpoint that takes a URL and gives you back the content reliably and easily. With just one API request, Mercury takes any web article and returns only the relevant content — headline, author, body text, relevant images and more — free from any clutter. It’s reliable, easy-to-use and free. See the webpage here: <https://mercury.postlight.com/>.
Authors: Mikkel Freltoft Krogsholm
Maintainer: Mikkel Freltoft Krogsholm <[email protected]>
License: MIT + file LICENSE
Version: 1.2
Built: 2025-02-19 02:57:58 UTC
Source: https://github.com/cran/postlightmercury


Turns NULL values in a list into NAs.

Description

Turns NULL values in a list into NAs.

Usage

null_to_na(mylist)

Arguments

mylist

A list whose NULL values are to be replaced with NA.
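
The manual gives no example for this function, so the following is a minimal sketch of the behaviour described above, not the package's actual implementation. The helper name `null_to_na_sketch` and the sample list are hypothetical, and the sketch only handles top-level elements. It shows why the conversion matters: `unlist()` silently drops NULL elements, so a record with a missing field ends up shorter than the others.

```r
# Hypothetical stand-in for null_to_na(): replace top-level NULLs with NA.
null_to_na_sketch <- function(mylist) {
  lapply(mylist, function(x) if (is.null(x)) NA else x)
}

# Made-up parsed record in which the author field is missing.
parsed <- list(title = "Building Awesome CMS", author = NULL, word_count = 1200L)

# unlist() drops the NULL element, so only two fields survive.
n_before <- length(unlist(parsed))

cleaned <- null_to_na_sketch(parsed)

# After the conversion all three fields are kept, with NA for the author.
n_after <- length(unlist(cleaned))
```

Keeping every field at a fixed length is what makes it possible to bind many such records into a rectangular table.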


Removes HTML

Description

The function uses tools from the rvest and xml2 packages to strip HTML markup from the input and turn it into plain text.

Usage

remove_html(strings, trim = TRUE)

Arguments

strings

the string(s) you want to clean

trim

Logical; should the resulting string be trimmed? Defaults to TRUE.

Value

a string

Examples

## Not run: 
# First get api key here: https://mercury.postlight.com/web-parser/

# Then run the code below, replacing the X's with your API key.
url <- "https://trackchanges.postlight.com/building-awesome-cms-f034344d8ed"
my_data <- web_parser(page_urls = url,
                      api_key = "XXXXXXXXXXXXXXXXXXXXXXX")

# With html formatting:
my_data$content

# Now remove it:
my_data$content <- remove_html(my_data$content)

# Without html formatting:
my_data$content

## End(Not run)
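
The example above needs an API key. As an offline illustration, here is a minimal sketch of the kind of clean-up the Description refers to, written directly against xml2. It is an assumption about the approach, not the package's source: the helper name `remove_html_sketch` and the sample string are hypothetical, and it assumes that "trimmed" means removing leading and trailing whitespace.

```r
library(xml2)

# Hypothetical stand-in for remove_html(): parse each string as an HTML
# fragment and keep only its text content.
remove_html_sketch <- function(strings, trim = TRUE) {
  vapply(strings, function(s) {
    if (is.na(s)) return(NA_character_)
    # Wrapping in <body> makes read_html() treat the string as literal HTML.
    doc <- read_html(paste0("<body>", s, "</body>"))
    xml_text(doc, trim = trim)
  }, character(1), USE.NAMES = FALSE)
}

html <- "<p>Hello <b>world</b> &amp; friends </p>"
clean <- remove_html_sketch(html)
```

Parsing the markup rather than deleting tags with a regular expression also decodes entities such as `&amp;`.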

Parses web pages

Description

With just one API request, Mercury takes any web article and returns only the relevant content — headline, author, body text, relevant images and more — free from any clutter. It’s reliable, easy-to-use and free.

Usage

web_parser(page_urls, api_key)

Arguments

page_urls

One or more URLs to be parsed

api_key

Your key for the Mercury Parser API, supplied as a character string

Value

a tibble

Source

https://mercury.postlight.com/web-parser/

Examples

## Not run: 
# First get api key here: https://mercury.postlight.com/web-parser/

# Then run the code below, replacing the X's with your API key:
web_parser(page_urls = "https://trackchanges.postlight.com/building-awesome-cms-f034344d8ed",
           api_key = "XXXXXXXXXXXXXXXXXXXXXXX")

## End(Not run)
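
A slightly fuller sketch of the same call, which keeps the key out of the script. The environment variable name `MERCURY_API_KEY` is hypothetical (any name works). The request is only made when the key is set and the package is installed, so the snippet is safe to source as-is.

```r
# Hypothetical variable name; define it in ~/.Renviron or with Sys.setenv().
api_key <- Sys.getenv("MERCURY_API_KEY")

# page_urls accepts one or more URLs; add further URLs to this vector.
page_urls <- c(
  "https://trackchanges.postlight.com/building-awesome-cms-f034344d8ed"
)

if (nzchar(api_key) && requireNamespace("postlightmercury", quietly = TRUE)) {
  my_data <- postlightmercury::web_parser(page_urls = page_urls,
                                          api_key = api_key)
  # The content column holds HTML; strip it to plain text.
  my_data$content <- postlightmercury::remove_html(my_data$content)
  print(my_data)
} else {
  message("Set MERCURY_API_KEY and install postlightmercury to run this example.")
}
```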