| Title: | Parses Web Pages using Postlight Mercury |
|---|---|
| Description: | This is a wrapper for the Mercury Parser API. The Mercury Parser is a single API endpoint that takes a URL and gives you back the content reliably and easily. With just one API request, Mercury takes any web article and returns only the relevant content — headline, author, body text, relevant images and more — free from any clutter. It’s reliable, easy-to-use and free. See the webpage here: <https://mercury.postlight.com/>. |
| Authors: | Mikkel Freltoft Krogsholm |
| Maintainer: | Mikkel Freltoft Krogsholm <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.2 |
| Built: | 2026-06-07 09:49:51 UTC |
| Source: | https://github.com/cran/postlightmercury |
Turns NULL values in a list into NAs.
null_to_na(mylist)
| Argument | Description |
|---|---|
| mylist | A list whose NULL values are to be turned into NAs. |
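The behaviour described for `null_to_na()` can be illustrated in base R. This is a minimal sketch, not the package's own source: `null_to_na_sketch()` and the sample list are hypothetical names and data used only for illustration. It assumes only top-level NULL elements need replacing.

```r
# Hypothetical stand-in for null_to_na(); not the package's own source.
null_to_na_sketch <- function(mylist) {
  # Replace each top-level NULL element with NA; leave everything else untouched.
  lapply(mylist, function(x) if (is.null(x)) NA else x)
}

# Made-up parser result in which the author field is missing (NULL).
parsed <- list(title = "Building Awesome CMS", author = NULL, word_count = 1200L)
cleaned <- null_to_na_sketch(parsed)

# The author element is now NA, and the list keeps all three named elements.
str(cleaned)
```

Replacing NULL with NA keeps every field present with length one, which is presumably what makes the result easier to collect into a tabular structure.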
Uses tools from the rvest and xml2 packages to clean up the HTML and turn it into proper text.
remove_html(strings, trim = TRUE)
| Argument | Description |
|---|---|
| strings | The string(s) you want to clean. |
| trim | Whether the string should be trimmed. Defaults to TRUE. |
Returns a string.

    ## Not run:
    # First get an API key here: https://mercury.postlight.com/web-parser/
    # Then run the code below, replacing the X's with your API key.
    url <- "https://trackchanges.postlight.com/building-awesome-cms-f034344d8ed"
    my_data <- web_parser(page_urls = url, api_key = "XXXXXXXXXXXXXXXXXXXXXXX")
    # With HTML formatting:
    my_data$content
    # Now remove it:
    my_data$content <- remove_html(my_data$content)
    # Without HTML formatting:
    my_data$content
    ## End(Not run)
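For readers who want to see the HTML-stripping idea without an API key, here is a rough sketch built on xml2 alone. It is an assumption about the general approach, not the package's actual implementation: `remove_html_sketch()` is a hypothetical name, and wrapping each fragment in `<body>` is a choice made here so that xml2 treats the input as literal HTML and not as a file path.

```r
library(xml2)

# Hypothetical stand-in for remove_html(); not the package's own source.
remove_html_sketch <- function(strings, trim = TRUE) {
  vapply(strings, function(s) {
    # Wrap the fragment so read_html() parses it as literal markup.
    doc <- read_html(paste0("<body>", s, "</body>"))
    # Extract the text content; trim = TRUE drops leading/trailing whitespace.
    xml_text(doc, trim = trim)
  }, character(1), USE.NAMES = FALSE)
}

plain <- remove_html_sketch(c("<p>Hello <b>world</b> </p>", "<h1>Title</h1>"))
plain
```

The sketch does not handle `NA` input, which a real implementation would probably need to treat separately.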
Sends one or more URLs to the Mercury Parser API. With one API request per article, Mercury returns only the relevant content (headline, author, body text, relevant images and more), free from clutter.
web_parser(page_urls, api_key)
| Argument | Description |
|---|---|
| page_urls | One or more URLs to be parsed. |
| api_key | Key for the API. |
Returns a tibble.
https://mercury.postlight.com/web-parser/

    ## Not run:
    # First get an API key here: https://mercury.postlight.com/web-parser/
    # Then run the code below, replacing the X's with your API key:
    web_parser(page_urls = "https://trackchanges.postlight.com/building-awesome-cms-f034344d8ed",
               api_key = "XXXXXXXXXXXXXXXXXXXXXXX")
    ## End(Not run)