Title: | Parses Web Pages using Postlight Mercury |
---|---|
Description: | This is a wrapper for the Mercury Parser API. The Mercury Parser is a single API endpoint that takes a URL and returns the content of the page. With a single API request, Mercury takes any web article and returns only the relevant content (headline, author, body text, relevant images, and more), free from any clutter. It is reliable, easy to use, and free. See the webpage here: <https://mercury.postlight.com/>. |
Authors: | Mikkel Freltoft Krogsholm |
Maintainer: | Mikkel Freltoft Krogsholm <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2 |
Built: | 2025-02-19 02:57:58 UTC |
Source: | https://github.com/cran/postlightmercury |
Turns NULL values in a list into NAs.
null_to_na(mylist)
mylist | A list whose NULL values are to be turned into NAs. |
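To illustrate the behaviour described above, here is a minimal base R sketch. It is not the package's source code: `null_to_na_sketch` is a hypothetical stand-in, and the `parsed` list is made-up data that only mimics a parsed API response with a missing field.

```r
# Hypothetical stand-in for null_to_na(); an assumption, not the package's code.
null_to_na_sketch <- function(mylist) {
  # Replace each NULL element with NA and leave every other element untouched.
  lapply(mylist, function(x) if (is.null(x)) NA else x)
}

# Made-up example data: one field is missing (NULL).
parsed <- list(title = "Example title", author = NULL, word_count = 1200)
cleaned <- null_to_na_sketch(parsed)

# Every element now has length 1, so the list converts to a one-row data frame.
str(cleaned)
as.data.frame(cleaned)
```

Replacing NULL with NA keeps the missing field as a column instead of leaving a zero-length element that cannot be placed in a row.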
The function uses tools from the rvest and xml2 packages to clean up the HTML and turn it into plain text.
remove_html(strings, trim = TRUE)
strings | The string(s) you want to clean. |
trim | Whether the cleaned string(s) should be trimmed (default TRUE). |
a string
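As a rough idea of how HTML can be stripped with rvest and xml2, the sketch below parses each string as an HTML fragment and extracts its text. It is an assumption about the general approach, not the package's actual implementation; `strip_html_sketch` is a hypothetical name.

```r
library(xml2)
library(rvest)

# Hypothetical stand-in for remove_html(); an assumption, not the package's code.
strip_html_sketch <- function(strings, trim = TRUE) {
  vapply(strings, function(s) {
    # Wrapping in <body> makes xml2 treat the input as literal HTML, not a file path.
    doc <- read_html(paste0("<body>", s, "</body>"))
    # html_text() drops the markup and keeps only the text nodes.
    html_text(doc, trim = trim)
  }, character(1), USE.NAMES = FALSE)
}

strip_html_sketch("<div><p>Hello <b>world</b></p></div>")
```

The function is vectorised with `vapply()` so that a character vector of several HTML strings yields a character vector of the same length.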
    ## Not run:
    # First get an API key here: https://mercury.postlight.com/web-parser/
    # Then run the code below, replacing the X's with your API key.
    url <- "https://trackchanges.postlight.com/building-awesome-cms-f034344d8ed"
    my_data <- web_parser(page_urls = url, api_key = "XXXXXXXXXXXXXXXXXXXXXXX")

    # With HTML formatting:
    my_data$content

    # Now remove it:
    my_data$content <- remove_html(my_data$content)

    # Without HTML formatting:
    my_data$content
    ## End(Not run)
With a single API request, Mercury takes any web article and returns only the relevant content (headline, author, body text, relevant images, and more), free from any clutter. It is reliable, easy to use, and free.
web_parser(page_urls, api_key)
page_urls | One or more URLs to be parsed. |
api_key | Your key for the Mercury Parser API. |
a tibble
https://mercury.postlight.com/web-parser/
    ## Not run:
    # First get an API key here: https://mercury.postlight.com/web-parser/
    # Then run the code below, replacing the X's with your API key:
    web_parser(page_urls = "https://trackchanges.postlight.com/building-awesome-cms-f034344d8ed",
               api_key = "XXXXXXXXXXXXXXXXXXXXXXX")
    ## End(Not run)