Data scraping with Nokogiri and Pismo

By : user2950235
Date : November 17 2020, 11:52 AM
See Phrogz's answer to "Nokogiri, open-uri, and Unicode Characters", which I think correctly describes what is happening for you. In summary, for some reason there is an issue when the IO object created by open-uri is passed to Nokogiri. Instead, read the document in as a string and give that to Nokogiri, i.e.:
code :
require 'nokogiri'
require 'open-uri'

url = "https://www.youtube.com/watch?v=QXAwnMxlE2Q"

# What the server reports about the response (URI.open replaces Kernel#open for URLs in Ruby 3)
URI.open(url) do |f|
  p f.content_type     # "text/html"
  p f.charset          # "UTF-8"
  p f.content_encoding # []
end

# Passing the IO object straight to Nokogiri: the title comes out garbled
doc = Nokogiri::HTML(URI.open(url))
puts doc.title.to_s # =>  NTV interview foreigners in Japan æ¥ãã¬å¤äººè¡é ­ã¤ã³ã¿ãã¥ã¼ English Subtitles è±èªå­å¹ - YouTube

# Reading the body into a String first: the title is decoded correctly
doc = Nokogiri::HTML(URI.open(url).read)
puts doc.title.to_s # => NTV interview foreigners in Japan 日テレ外人街頭インタビュー English Subtitles 英語字幕 - YouTube

# Alternatively, keep the IO object but pass the encoding explicitly
doc = Nokogiri::HTML(URI.open(url), nil, "UTF-8")
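The garbled title in the first output matches exactly what you get when UTF-8 bytes are decoded as ISO-8859-1 (Latin-1). The answer does not name the fallback encoding; Latin-1 is an inference from the pattern of the garbled characters. The minimal Python sketch below reproduces that mechanism independently of Nokogiri:

```python
# Reproduce the mojibake: encode Japanese text as UTF-8, then decode the bytes as Latin-1.
original = "日テレ外人街頭インタビュー"
raw = original.encode("utf-8")   # the bytes the server actually sends
garbled = raw.decode("latin-1")  # what a parser produces if it assumes Latin-1

# Latin-1 maps bytes 0x80-0x9F to invisible C1 control characters; drop them
# to see the text roughly the way it is printed in the answer.
visible = "".join(ch for ch in garbled if not 0x80 <= ord(ch) <= 0x9F)
print(visible)

# The damage is reversible, because Latin-1 maps every byte value to exactly one character.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

This is also why reading the body into a string, or passing "UTF-8" explicitly, fixes the title: the same bytes are then decoded with the right encoding.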

How to parse article page links in Rails with Pismo and Mechanize?

By : Nils
Date : March 29 2020, 07:55 AM
I'm trying to parse links on multiple-page articles so that I can automatically click through them and extract the whole article content. I'm using Mechanize, following my last question and its helpful answer. Answer: I wrote a function for this (in PHP):
code :
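<?php
function processURL($ParentURL, $URL) { // make an href found on $ParentURL absolute (sketch: no "../" or "//host" handling)
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $URL)) return $URL;               // already absolute
    $p = parse_url($ParentURL);
    $base = $p['scheme'] . '://' . $p['host'];
    $path = $p['path'] ?? '/';
    if (strpos($URL, '/') === 0) return $base . $URL;                          // root-relative
    if ($URL === '' || $URL[0] === '?' || $URL[0] === '#') return $base . $path . $URL; // same page
    return $base . preg_replace('#[^/]*$#', '', $path) . $URL;                 // relative to the page's directory
}
$html = "Your HTML Str";       // page source
$URL  = "Your HTML Page Link"; // URL the page was fetched from
preg_match_all('/href="([^"]*)"/is', $html, $matches);
foreach ($matches[1] as $val) echo processURL($URL, $val), "\n";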
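For comparison, Python's standard library already implements RFC 3986 relative-reference resolution in urllib.parse.urljoin, including "../" segments and query-only links such as "?page=3", which are typical of paginated articles. The sketch below collects the href of every <a> tag with html.parser and makes each one absolute. The page URL and HTML are made-up sample data:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin handles absolute, root-relative, "../" and query-only references.
                self.links.append(urljoin(self.page_url, value))


# Hypothetical article page with pagination links.
page_url = "https://example.com/news/article.html?page=1"
html = """
<a href="article.html?page=2">Next</a>
<a href="?page=3">3</a>
<a href="/about">About</a>
<a href="../index.html">Home</a>
<a href="https://other.example.org/x">External</a>
"""

collector = LinkCollector(page_url)
collector.feed(html)
for link in collector.links:
    print(link)
```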

Rails 4, strong params and Pismo

By : user2250138
Date : March 29 2020, 07:55 AM
I know how to save a page title and favicon using the Pismo gem with Rails 3.2.18. My question is how I can do the same with Rails 4 strong params; it's kind of confusing to me.

Web scraping for td data

By : user2603463
Date : March 29 2020, 07:55 AM
You need to wait for the table to load. Simply adding a delay made it work for me:
code :
from selenium.webdriver.common.by import By

table = driver.find_element(By.ID, 'quotesFuturesProductTable1')

Resulting table data:
DEC 2014  168.025
FEB 2015  166.900
APR 2015  164.775
JUN 2015  154.800
AUG 2015  152.900
OCT 2015  154.100
DEC 2015  154.250
FEB 2016  153.850
APR 2016    0.000
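A fixed delay works, but it either wastes time or fails when the page loads more slowly than expected. A common alternative is to poll until the element is present, with a timeout. Selenium's Python bindings provide this as WebDriverWait combined with expected_conditions.presence_of_element_located. The dependency-free sketch below shows the same polling idea. wait_until and FakePage are hypothetical helpers, and FakePage stands in for a page whose table is filled in by JavaScript:

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


class FakePage:
    """Hypothetical page whose table only 'appears' on the third check."""

    def __init__(self):
        self.checks = 0

    def find_table(self):
        self.checks += 1
        if self.checks < 3:
            return None  # table not rendered yet
        return [("DEC 2014", 168.025), ("FEB 2015", 166.900)]


page = FakePage()
rows = wait_until(page.find_table, timeout=5, interval=0.1)
print(rows, "after", page.checks, "checks")
```

Unlike a fixed sleep, this returns as soon as the table is there and fails loudly if it never appears.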

Scraping commented data (<!-- -->), i.e. the data inside comments, using the jsoup library

By : Vijay Kumar
Date : March 29 2020, 07:55 AM
If you are sure that the HTML inside the comments is valid, you can simply remove the comment markers and then parse the resulting HTML:
code :
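import org.jsoup.Jsoup;
// Remove the comment markers so the commented-out markup becomes ordinary HTML
String html = doc.html();
html = html.replace("<!--", "").replace("-->", "");
// Re-parse; elements that were inside comments can now be selected as usual
doc = Jsoup.parse(html);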
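If you only need the markup that sits inside the comments, an alternative to stripping the markers from the whole page is to collect the comment bodies and parse just those, leaving the rest of the document untouched. jsoup exposes comments as Comment nodes. The sketch below shows the same idea with Python's html.parser, whose handle_comment callback receives the text between <!-- and -->. The sample HTML is made up:

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every <!-- ... --> comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class CellCollector(HTMLParser):
    """Collect the text of every <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data.strip()


# Hypothetical page where the site has commented out a table.
html = """
<div id="visible"><p>Shown to users</p></div>
<!-- <table><tr><td>Alpha</td><td>42</td></tr></table> -->
"""

comments = CommentCollector()
comments.feed(html)

cells = CellCollector()
for body in comments.comments:
    cells.feed(body)  # parse the commented-out markup as regular HTML
print(cells.cells)  # ['Alpha', '42']
```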

Scraping a website for data

By : Wavermeulen
Date : March 29 2020, 07:55 AM
What you are looking for is called an RSS feed. Most news sites have them, so you can easily parse new stories.
For CNN, check http://www.cnn.com/services/rss/ and pick the RSS feed you would like.
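Once you have a feed URL, an RSS 2.0 document is plain XML. Each story sits under rss/channel/item, with title and link children. The minimal sketch below uses Python's xml.etree.ElementTree on an inline, made-up feed. With a real feed you would first download the XML (for example with urllib.request.urlopen) and pass it to parse_items:

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document with the same shape as a news site's feed.
FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>https://example.com/first</link>
      <pubDate>Mon, 30 Mar 2020 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iterfind("channel/item")
    ]


for title, link in parse_items(FEED):
    print(title, "->", link)
```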
