Data scraping with Nokogiri and Pismo

By : user2950235
Date : November 17 2020, 11:52 AM
See Phrogz's answer to "Nokogiri, open-uri, and Unicode Characters", which I think correctly describes what is happening for you. In summary, for some reason there is an issue with passing the IO object created by open-uri to Nokogiri. Instead, read the document in as a string and give that to Nokogiri, i.e.:
code :
require 'nokogiri'
require 'open-uri'

url = "https://www.youtube.com/watch?v=QXAwnMxlE2Q"

# Kernel#open no longer accepts URLs as of Ruby 3.0; use URI.open instead.
URI.open(url) {|f|
  p f.content_type     # "text/html"
  p f.charset          # "UTF-8"
  p f.content_encoding # []
}

# Passing the IO object directly garbles the non-ASCII characters:
doc = Nokogiri::HTML(URI.open(url))
puts doc.title.to_s # =>  NTV interview foreigners in Japan æ¥ãã¬å¤äººè¡é ­ã¤ã³ã¿ãã¥ã¼ English Subtitles è±èªå­å¹ - YouTube

# Reading the response into a string first gives the correct title:
doc = Nokogiri::HTML(URI.open(url).read)
puts doc.title.to_s # => NTV interview foreigners in Japan 日テレ外人街頭インタビュー English Subtitles 英語字幕 - YouTube

# Alternatively, keep passing the IO object but state the encoding explicitly:
doc = Nokogiri::HTML(URI.open(url), nil, "UTF-8")
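
The garbled title above looks like UTF-8 bytes that were decoded as ISO-8859-1. That is only an assumption about what goes wrong inside the parser, but the effect can be reproduced with plain Ruby strings. The variable names in this sketch are made up for illustration:

```ruby
# The first three Japanese characters of the title, as a UTF-8 string.
utf8_title = "日テレ"

# Simulate a reader that wrongly assumes ISO-8859-1: relabel the same bytes
# as Latin-1, then transcode them to UTF-8 for display.
garbled = utf8_title.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Undo the damage: transcode back to the original bytes and relabel them as UTF-8.
restored = garbled.encode("ISO-8859-1").force_encoding("UTF-8")

puts garbled.inspect
puts restored
```

Each three-byte character turns into three Latin-1 characters, which is why the garbled title is so much longer than the real one.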


How to parse article page links in Rails with pismo and mechanize?

By : Nils
Date : March 29 2020, 07:55 AM
I'm trying to parse the links on multi-page articles so that I can automatically click through them and extract the whole article content. I'm using mechanize, following the helpful answer to my last question. I wrote one function for this (in PHP) that turns a relative link into an absolute URL:
code :
<?php
// Resolve a link found on a page ($URL) against that page's address ($ParentURL).
// each() was removed in PHP 8, so the loops below use foreach instead.
function proccessURL($ParentURL, $URL) {
    $link = parse_url($URL);
    $parent = parse_url($ParentURL);
    if (!empty($link['host'])) {
        // The link carries its own host; only a missing scheme is taken from the parent.
        $scheme = isset($link['scheme']) ? $link['scheme'] : $parent['scheme'];
        $host = $link['host'];
        $path = isset($link['path']) ? $link['path'] : '/';
    } else {
        $scheme = $parent['scheme'];
        $host = $parent['host'];
        $parentPath = isset($parent['path']) ? $parent['path'] : '/';
        $path = isset($link['path']) ? $link['path'] : '';
        if ($path === '') {
            // Query- or fragment-only link: stay on the parent's path.
            $path = $parentPath;
        } elseif ($path[0] !== '/') {
            // Relative path: prepend the parent's directory.
            $path = substr($parentPath, 0, strrpos($parentPath, '/') + 1) . $path;
        }
    }
    // Collapse "." and ".." segments.
    $segments = array();
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '' && $segment !== '.') {
            $segments[] = $segment;
        }
    }
    $resolved = '/' . implode('/', $segments);
    // Keep a trailing slash if the unresolved path had one.
    if ($segments && substr($path, -1) === '/') {
        $resolved .= '/';
    }
    $query = isset($link['query']) ? '?' . $link['query'] : '';
    $fragment = isset($link['fragment']) ? '#' . $link['fragment'] : '';
    return $scheme . '://' . $host . $resolved . $query . $fragment;
}

$CorrectLink = proccessURL("http://www.sepidarcms.ir/kernel/", "../plugin/1.php");
// $CorrectLink is now "http://www.sepidarcms.ir/plugin/1.php"

$html = "Your HTML Str";
$URL = "Your HTML Page Link";
preg_match_all("/href=\"([^\"]*)\"/is", $html, $matches);
foreach ($matches[1] as $val) {
    echo proccessURL($URL, $val), "\n";
}
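
Since the question is about Ruby, the same idea can be sketched with the standard library's `URI.join`, which resolves a relative reference against a base URL. The helper name `absolute_links` and the sample HTML are made up for illustration, and the regex is the same simple `href="..."` match as above rather than a real HTML parser:

```ruby
require 'uri'

# Hypothetical helper: resolve every href in an HTML string against the page URL.
def absolute_links(page_url, html)
  html.scan(/href="([^"]*)"/i).flatten.map { |href| URI.join(page_url, href).to_s }
end

# Made-up sample markup with a parent-relative, a sibling and an absolute link.
html = '<a href="../plugin/1.php">1</a> ' \
       '<a href="page2.html?p=2#top">2</a> ' \
       '<a href="http://example.com/x">3</a>'

links = absolute_links("http://www.sepidarcms.ir/kernel/", html)
puts links
```

Note that `URI.join` raises an error for hrefs that are not valid URIs (for example, ones containing spaces), so real pages may need a `rescue` around the call.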

Rails 4 strong params with pismo

By : user2250138
Date : March 29 2020, 07:55 AM
I know how to save a page title and favicon using the pismo gem with Rails 3.2.18. My question is: how can I do the same with Rails 4 strong parameters? They are a bit confusing to me.

Web scraping for td data

By : user2603463
Date : March 29 2020, 07:55 AM
You need to wait for the table to load. Simply adding a delay made it work for me:
code :
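import time
from selenium.webdriver.common.by import By

driver.get(url)

# Give the dynamically loaded table time to appear before looking it up.
time.sleep(3)

# find_element_by_id was removed in Selenium 4; use find_element(By.ID, ...) instead.
table = driver.find_element(By.ID, 'quotesFuturesProductTable1')
...
output :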
DEC 2014 168.025
FEB 2015 166.900
APR 2015 164.775
JUN 2015 154.800
AUG 2015 152.900
OCT 2015 154.100
DEC 2015 154.250
FEB 2016 153.850
APR 2016 0.000
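
A fixed delay either wastes time or is too short on a slow connection. A common alternative is to poll until the element shows up. Selenium has explicit waits for this (for example `WebDriverWait` in the Python bindings); the sketch below only illustrates the idea in plain Ruby. The `wait_until` helper and the stand-in condition are hypothetical and do not talk to a browser:

```ruby
# Hypothetical helper: call the block repeatedly until it returns a truthy
# value, and give up with an error once the timeout has passed.
def wait_until(timeout: 10, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "timed out after #{timeout}s"
    end
    sleep interval
  end
end

# Stand-in for "is the table in the page yet?"; a real scraper would ask the
# browser here. The condition becomes true on the third attempt.
attempts = 0
table = wait_until(timeout: 2, interval: 0.01) do
  attempts += 1
  attempts >= 3 ? "quotesFuturesProductTable1" : nil
end

puts "found #{table} after #{attempts} attempts"
```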

Scraping commented data (<!-- -->), i.e. the data inside comments, using the jsoup library

By : Vijay Kumar
Date : March 29 2020, 07:55 AM
If you are sure that the HTML inside the comments is valid, you can simply remove the comment delimiters and then parse the resulting HTML:
code :
String html = doc.html(); 
html = html.replaceAll("<!--", "").replaceAll("-->", ""); 
doc = Jsoup.parse(html);
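
The same idea, sketched in plain Ruby for comparison (this is not jsoup, and the sample markup is made up): either drop the delimiters as above, or collect only the comment bodies when the rest of the page is not needed.

```ruby
# Made-up page fragment with two commented-out pieces of markup.
html = <<~HTML
  <div id="stats">
    <!-- <table><tr><td>42</td></tr></table> -->
    <p>visible</p>
    <!--
      <span class="hidden">more</span>
    -->
  </div>
HTML

# Option 1: drop the comment delimiters so the hidden markup becomes part of the page.
uncommented = html.gsub("<!--", "").gsub("-->", "")

# Option 2: collect just the comment bodies; the /m flag lets "." match newlines.
comment_bodies = html.scan(/<!--(.*?)-->/m).flatten.map(&:strip)

puts comment_bodies
```

Both options rely on simple string matching, so they share the caveat from the answer: they only make sense when the commented markup is known to be well formed.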

Scraping a website for data

By : Wavermeulen
Date : March 29 2020, 07:55 AM
What you are looking for is called an RSS feed. Most news sites have them, so you can easily parse new stories.
For CNN you can check here: http://www.cnn.com/services/rss/ and pick an RSS feed you would like.
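
As a rough sketch of what parsing such a feed looks like, here is Ruby's `rss` library reading a small inline feed. The feed content is made up for illustration; a real feed would be downloaded first. The sketch assumes the `rss` gem is available (it ships as a bundled gem in recent Ruby versions rather than as part of the default library):

```ruby
require 'rss'

# Made-up RSS 2.0 document standing in for a downloaded news feed.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example News</title>
      <link>http://example.com/</link>
      <description>Sample feed</description>
      <item>
        <title>First story</title>
        <link>http://example.com/1</link>
      </item>
      <item>
        <title>Second story</title>
        <link>http://example.com/2</link>
      </item>
    </channel>
  </rss>
XML

# The second argument turns off strict validation, which real-world feeds often fail.
feed = RSS::Parser.parse(xml, false)

# Each item exposes its child elements as methods.
stories = feed.items.map { |item| [item.title, item.link] }
stories.each { |title, link| puts "#{title}: #{link}" }
```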