Parse varying HTML using htmlagilitypack in C#



By : user2953588
Date : November 21 2020, 01:01 AM
From your sample, it seems that you should parse each record by pairing a "td" (label) with a "td[@class]" (value), since the skypeid label cell does not carry the "nowrap" attribute.
Check this sample:
code :
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public class Employee
{
    public string ID { set; get; }
    public string Name { set; get; }
    public string Address { set; get; }
    public string Telephone { set; get; }
    public string Email { set; get; }
    public string WorkingHours { set; get; }
    public string Fax { set; get; }
    public string SkypeID { set; get; }
}
class Program
{
    static void Main(string[] args)
    {
        #region "HTML"

        string t = @"<table>
<tr><td nowrap>Name</td><td class=""title""><b>Amy</b></td></tr><tr>
<tr><td nowrap>ID</td><td class=""title""><b>12345</b></td></tr><tr>
<tr><td nowrap>Address</td><td class=""title""><b>36 Main St, Baton Rouge, LA</b></td></tr><tr>
<tr><td nowrap>Telephone</td><td class=""title""><b>123-456-7890</b></td></tr><tr>
<tr><td nowrap>Email</td><td class=""title""><b>Amy@yahoo.com</b></td></tr><tr>
<tr><td>skypeid</td><td class=""title""><b>oilcompany</b></td></tr><tr>
</table>

<table>
<tr><td nowrap>Name</td><td class=""title""><b>Cathy</b></td></tr><tr>
<tr><td nowrap>ID</td><td class=""title""><b>99345</b></td></tr><tr>
<tr><td nowrap>Address</td><td class=""title""><b>36 Main St, Baton Rouge, LA</b></td></tr><tr>
<tr><td nowrap>Telephone</td><td class=""title""><b>123-456-7899</b></td></tr><tr>
<tr><td nowrap>Working Hours</td><td class=""title""><b>8 PM - 6 AM</b></td></tr><tr>
<tr><td nowrap>fax</td><td class=""title""><b>123-456-1111</b></td></tr><tr>
</table> ";

        #endregion

        var doc = new HtmlDocument();
        doc.LoadHtml(t);


        var records = doc.DocumentNode.SelectNodes("//table");
        List<Employee> employees = new List<Employee>();
        foreach (var item in records)
        {
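            // Walk every cell in this table; label cells and value cells alternate.
            var elem = item.Descendants().Where(m => m.Name == "td");
            var employee = new Employee();

            // Holds the most recent label until its value cell is reached.
            string elementName = "";
            foreach (var row in elem)
            {
                if (elementName == "")
                {
                    elementName = row.InnerText;
                }

                // Value cells are the ones that carry a class attribute.
                if (row.Attributes.Contains("class"))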
                {
                    switch (elementName.Trim().ToLower())
                    {
                        case "name": employee.Name = row.InnerText.Trim();
                            break;
                        case "id": employee.ID = row.InnerText.Trim(); 
                            break;
                        case "address": employee.Address = row.InnerText.Trim();
                            break;
                        case "telephone": employee.Telephone = row.InnerText.Trim();
                            break;
                        case "email": employee.Email = row.InnerText.Trim();
                            break;
                        case "skypeid": employee.SkypeID = row.InnerText.Trim();
                            break;
                        case "working hours": employee.WorkingHours = row.InnerText.Trim();
                            break;
                        case "fax": employee.Fax = row.InnerText.Trim();
                            break; 
                    }

                    elementName = "";
                }
            }
            employees.Add(employee); 
        }

        foreach (var e in employees)
        {
            Console.WriteLine(e.Name);
        }

        Console.WriteLine("Press any key...");
        Console.ReadLine();
    } 
}


Use HtmlAgilityPack to parse HTML variable, not HTML document?



By : Cynthia_Cinderpelt
Date : March 29 2020, 07:55 AM
I have a variable in my program, htmlText, that contains HTML data as a string rather than a document loaded from a file or URL.
You can use the LoadHtml method of the HtmlDocument class.
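As a minimal sketch of that suggestion: the class name, the helper FirstMatchText, and the sample markup below are made up for illustration; only HtmlDocument, LoadHtml, and SelectSingleNode come from HtmlAgilityPack.

```csharp
using System;
using HtmlAgilityPack;

public static class LoadHtmlSketch
{
    // Parses an in-memory HTML string and returns the text of the first
    // node matching the XPath, or null when nothing matches.
    public static string FirstMatchText(string htmlText, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlText); // parses the string directly; no file or URL involved

        // SelectSingleNode returns null when there is no match.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node?.InnerText;
    }

    public static void Main()
    {
        // Hypothetical content standing in for the htmlText variable.
        string htmlText = "<html><body><div id='greeting'>Hello <b>world</b></div></body></html>";
        Console.WriteLine(FirstMatchText(htmlText, "//div[@id='greeting']")); // prints: Hello world
    }
}
```

The null-conditional operator keeps the helper from throwing when the XPath matches nothing.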
Parse a malformed HTML with HtmlAgilityPack



By : Nicola Gritti
Date : March 29 2020, 07:55 AM
I am trying to parse an HTML page, but the source is malformed:
code :
var web = new HtmlAgilityPack.HtmlWeb();
var doc = web.Load("http://anossaoficina.com/index.php?option=com_content&view=category&layout=blog&id=78&Itemid=474");

var DescritionShort = doc.DocumentNode
                      .SelectSingleNode("//div[@class='item column-1']//p[2]")
                      .NextSibling.InnerText;
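One defensive way to approach markup like this, offered as a sketch rather than the accepted fix: list what the parser reports through ParseErrors, and guard every step of the node chain against null, since SelectSingleNode and NextSibling can both return null on unexpected markup. The class name, helper names, and sample HTML are assumptions for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class MalformedHtmlSketch
{
    // Returns the trimmed text of the node that follows the first XPath match,
    // or null when either the match or its next sibling is missing.
    public static string TextAfter(HtmlDocument doc, string xpath)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        HtmlNode next = node?.NextSibling;
        return next?.InnerText.Trim();
    }

    // Writes out whatever problems the parser recorded while loading.
    public static void PrintParseErrors(HtmlDocument doc)
    {
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine("Line " + error.Line + ": " + error.Reason);
        }
    }

    public static void Main()
    {
        // Hypothetical sample; the real page would come from HtmlWeb.Load(url).
        var doc = new HtmlDocument();
        doc.LoadHtml("<div class='item'><span>one</span><span>two</span> tail text</div>");

        PrintParseErrors(doc);
        Console.WriteLine(TextAfter(doc, "//div[@class='item']/span[2]") ?? "(no match)");
    }
}
```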
html parse with HtmlAgilityPack in C#



By : Lamb Mei
Date : March 29 2020, 07:55 AM
You could, for example, parse the rows like this:
code :
using System.Net;
using HtmlAgilityPack;

namespace ConsoleApplication5
{
    class Program
    {
        static void Main(string[] args)
        {
            WebClient webClient = new WebClient();
            string page = webClient.DownloadString("http://www.deu.edu.tr/DEUWeb/Guncel/v2_index_cron.html");

            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(page);

            HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
            foreach (var cell in table.SelectNodes("tr/td"))
            {
                string someVariable = cell.InnerText;
            }
        }
    }
}
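A LINQ variant that collects only the non-empty cell texts (it also needs using System; and using System.Linq;):
code :
    private static void Main(string[] args)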
    {
        WebClient webClient = new WebClient();
        string page = webClient.DownloadString("http://www.deu.edu.tr/DEUWeb/Guncel/v2_index_cron.html");

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(page);

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
        var rows = table.SelectNodes("tr/td").Select(cell => cell.InnerText).Where(someVariable => !String.IsNullOrWhiteSpace(someVariable)).ToList();
    }
How to parse tables without id on HTML using HtmlAgilityPack



By : Rob Morton
Date : March 29 2020, 07:55 AM
Sorry, I don't have VB installed, but the C# version should be enough to give you an idea. The cells have the td_right class, so you can query them with either a lambda or XPath. I prefer the lambda/LINQ version because I am familiar with LINQ and don't need to remember XPath syntax.
Lambda:
code :
    // Extension methods must be declared in a static class.
    public static class HtmlNodeExtensions
    {
        // True when the node's class attribute contains every requested class name.
        public static bool HasClass(this HtmlNode node, params string[] classValueArray)
        {
            var classValue = node.GetAttributeValue("class", "");
            var classValues = classValue.Split(' ');
            return classValueArray.All(c => classValues.Contains(c));
        }
    }

    var url = "http://www.dietas.net/tablas-y-calculadoras/tabla-de-composicion-nutricional-de-los-alimentos/carnes-y-derivados/aves/pechuga-de-pollo.html#";
    var htmlWeb = new HtmlWeb();
    var htmlDoc = htmlWeb.Load(url);
    var nodes = htmlDoc.DocumentNode.Descendants("td").Where(_ => _.HasClass("td_right")).Select(_ => _.InnerText);
XPath:
code :
    var nodes2 = htmlDoc.DocumentNode.SelectNodes("//td[@class='td_right']");
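The two queries are not quite equivalent: the XPath test @class='td_right' only matches when the attribute is exactly that string, while the lambda splits the attribute on spaces. If the cells might carry more than one class, an XPath 1.0 token test can close the gap. This is a sketch; the class name, method names, and sample table are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ClassTokenXPath
{
    // Builds an XPath matching <td> elements whose class list contains the token.
    // Padding both sides with spaces prevents "td_right" from matching "td_right2".
    public static string ForTdWithClass(string className)
    {
        return "//td[contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')]";
    }

    public static List<string> CellTexts(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(ForTdWithClass(className));
        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (nodes == null) return new List<string>();
        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    public static void Main()
    {
        // Hypothetical sample table.
        string html = "<table><tr><td class='td_right'>1</td>"
                    + "<td class='td_right bold'>2</td><td class='td_left'>3</td></tr></table>";
        Console.WriteLine(string.Join(",", CellTexts(html, "td_right")));
    }
}
```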
Using HTMLAgilityPack to parse an HTML string not from a URL



By : IlinTerz
Date : March 29 2020, 07:55 AM
To parse a string containing an HTML snippet rather than a file or URL, you can use HtmlDocument as @Oded suggested, but call doc.LoadHtml() instead of doc.Load().
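The distinction matters because Load(string) treats its argument as a file path, whereas LoadHtml(string) treats it as markup. A rough sketch of the two routes for in-memory HTML follows; Load also accepts a TextReader, so wrapping the string in a StringReader works too. The class name, method names, and sample snippet are assumptions.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

public static class LoadVersusLoadHtml
{
    // Route 1: hand the markup straight to LoadHtml.
    public static string HeadingViaLoadHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectSingleNode("//h1")?.InnerText;
    }

    // Route 2: Load expects a path, stream, or reader, so wrap the string in a reader.
    public static string HeadingViaReader(string html)
    {
        var doc = new HtmlDocument();
        using (var reader = new StringReader(html))
        {
            doc.Load(reader);
        }
        return doc.DocumentNode.SelectSingleNode("//h1")?.InnerText;
    }

    public static void Main()
    {
        // Hypothetical snippet.
        string html = "<div><h1>Report</h1></div>";
        Console.WriteLine(HeadingViaLoadHtml(html));
        Console.WriteLine(HeadingViaReader(html));
    }
}
```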