Scraping data dynamically generated by JavaScript in html document using C#

Ask Time：2014-06-10T07:31:56 Author：user3213711

How can I scrape data that are dynamically generated by JavaScript in html document using C#?

Using WebRequest and HttpWebResponse from the .NET class library, I'm able to get the whole HTML source code as a string, but the difficulty is that the data I want isn't contained in that source code; it is generated dynamically by JavaScript.

On the other hand, if the data I want are already in the source code, then I'm able to get them easily using Regular Expressions.

I have downloaded HtmlAgilityPack, but I don't know if it would take care of the case where items are generated dynamically by JavaScript...

Thank you very much!

Author: user3213711. Reproduced under the CC BY-SA 4.0 copyright license with a link to the original source and this disclaimer.
Link to original article：https://stackoverflow.com/questions/24130650/scraping-data-dynamically-generated-by-javascript-in-html-document-using-c-sharp

Pandepic :

When you make the WebRequest, you're asking the server to give you the page file. This file's content hasn't yet been parsed or executed by a web browser, so the JavaScript on it hasn't done anything yet.

You need to use a tool to execute the JavaScript on the page if you want to see what the page looks like after being parsed by a browser. One option you have is using the built-in .NET WebBrowser control: http://msdn.microsoft.com/en-au/library/aa752040(v=vs.85).aspx

The WebBrowser control can navigate to and load the page, and then you can query its DOM, which will have been altered by the JavaScript on the page.

EDIT (example):

    Uri uri = new Uri("http://www.somewebsite.com/somepage.htm");

    webBrowserControl.AllowNavigation = true;
    // optional, but I use this because it stops JavaScript errors from breaking your scraper
    webBrowserControl.ScriptErrorsSuppressed = true;
    // start scraping only after the document has finished loading, so do it in the handler you attach here
    webBrowserControl.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowserControl_DocumentCompleted);
    webBrowserControl.Navigate(uri);

    private void webBrowserControl_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // the DOM queried here reflects changes made by the page's JavaScript
        HtmlElementCollection divs = webBrowserControl.Document.GetElementsByTagName("div");

        foreach (HtmlElement div in divs)
        {
            // do something, e.g. read div.InnerText
        }
    }
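If you are scraping from a console program rather than a form, the same idea can be wrapped in a helper that returns the rendered HTML as a string. The following is only a minimal sketch under some assumptions: the names `RenderedPage` and `GetRenderedHtml` are made up for illustration, the project references System.Windows.Forms, and the URL is the same placeholder as above. The WebBrowser control needs an STA thread with a message loop, so the sketch creates one.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper, not part of any library.
static class RenderedPage
{
    public static string GetRenderedHtml(Uri uri)
    {
        string html = null;

        var worker = new Thread(() =>
        {
            using (var browser = new WebBrowser())
            {
                browser.ScriptErrorsSuppressed = true;
                browser.DocumentCompleted += (sender, e) =>
                {
                    // The event also fires for frames; wait until the whole page reports Complete.
                    if (browser.ReadyState != WebBrowserReadyState.Complete) return;

                    HtmlElement body = browser.Document != null ? browser.Document.Body : null;
                    html = body != null ? body.OuterHtml : null;

                    // Stop the message loop started by Application.Run below.
                    Application.ExitThread();
                };

                browser.Navigate(uri);

                // Pump window messages on this thread until ExitThread is called.
                Application.Run();
            }
        });

        // The WebBrowser control requires a single-threaded apartment.
        worker.SetApartmentState(ApartmentState.STA);
        worker.Start();
        worker.Join();

        return html;
    }
}

static class Program
{
    static void Main()
    {
        // Placeholder URL, as in the example above.
        string html = RenderedPage.GetRenderedHtml(new Uri("http://www.somewebsite.com/somepage.htm"));
        Console.WriteLine(html ?? "(no body)");
    }
}
```

Two caveats with this sketch: it has no timeout, so it will block if the page never finishes loading, and content that the page fetches with later AJAX calls may not be in the DOM yet when DocumentCompleted fires, so you may need to wait or poll before reading it. Once you have the rendered string, you can hand it to HtmlAgilityPack (for example via `HtmlDocument.LoadHtml`) or to your regular expressions. HtmlAgilityPack only parses markup and does not run scripts itself.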

2014-06-10T04:26:38
