How can I scrape data that are dynamically generated by JavaScript in an HTML document using C#?
Using WebRequest and HttpWebResponse from the .NET class library, I'm able to get the whole HTML source code as a string. The difficulty is that the data I want aren't contained in that source code; they are generated dynamically by JavaScript.
On the other hand, if the data I want are already in the source code, I can get them easily using regular expressions.
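To make the distinction concrete, here is a minimal sketch of the approach described in the question, assuming a simple page where the target value sits in a <div id="...">. RawSourceScraper, FetchSource and ExtractDivText are hypothetical names. FetchSource uses HttpWebRequest/HttpWebResponse as in the question (on .NET 6 and later these types are marked obsolete in favour of HttpClient). Whatever a <script> would write into the page shows up in the raw source only as script text, never as finished element content.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating "download the source, then regex it".
public static class RawSourceScraper
{
    // Downloads the page exactly as the server sends it; no JavaScript is executed.
    public static string FetchSource(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Returns the inner content of <div id="...">...</div> in the raw source,
    // or null if no such div exists. Fine for a known, simple fragment,
    // but a regex is not a general HTML parser.
    public static string ExtractDivText(string html, string id)
    {
        string pattern = "<div id=\"" + Regex.Escape(id) + "\">(.*?)</div>";
        Match m = Regex.Match(html, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For example, if the raw source is <div id="price">42</div><div id="stock"></div> followed by a script that sets the stock div's innerHTML to '7', ExtractDivText returns "42" for price but an empty string for stock: the 7 exists only inside the script text, because nothing has run it.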
I have downloaded HtmlAgilityPack, but I don't know whether it would handle the case where items are generated dynamically by JavaScript...
Thank you very much!
Pandepic:
When you make the WebRequest, you're asking the server for the page file. That file's content hasn't yet been parsed or executed by a web browser, so the JavaScript on it hasn't done anything yet.

If you want to see what the page looks like after a browser has parsed it, you need a tool that executes the JavaScript on the page. One option is the WebBrowser control built into .NET: http://msdn.microsoft.com/en-au/library/aa752040(v=vs.85).aspx

The WebBrowser control can navigate to and load the page; you can then query its DOM, which will already have been altered by the page's JavaScript.

EDIT (example):

    using System;
    using System.Windows.Forms;

    // Inside a Form hosting a WebBrowser named webBrowserControl (e.g. in the form's Load handler):
    Uri uri = new Uri("http://www.somewebsite.com/somepage.htm");

    webBrowserControl.AllowNavigation = true;
    // Optional, but I use this because it stops JavaScript errors from breaking your scraper
    webBrowserControl.ScriptErrorsSuppressed = true;
    // Start scraping only after the document has finished loading, so do it in the handler
    webBrowserControl.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowserControl_DocumentCompleted);
    webBrowserControl.Navigate(uri);

    private void webBrowserControl_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted also fires for each frame; wait until the whole page is done
        if (webBrowserControl.ReadyState != WebBrowserReadyState.Complete)
            return;

        HtmlElementCollection divs = webBrowserControl.Document.GetElementsByTagName("div");

        foreach (HtmlElement div in divs)
        {
            // do something, e.g. read div.InnerText
        }
    }
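The example above assumes webBrowserControl already lives on a Windows Forms form. To scrape from a console application instead, the control needs what a form gives it for free: a single-threaded apartment (STA) thread, because WebBrowser wraps an ActiveX control, and a running message loop, because navigation and DocumentCompleted only progress while messages are pumped. A minimal sketch, assuming .NET Framework (or .NET on Windows with Windows Forms enabled); RenderedPageReader and Read are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper: hosts a WebBrowser on its own STA thread so it can be
// used outside a Windows Forms UI (e.g. from a console app).
public static class RenderedPageReader
{
    // 'load' starts loading something into the browser (Navigate, DocumentText, ...).
    // 'read' runs once the document has completed, against the DOM as modified by its scripts.
    public static string Read(Action<WebBrowser> load, Func<HtmlDocument, string> read)
    {
        string result = null;
        Exception error = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            {
                browser.DocumentCompleted += (sender, e) =>
                {
                    // Frames raise DocumentCompleted too; wait for the whole page.
                    if (browser.ReadyState != WebBrowserReadyState.Complete)
                        return;
                    try { result = read(browser.Document); }
                    catch (Exception ex) { error = ex; }
                    finally { Application.ExitThread(); } // ends the Application.Run loop below
                };

                try { load(browser); }
                catch (Exception ex) { error = ex; return; }

                Application.Run(); // message loop: loading and scripts only progress while it runs
            }
        });

        thread.SetApartmentState(ApartmentState.STA); // ActiveX controls require an STA thread
        thread.Start();
        thread.Join(); // no timeout: a page that never completes blocks here forever

        if (error != null)
            throw new InvalidOperationException("Reading the rendered page failed.", error);
        return result;
    }
}
```

Usage would look like RenderedPageReader.Read(b => b.Navigate(new Uri("http://www.somewebsite.com/somepage.htm")), doc => doc.Body.InnerText). Because Read creates its own STA thread and blocks until the page is done, the caller does not need to be an STA thread itself; adding a timeout (for example Join with a TimeSpan) is left out to keep the sketch short.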
2014-06-10T04:26:38