php - Trying to use HTML DOM parser to get main image on Amazon page
I'm trying to use HTML DOM Parser to get the image source of the "main" product image, regardless of which product page the parser is pointed at.
On every page, that image seems to have the id "landingImage". You might think this should do the trick:
$endarray[$i][2] = $html->find('img[id="landingImage"]', 0)->src;
But no such luck.
I also tried:
foreach ($html->find('img') as $e) if (strpos($e, 'landingImage') !== false) { $endarray[$i][2] = $e->src; }
I have noticed that the image source usually contains SY300 or SX300, so I did this:
foreach ($html->find('img') as $e) if (strpos($e, 'SX300') !== false) { $endarray[$i][2] = $e->src; } else if (strpos($e, 'SY300') !== false) { $endarray[$i][2] = $e->src; }
Unfortunately, some image source links do not contain this, for example:
http://www.amazon.com/gp/product/B001O21H00/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B001O21H00&linkCode=as2&tag=bmref-20
Using the Amazon API might be a better solution, but that is not the question.
When I downloaded the HTML (without JavaScript-driven content) of the example web page, I could not find an image tag with id="landingImage" [1] either. But I could find an image tag with id="main-image". Trying to extract this tag with DOMDocument was not successful: neither loadHTML() nor loadHTMLFile() was able to parse the HTML.
Still, the interesting part can be extracted with a regular expression. The following code will give you the image source:
$url = 'http://www.amazon.com/gp/product/B001O21H00/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B001O21H00&linkCode=as2&tag=bmref-20';
$html = file_get_contents($url);
$matches = array();
if (preg_match('#<img[^>]*id="main-image"[^>]*src="(.*?)"[^>]*>#', $html, $matches)) {
    $src = $matches[1];
}
// source of the image:
// $src: 'http://ecx.images-amazon.com/images/I/21JzKZ9%2BYGL.jpg'
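To try the pattern without requesting the page each time, it can be wrapped in a small function and run against a hand-written snippet. This is only a sketch: the function name extractMainImageSrc and the sample markup are made up for illustration, and it assumes the id attribute comes before src inside the tag, as the regular expression above does.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the src of
// the first <img> tag whose id attribute equals $id, or null if none matches.
// Assumption: the id attribute appears before src inside the tag.
function extractMainImageSrc(string $html, string $id = 'main-image'): ?string
{
    $pattern = '#<img[^>]*\sid="' . preg_quote($id, '#') . '"[^>]*\ssrc="(.*?)"[^>]*>#i';
    if (preg_match($pattern, $html, $matches)) {
        // Attribute values may contain entities such as &amp;, so decode them.
        return html_entity_decode($matches[1]);
    }
    return null;
}

// Made-up sample markup, for illustration only.
$sample = '<div><img class="x" id="main-image" src="http://example.com/a.jpg" alt=""></div>';
var_dump(extractMainImageSrc($sample));
```

Returning null on a miss lets the caller fall back to another id (for example "landingImage") instead of reading an undefined variable.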
[1] I downloaded the HTML source with the file_get_contents function in PHP. Downloading the HTML source with Firefox results in different HTML code. In the Firefox case you will find an image tag with the id attribute "landingImage" (JavaScript not enabled!). It seems that the downloaded HTML source depends on the client (the headers in the HTTP request).
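If the response really does depend on the request headers, one thing to try is sending browser-like headers from PHP through a stream context. A minimal sketch, assuming the User-Agent header is what the server keys on (this has not been verified against Amazon); the helper name buildBrowserContext and the header values are placeholders.

```php
<?php
// Hypothetical helper: builds a stream context that sends browser-like headers.
// The header values are placeholders, not values known to work with Amazon.
function buildBrowserContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'header'     => "Accept: text/html\r\nAccept-Language: en-US,en;q=0.8\r\n",
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds
        ],
    ]);
}

$context = buildBrowserContext('Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0');

// Usage (needs network access), with $url set to the product page:
// $html = file_get_contents($url, false, $context);
```

With such a context the returned markup may or may not contain the "landingImage" tag, so it is worth checking for both ids.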