There are occasions where you need to get an attribute value from an HTML element in PHP. This can be for migration reasons, or perhaps you are writing a script that scrapes a website and stores the values it finds. I was writing a script that needed to scrape a web page, pick up the URLs of all the images on the page and store the src values. You can load the HTML contents of a page by calling file_get_contents() on a URL. Once you have the HTML for an image tag, you need to pull out the value of its src attribute.
In this article we are going to look at two ways to do this in PHP. The first uses the DOMDocument object together with XPath to pull out the value of the src attribute. The second uses a regular expression to get the contents of the src="" attribute.
PHP ships with the DOMDocument class. You give this object a whole HTML or XML document and then use its built-in methods to work with the parsed result: you can find certain tags, create new elements or read the contents of a tag. In this example we are going to use the following HTML image tag.
<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />
Now we are going to use the DOMDocument object with XPath to get the value of the src attribute.
$html = '<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)");

// will return /images/image.jpg
echo $src;
The above example shows how we instantiate the DOMDocument object, then call the loadHTML() method and pass in the HTML for the image tag. We then pass this document into a DOMXPath object so that we can use the XPath expression string(//img/@src) to get the value of the src attribute.
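The same approach can be extended to collect the src of every image in a document, which is what a scraping script usually needs. The following is only a minimal sketch: the sample markup and the $sources variable are made up for illustration, and in a real script the $html string would come from something like file_get_contents(). It uses query() instead of evaluate() so that all matching attribute nodes are returned, and it asks libxml to collect parser warnings internally because real pages are often not well formed.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<html><body>'
      . '<img src="/images/one.jpg" alt="One" />'
      . '<p>Some text</p>'
      . '<img src="/images/two.png" alt="Two" />'
      . '<img alt="No source" />'
      . '</body></html>';

$doc = new DOMDocument();

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$sources = [];
// query() returns a DOMNodeList containing one attribute node per match.
// Images without a src attribute produce no node and are skipped.
foreach ($xpath->query('//img/@src') as $attr) {
    $sources[] = $attr->nodeValue;
}

print_r($sources);
```

With the sample markup above, $sources ends up holding the two src values, and the image without a src attribute is ignored.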
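The other option in this situation is to use a regular expression to search the string of the image tag for the src attribute and return the value inside it. To get the src of the image tag we are going to use the regular expression @src="([^"]+)"@. The @ characters are the pattern delimiters, and the group ([^"]+) captures one or more characters that are not a double quote, which is everything between the quotes of the attribute.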
$html = '<img border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />';

preg_match( '@src="([^"]+)"@' , $html, $match );

$src = array_pop($match);

// will return /images/image.jpg
echo $src;
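The pattern above only matches a double-quoted src and stops at the first match. If you need every image in a string, preg_match_all() can be used in the same way. The sketch below is one possible variation and not a complete HTML parser: the sample markup and the pattern are assumptions made for illustration. It accepts single or double quotes and requires whitespace before src so that attributes such as data-src are not picked up.

```php
<?php
// Hypothetical sample markup with both quote styles.
$html = '<img src="/images/one.jpg" alt="One" />'
      . "<img alt='Two' src='/images/two.png' />";

// (["\']) captures the opening quote and \1 requires the same closing quote.
// (.*?) lazily captures the attribute value between them.
$pattern = '@<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1@i';

preg_match_all($pattern, $html, $matches);

// Group 2 holds the attribute values, one entry per matched image.
$sources = $matches[2];

print_r($sources);
```

For anything more complex than simple, predictable markup, the DOMDocument approach is likely to be the more robust choice, because regular expressions do not understand nesting, comments or unusual attribute formatting.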