In this tutorial we'll work out the keyword frequency of a string, or even a whole webpage, in PHP. This is useful when you're reviewing the SEO of a page, because it shows which words appear most often. How often a keyword appears is one signal of what a page is about, but be careful not to repeat a keyword too many times, as search engines could treat that as a spam tactic. Keyword frequency is the percentage of all the words on the page that a given keyword accounts for. To calculate it we'll use a handful of built-in PHP functions:
- file_get_contents - Get all the contents of a URL
- strip_tags - Remove the HTML tags
- str_word_count - Count the number of words in a string
- array_count_values - Count the number of times each value appears in an array
- arsort - Sort an array from highest to lowest value, keeping the keys
- number_format - Format the percentage to 2 decimal places
- array_splice - Remove a section of an array and return it. We'll use this to get the 20 most frequent words
First we need to get all the text on a webpage by using the file_get_contents function and store this in a variable.
// Get all the html on a page
$html = file_get_contents( $url );
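file_get_contents returns false when the page can't be read, so it can be worth checking the result before going any further. The following is a minimal sketch and not part of the original code; the helper name fetchHtml is hypothetical.

```php
<?php
// Hypothetical helper: read a URL (or local file) and fail loudly on error.
function fetchHtml(string $url): string
{
    // The @ operator suppresses the PHP warning so we can handle the failure ourselves.
    $html = @file_get_contents($url);

    if ($html === false) {
        throw new RuntimeException("Could not read: $url");
    }

    return $html;
}
```

Reading a remote URL this way relies on the allow_url_fopen setting being enabled in your PHP configuration.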
When we call file_get_contents on a webpage it returns the entire content of the page, including the HTML. We don't need the markup to find the keywords, so we use strip_tags to remove it. Passing the result to str_word_count with a second parameter of 1 gives us an array of every word on the page. We also store the total number of words, which we'll need later to work out each word's percentage.
// Get an array of all the words
$allWordsArray = str_word_count( strip_tags($html), 1);
$totalAllWordsArray = count($allWordsArray);
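To see what these two functions actually produce, here is a small sketch run against a made-up HTML string (the sample markup is an assumption for illustration only). Note that str_word_count treats apostrophes and hyphens as part of a word, but ignores digits and other punctuation.

```php
<?php
// Sample markup, invented for this example.
$html = '<h1>Keyword frequency</h1> <p>It\'s a well-known <b>SEO</b> check in PHP 8.</p>';

// strip_tags removes the tags but keeps the text between them.
$text = strip_tags($html);

// Format 1 returns every word found in the string as an array.
$words = str_word_count($text, 1);

print_r($words);              // Keyword, frequency, It's, a, well-known, SEO, check, in, PHP
echo count($words), PHP_EOL;  // 9 (the "8" is not counted as a word)
```

One thing to watch for: strip_tags does not insert any whitespace, so markup such as `<p>one</p><p>two</p>` becomes `onetwo` and is counted as a single word.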
The array_count_values function works through all the words we found on the page and counts each one, returning an array of key/value pairs where the key is the word and the value is its count. As we only want the top 20 words, we sort the $wordCount array from highest to lowest with arsort. Once the array is sorted we can grab the first 20 entries using array_splice.
// Get the number of times each word appears on the page
$wordCount = array_count_values($allWordsArray);
arsort($wordCount);
// Get the top 20 words
$wordCount = array_splice($wordCount, 0, 20);
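The counting and sorting steps are easier to follow on a tiny word list. This sketch uses a made-up array and keeps the top 2 words instead of 20. It also swaps in array_slice rather than array_splice: both give the same result here, but array_slice leaves the source array untouched, which some readers may prefer.

```php
<?php
// A made-up word list standing in for the words found on a page.
$allWordsArray = ['the', 'cat', 'the', 'dog', 'the', 'cat'];

// Map each word to the number of times it appears.
$wordCount = array_count_values($allWordsArray);
// ['the' => 3, 'cat' => 2, 'dog' => 1]

// Sort by count, highest first, keeping each word paired with its count.
arsort($wordCount);

// Keep the two most frequent words (the tutorial keeps 20).
$topWords = array_slice($wordCount, 0, 2, true);
// ['the' => 3, 'cat' => 2]
```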
Once we have the top 20 words we need to work out the percentage of the page each word accounts for, by dividing the word's count by the total number of words and multiplying by 100.
// Loop through the word counts and work out the percentage of the page each word accounts for
$percentageCount = [];
foreach($wordCount as $words => $val)
{
    $percentageCount[$words] = number_format(($val / $totalAllWordsArray) * 100, 2);
}
return $percentageCount;
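As a quick worked example of the arithmetic, assume a page of 6 words in total where "the" appears 3 times, "cat" twice and "dog" once (invented numbers, for illustration). Keep in mind that number_format returns a string, not a float.

```php
<?php
// Invented counts for a page containing 6 words in total.
$wordCount = ['the' => 3, 'cat' => 2, 'dog' => 1];
$totalAllWordsArray = 6;

$percentageCount = [];
foreach ($wordCount as $word => $count) {
    // (count / total) * 100, formatted to 2 decimal places as a string.
    $percentageCount[$word] = number_format(($count / $totalAllWordsArray) * 100, 2);
}

print_r($percentageCount);
// the => 50.00, cat => 33.33, dog => 16.67
```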
Below is the keyword frequency of the Paulund homepage.
- the - 3.98%
- to - 3.44%
- a - 3.01%
- you - 2.15%
- of - 1.94%
- WordPress - 1.61%
- can - 1.40%
- and - 1.40%
- more - 1.29%
- on - 1.29%
- Read - 1.08%
- this - 1.08%
- jQuery - 0.97%
- files - 0.97%
- In - 0.97%
- I - 0.86%
- we - 0.86%
- for - 0.86%
- marketplace - 0.86%
- get - 0.86%
Here is the full code you need to get the keyword frequency of a webpage, or you can become a Paulund member to download the demo files.
function getKeywordFrequency( $url )
{
    // Get all the HTML on a page
    $html = file_get_contents( $url );

    // Get an array of all the words
    $allWordsArray = str_word_count( strip_tags($html), 1);
    $totalAllWordsArray = count($allWordsArray);

    // Get the number of times each word appears on the page
    $wordCount = array_count_values($allWordsArray);
    arsort($wordCount);

    // Get the top 20 words
    $wordCount = array_splice($wordCount, 0, 20);

    // Loop through the word counts and work out the percentage of the page each word accounts for
    $percentageCount = [];
    foreach($wordCount as $words => $val)
    {
        $percentageCount[$words] = number_format(($val / $totalAllWordsArray) * 100, 2);
    }

    return $percentageCount;
}
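The same steps also work on a plain string rather than a URL. The sketch below is one possible variant, not the original code: the function name getKeywordFrequencyFromString and the $limit parameter are hypothetical. It also lowercases the text first, because array_count_values is case-sensitive, which is why the sample output above lists "In" as its own entry.

```php
<?php
// Hypothetical variant: takes text directly and counts words case-insensitively.
function getKeywordFrequencyFromString(string $text, int $limit = 20): array
{
    // Lowercase first so that "In" and "in" are counted as the same word.
    $allWordsArray = str_word_count(strtolower(strip_tags($text)), 1);
    $totalWords = count($allWordsArray);

    // Nothing to count, so return early with an empty result.
    if ($totalWords === 0) {
        return [];
    }

    // Count each word, sort highest first and keep the top $limit entries.
    $wordCount = array_count_values($allWordsArray);
    arsort($wordCount);
    $wordCount = array_slice($wordCount, 0, $limit, true);

    $percentageCount = [];
    foreach ($wordCount as $word => $count) {
        $percentageCount[$word] = number_format(($count / $totalWords) * 100, 2);
    }

    return $percentageCount;
}

// 7 words in total: "the" appears 3 times and "cat" twice.
print_r(getKeywordFrequencyFromString('<p>The cat saw the cat. The end.</p>', 2));
// the => 42.86, cat => 28.57
```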