Extract preview text from markdown
Articles on this website have previews: short extracts of each article, displayed on the home page.
I demonstrate how these are constructed in PHP.
Starting with an article, $markdownBody, I compile it into an HTML string, $html, using Parsedown.
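The compile step might look roughly like the sketch below. The helper name compileMarkdown() is hypothetical, and the sketch assumes Parsedown has already been loaded (for example through Composer's autoloader); only the Parsedown class and its text() method come from the library itself.

```php
<?php
// Hypothetical helper: the name compileMarkdown() is not from the article.
function compileMarkdown(string $markdownBody): string
{
    // Assumption: Parsedown is already loaded, e.g. via Composer's autoloader.
    if (!class_exists('Parsedown')) {
        throw new RuntimeException('Parsedown is not loaded');
    }

    // Parsedown::text() converts a Markdown string into an HTML string.
    $parsedown = new Parsedown();
    return $parsedown->text($markdownBody);
}
```

With the library available, $html = compileMarkdown($markdownBody); would produce the HTML string used in the rest of the article.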
Extract the text from inside paragraphs
We don't want headings and blocks of code appearing in previews! The following line picks out only the paragraphs.
preg_match_all('/<p>(.*?)<\/p>/s', $html, $matches); // "s" lets "." match newlines inside a paragraph
$matches contains two arrays.
array(
    0 => array(
        0 => <p>I am first</p>
        1 => <p>I am second</p>
        2 => <p>I am third</p>
    )
    1 => array(
        0 => I am first
        1 => I am second
        2 => I am third
    )
)
The second array is more helpful. Let's join it into a string.
$matches = $matches[1];
$p_html = implode(" ", $matches);
The paragraphs could contain HTML elements, so let's strip those too.
$p_text = strip_tags($p_html);
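To see the whole extraction step in one place, here is a minimal sketch run against a made-up HTML string that stands in for Parsedown's output. The sample content is an assumption for illustration, and the "s" pattern modifier is my addition so that paragraphs spanning several lines would still match.

```php
<?php
// Made-up sample standing in for the compiled article HTML.
$html = "<h1>Title</h1>\n"
      . "<p>I am <em>first</em></p>\n"
      . "<pre><code>echo 'skip me';</code></pre>\n"
      . "<p>I am <a href=\"#\">second</a></p>";

// Capture the inner HTML of every <p>...</p> pair. "<pre>" is not matched
// because the pattern requires ">" directly after "<p".
preg_match_all('/<p>(.*?)<\/p>/s', $html, $matches);

// $matches[1] holds the captured group for each paragraph.
$p_html = implode(' ', $matches[1]);

// Drop inline elements such as <em> and <a>, keeping their text.
$p_text = strip_tags($p_html);

echo $p_text, "\n"; // prints "I am first I am second"
```

The heading and the code block never reach $p_text, which is the behaviour wanted for previews.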
Limit the length of the text
PHP's wordwrap function breaks a string into lines no longer than a given width (100 characters here) wherever possible, without splitting any words. Keeping only the first line gives the preview.
$p_text = wordwrap($p_text, 100);
$p_text = explode("\n", $p_text);
$p_text = $p_text[0] . '...';
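The truncation can also be wrapped in a small function. This is only a sketch: the name truncatePreview() and its parameters are hypothetical, and unlike the lines above it appends the ellipsis only when text was actually cut off.

```php
<?php
// Hypothetical helper: name and default width are not from the article.
function truncatePreview(string $text, int $width = 100): string
{
    // wordwrap() inserts "\n" at word boundaries so each line is at most
    // $width characters; a single word longer than $width is left unbroken.
    $lines = explode("\n", wordwrap($text, $width));

    // Append the ellipsis only if there was more than one line.
    return count($lines) > 1 ? $lines[0] . '...' : $lines[0];
}

echo truncatePreview('The quick brown fox jumps over the lazy dog', 20), "\n";
// prints "The quick brown fox..."
```

A short article that fits within the width comes back unchanged, so its preview does not end in a misleading "...".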