Improving WordPress Part 3 – SoC & The Editor (A suggestion to Core)

If you already know all about the WordPress editor, you can skip directly to the section on separation of concern.

The WordPress editor is a curious beast. It’s oddly tied down into the core codebase, despite being a completely external project (TinyMCE), and it has given rise to a significant number of hacks and workarounds, to try to support the various workflows of different WordPress users. Since 2017 is the year for WordPress core to focus on the editor, I thought I’d put down some thoughts, in the hopes that I might help inform some decisions.

First, let’s talk about what the editor does, and walk through some of the different workflows it supports.

The WordPress editor, at its heart, is just a way to get the body of a post, whether text or HTML, into the database field “post_content”. That field is designed to be the sole, canonical source for WordPress post/page content, whether accessed by excerpt, search, feed, API, or any other method.

The simplest way to insert content into post_content is via “Text Mode”, where the user enters plain text, to be published as-is. Of course, unless the content is wrapped in <pre> tags, it would look terribly unformatted to viewers if loaded directly, so on save and on output WordPress runs several regex-based filters (such as wpautop) to produce readable content.
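
To make the whitespace-to-markup idea concrete, here is a minimal sketch in Python (chosen only for brevity). It is not WordPress's actual wpautop, which is written in PHP and handles many more cases; the function name simple_autop is hypothetical.

```python
import re


def simple_autop(text: str) -> str:
    """Hypothetical, heavily simplified stand-in for a wpautop-style filter."""
    # Normalize line endings so the splitting rules below are predictable.
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip()
    if not text:
        return ""
    # Two or more consecutive newlines separate paragraphs.
    paragraphs = re.split(r"\n{2,}", text)
    # A single newline inside a paragraph becomes a line break.
    return "\n".join(
        "<p>" + p.strip().replace("\n", "<br />\n") + "</p>" for p in paragraphs
    )


if __name__ == "__main__":
    print(simple_autop("Hello\nworld\n\nBye"))
```

Even this toy version shows the trade-off: whitespace that HTML would normally ignore now carries meaning.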

Text mode is useful for bloggers and writers who want to focus on content, and get their work out quickly and cleanly. The fact that it renders as plain text also makes it a breeze to use other editors and either copy/paste when complete, or hook into the WordPress API, and publish content directly from your editor of choice, such as MarsEdit. However, it leaves the author with very little control over appearance and formatting.

This lack of control is addressed in a few ways. Firstly, for formatting, text mode supports inserting HTML directly into the body of the post. Secondly, dynamic content can be added to the post via “shortcodes”, a BBCode-like language that theme and plugin authors can expand on. Both HTML and shortcodes, when entered, are stored directly in post_content.
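
For readers who haven't worked with shortcodes, the general technique (a tag in square brackets replaced at output time by a registered callback) can be sketched as follows. This is an illustration in Python, not the WordPress Shortcode API: the names HANDLERS, add_handler, and expand_shortcodes are hypothetical, and only self-closing shortcodes with quoted attributes are handled.

```python
import re
from typing import Callable, Dict

# Hypothetical registry mapping a shortcode tag to a rendering callback.
HANDLERS: Dict[str, Callable[[Dict[str, str]], str]] = {}

# Matches [tag] or [tag attr="value" ...]; enclosing shortcodes are not handled.
SHORTCODE = re.compile(r"\[([A-Za-z_][\w-]*)((?:\s+[^\]]*)?)\]")
# Matches name="value" or name='value'.
ATTRIBUTE = re.compile(r"""([\w-]+)=(?:"([^"]*)"|'([^']*)')""")


def add_handler(tag: str, callback: Callable[[Dict[str, str]], str]) -> None:
    HANDLERS[tag] = callback


def parse_attributes(raw: str) -> Dict[str, str]:
    attributes = {}
    for match in ATTRIBUTE.finditer(raw):
        double_quoted, single_quoted = match.group(2), match.group(3)
        attributes[match.group(1)] = (
            double_quoted if double_quoted is not None else single_quoted
        )
    return attributes


def expand_shortcodes(content: str) -> str:
    def replace(match: "re.Match[str]") -> str:
        handler = HANDLERS.get(match.group(1))
        if handler is None:
            return match.group(0)  # Unknown tags are left untouched.
        return handler(parse_attributes(match.group(2)))

    return SHORTCODE.sub(replace, content)
```

Note that the stored text still contains the bracketed tag; the markup only exists after expansion, which is why the raw shortcode is what ends up in post_content.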

Since adding HTML in plain-text isn’t all that intuitive (and because formatting it runs afoul of wpautop), WordPress also provides a visual editor, which renders tags as HTML, and provides easy access to several inline elements and headings.

Now, in 2017, the WordPress core team is working on revamping the visual editor to focus on “content blocks” (user-defined formatted areas). This project is codenamed Gutenberg. However, there is a deep-rooted issue with these plans, which is cross-compatibility. The sad fact is, none of these modes actually work smoothly together.

Pitfalls of the unified post_content system

To maintain the tagless flexibility of text mode, HTML is added to posts at run-time, based on whitespace that would otherwise be ignored. Because of that, it is impossible to effectively and legibly enter raw HTML or even shortcodes in text mode: basic formatting and indenting of HTML (or shortcodes with content) will cause all kinds of <p>s, <br>s, and &nbsp;s to be inserted into the output. Yes, this can be turned off with filters, but it leaves the two modes incompatible, as you can’t switch from one to the other without changing filters.

Next, visual mode escapes raw HTML that is entered into it, and if you switch to text mode to add HTML, then back to visual mode, the code is often “normalized” by the browser in ways that can absolutely destroy functionality. Certain valid HTML5 constructs (such as links that wrap block-level elements) are not supported by the visual editor, and will be converted to block-level elements that contain links, in a way that often breaks both semantics and accessibility. On top of that, some HTML is filtered out by WordPress’ sanitization functions, arbitrarily removing JavaScript handlers and some advanced attributes. So visual mode is also incompatible with text mode and with raw HTML.

Also, any HTML or shortcodes added to post_content become visible to the WordPress search functionality. This can lead to confusion for users, as the shortcode [fusion-gallery class='cold'] might appear on pages that have nothing to do with cold fusion, or the tag <marquee type="odeon"> might appear on pages that have nothing to do with the Odeon Theater’s front visage. In the worst case scenario, this type of search ambiguity can be used for penetration testing, as you can search for strings like [ngg_images to find sites running plugins with known security flaws.
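
The false positive is easy to reproduce in a toy model. The sketch below assumes a search that requires every query term to appear somewhere in the searched field as a substring; the post data and the field names raw and plain are invented for illustration.

```python
from typing import Dict, List


def search(posts: Dict[int, Dict[str, str]], query: str, field: str) -> List[int]:
    """Return IDs of posts whose chosen field contains every query term."""
    terms = query.lower().split()
    return [
        post_id
        for post_id, post in posts.items()
        if all(term in post[field].lower() for term in terms)
    ]


# Hypothetical posts: 'raw' holds markup and shortcodes, 'plain' holds text only.
POSTS = {
    1: {"raw": "Lab photos [fusion-gallery class='cold']", "plain": "Lab photos"},
    2: {"raw": "<p>Cold fusion explained</p>", "plain": "Cold fusion explained"},
}

if __name__ == "__main__":
    print(search(POSTS, "cold fusion", "raw"))    # matches the gallery post too
    print(search(POSTS, "cold fusion", "plain"))  # matches only the real article
```

Searching the raw field returns the gallery post purely because of its shortcode, while searching a markup-free field does not.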

Starting with WordPress 4.4, images added to posts also run through filters to add responsive source sets (formerly RICG responsive images). These filters match images based on their class attribute, while images in visual mode are displayed by their src attribute, meaning that an image’s source can be modified in text mode and saved, without modifying its class attribute. As a result, the image can look great to the user at a specific resolution, but other viewers on other devices may be served an entirely different asset. Furthermore, as these image sizes are chosen from the theme’s defined thumbnail sizes, themes with poorly chosen thumbnail sizes can serve extremely undersized images to certain screen sizes, and none of this is presented to the user.

The new idea with content blocks (project Gutenberg) is to add sections of HTML with special markup into the content, to allow users to insert columns or images or other custom blocks, as defined by themes or plugins. I believe this idea will further compound the core issue that the WordPress editor is dealing with: lack of separation of concern.

What is Separation of Concern, and why is it Necessary?

Separation of concern is a concept in programming (and several other fields) which says “let different pieces of a project serve specific tasks”. Normally, with web design, it means that your HTML should be clean and semantic and clearly describe your site’s structure, your JavaScript should be minimal and well documented and offer additional functionality, and your CSS should be logical and offer visual styling. However, when it comes to WordPress, people usually mean it as “let the theme handle presentation of data, and plugins handle exposition of functionality”. The problem is that the entire concept breaks down, when it comes to post_content.

post_content only makes sense as the sole source of data for posts, when it is only serving raw text. That made sense when WordPress was a new system just for bloggers, and still mostly made sense as individual features were added, each compromising SoC, but in 2017, we’ve gotten to the tipping point, where WordPress is no longer a blogging platform with bolt-ons, it’s a Website Platform with great blogging capabilities. As such, we need a change of approach.

Most people who still use post_content for pages other than blogs are relying heavily on shortcodes and html in text mode to meet modern formatting needs. This clutters search, and (as I mentioned in my last “Improving WordPress” post) damages auto excerpts and feeds. In addition, when themes/plugins break or web-standards change, they are left with potentially hundreds of pages of broken code. As far as I can tell, content blocks are still subject to this issue.

Those who have abandoned post_content for ACF or Beaver Builder or Widget Areas or flat HTML templates are heavily reliant on plugins that fight the core for every inch, and still fail to properly address problems like search, feed, and excerpts, as well as SEO plugins and other tools that trust post_content to be the primary source of data.

How should the issue be addressed, moving forward?

There are already filters and actions in many places in the code that can override the normal functioning of post_content. As a first step, I propose that those filters be reviewed and holes be filled to make the WordPress editor into a completely modular construct. All references to post_content would pass through filters before being used, making the editor fully replaceable, not only globally, but on a per post_type basis. That way, the traditional editor could be maintained for blog posts or API connections, but it could be replaced wholesale wherever something more is needed, without regard for maintaining old functionality. Plugin and theme authors would be informed that direct access to the post_content field was deprecated, and that they should move to use the helper functions provided.
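
As a rough sketch of this first step, every read of the post body could go through a helper that applies a general hook and then a post-type-specific hook. The code is Python for brevity; the hook names post_body and post_body_{post_type} and the helper get_post_body are assumptions made for illustration, not existing WordPress hooks or functions.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Filter = Callable[[str, dict], str]

# Hypothetical hook registry, loosely modeled on the idea of WordPress filters.
_filters: DefaultDict[str, List[Filter]] = defaultdict(list)


def add_filter(hook: str, callback: Filter) -> None:
    _filters[hook].append(callback)


def apply_filters(hook: str, value: str, post: dict) -> str:
    # Each callback receives the current value and returns the next one.
    for callback in _filters[hook]:
        value = callback(value, post)
    return value


def get_post_body(post: dict) -> str:
    """Hypothetical helper that callers use instead of reading post_content."""
    body = apply_filters("post_body", post["post_content"], post)
    # A second, per-post-type hook lets one post type swap its editor output.
    return apply_filters("post_body_" + post["post_type"], body, post)
```

With something like this in place, a plugin could replace the body for pages only, and blog posts would keep behaving exactly as before.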

As a second step, I recommend abstracting all post_content functionality into a class, similar to the existing Walker class for nav menus, that would be an optional value when declaring custom post types. Custom classes would be implemented wholesale, to ensure that all edge cases like search, SEO plugins, API/CLI access, and feeds would be addressed by plugin authors, or would explicitly block access to these methods with a provided explanation (not all forms of editor need to support saving via API or connecting to MarsEdit, for example). This would give a strong path forward to legitimize the already flourishing community of content builder plugins, and give them better options for data storage, as they would have complete control of page rendering.
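
To give the idea some shape, here is one way such a class could look, sketched in Python. Every name (PostEditor, UnsupportedChannel, register_post_type, the channel strings) is hypothetical; this is not an existing or planned WordPress API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class UnsupportedChannel(Exception):
    """Raised when an editor explicitly declines an access method."""


class PostEditor(ABC):
    """Hypothetical base class that every editor would have to extend."""

    # Access methods this editor refuses, each mapped to an explanation.
    unsupported: Dict[str, str] = {}

    @abstractmethod
    def render(self, data: dict) -> str:
        """Return the markup sent to the browser."""

    @abstractmethod
    def plain_text(self, data: dict) -> str:
        """Return text suitable for search, feeds, and excerpts."""

    def require_channel(self, channel: str) -> None:
        if channel in self.unsupported:
            raise UnsupportedChannel(self.unsupported[channel])


class ClassicEditor(PostEditor):
    def render(self, data: dict) -> str:
        return data["body"]

    def plain_text(self, data: dict) -> str:
        return data["body"]


class VisualBuilder(PostEditor):
    unsupported = {"api": "Layout data cannot be edited over the API."}

    def render(self, data: dict) -> str:
        return "".join("<section>" + b + "</section>" for b in data["blocks"])

    def plain_text(self, data: dict) -> str:
        return "\n\n".join(data["blocks"])


# Editor class chosen per post type, defaulting to the classic editor.
EDITORS: Dict[str, Type[PostEditor]] = {}


def register_post_type(name: str, editor: Type[PostEditor] = ClassicEditor) -> None:
    EDITORS[name] = editor
```

Because the abstract methods must be implemented before a subclass can be instantiated, an editor cannot quietly skip search or feed support; it either implements the method or lists the channel as unsupported with a reason.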

As the third step, I recommend re-purposing the post_content field as the “plain-text representation” of a post. It would be maintained for fast searches, feeds, and auto-excerpts (depending on the editor class functionality in question), but would no longer be considered the canonical version of a post, in any way. All methods of the editor class that change the post’s content would be expected to update post_content.
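
A minimal sketch of that save path, in Python: whenever the canonical markup changes, the plain-text copy is regenerated from it. The field name post_rendered and the function names are assumptions for illustration. The text is extracted with the standard library's HTML parser rather than regular expressions.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects the text nodes of a document and ignores the tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def to_plain_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with spaces keeps adjacent blocks apart, at the cost of
    # splitting words that contain inline tags. Whitespace is then collapsed.
    return " ".join(" ".join(parser.parts).split())


def save_post(post: dict, rendered_html: str) -> dict:
    # 'post_rendered' is a hypothetical field holding the canonical markup.
    post["post_rendered"] = rendered_html
    # post_content becomes a derived, search-friendly copy.
    post["post_content"] = to_plain_text(rendered_html)
    return post
```

Search, feeds, and auto-excerpts would then read post_content without ever seeing tags or shortcodes.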

As the final step, I recommend creating an extensible “content bucket” system, with pre-defined names like “asides”, “pull-quotes”, “features”, “headings”, “images”, etc., that custom editor classes could use as a common storage system. That way, if you switch editors, a lot of your content could come along, and simply be sorted into new buckets. In addition, themes could start to support these buckets in different ways, taking some of the burden of the presentation layer off the editor.
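
Switching editors under such a system might amount to a re-sort like the one below. This is only a sketch in Python: the data shape (bucket name mapped to a list of items), the migrate function, and the choice of “asides” as a catch-all are all assumptions.

```python
from typing import Dict, List


def migrate(
    buckets: Dict[str, List[str]],
    supported: List[str],
    fallback: str = "asides",
) -> Dict[str, List[str]]:
    """Re-sort content for a new editor.

    Buckets the new editor supports keep their name; everything else is
    moved into the fallback bucket so that no content is dropped.
    """
    result: Dict[str, List[str]] = {name: [] for name in supported}
    for name, items in buckets.items():
        target = name if name in result else fallback
        result.setdefault(target, []).extend(items)
    return result


if __name__ == "__main__":
    old = {"headings": ["Intro"], "pull-quotes": ["A quote"], "images": ["a.jpg"]}
    print(migrate(old, ["headings", "images", "asides"]))
```

In this example the pull quote loses its special structure but survives as an aside, which is the “lose structure, not content” behavior I'm after.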

Benefits to the proposed system of abstraction

Existing plugin ecosystems are moved into a healthier direction, rather than being abandoned
Flexibility for various use cases is maintained, while avoiding having to be bound by making all features back-compatible
True Separation of Concern
The ability to update post html in a single location, as web technologies evolve
The ability to stop parsing HTML with regular expressions, which is a BIG no-no in programming
The flexibility to apply post content in new ways that haven’t even been considered yet, without being tethered to old design choices.
The ability to maintain the interface, exactly as-is for legacy users, while still improving their experience by solving the issues of mixing plain-text mode with html.

Possible objections to the proposed system of abstraction, and rebuttals

We don’t get the benefit of “free browser rendering” – Since posts already pass through several regular expressions on the way to the browser, each of which introduces potential errors, and given the existing system of shortcodes, I don’t believe the performance hit is as large as imagined. Several other CMSs (such as Drupal) already assemble pages on the fly. In addition, with complete control over the editor class, there is no reason this abstraction couldn’t also improve integration with full-page cache systems like Varnish, or at least single post-body caches, where all static elements could be pre-rendered in a cache, with markers for where to insert dynamic elements.
post_content will likely get out of sync with the actual structured data – This issue can be fairly well mitigated in a class-based system, as editor classes will specifically declare the forms of editing they allow. For example, if you try to edit a post via API when the post_type only supports a visual editor, it will throw an error.
Changing editors will lose data – That’s actually the beauty of SoC… changing editors would only lose structure. Content could be rebuilt at any time from the plain-text version. In addition, this issue exists currently with changing themes or plugins. Even if content blocks in Gutenberg are able to exist purely as flat HTML, a plugin or theme change would leave custom blocks as uneditable blobs that don’t render properly. As a further enhancement, as a common language for data storage is created, several common blocks could transition data smoothly between editor classes, without losing either structure or data.
There are already filters for this in place – There are some filters in place for parts of this, and they are often applied in a piecemeal fashion by various plugins. Requiring editors to extend a common base class and provide specific functionality will enforce a holistic view of these editors.
This doesn’t help WordPress.com or Core – Abstracting the editor actually could do a lot of good for the core, by allowing them to build Gutenberg and future enhancements without having to tie their hands to the existing functionality. WordPress.com could offer the choice of the traditional editor or the Gutenberg editor for pages or posts, or even provide a more visual option for pages only, that doesn’t need to be rooted in the idea of “posts”. In addition, with WordPress’s recent acquisitions of some major plugins, this would open the door to fixing and legitimizing hugely popular plugins like Visual Composer or Beaver Builder or even Divi, so that their large communities are back in line with WordPress ideals, and eventually it could pave the way for acquisition of one or more of the larger players.
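
The post-body cache mentioned in the first rebuttal above (static elements pre-rendered, with markers for dynamic ones) could look roughly like this. It is a sketch in Python under stated assumptions: the marker syntax <!--dynamic:name--> and the function names are invented, and the cache is an in-memory dictionary rather than a real cache backend.

```python
import re
from typing import Callable, Dict

# Hypothetical marker syntax for dynamic slots inside a cached body.
MARKER = re.compile(r"<!--dynamic:([\w-]+)-->")

# In-memory stand-in for a real cache backend, keyed by post ID.
_cache: Dict[int, str] = {}


def cache_body(post_id: int, static_html: str) -> None:
    """Store a pre-rendered body that may contain dynamic markers."""
    _cache[post_id] = static_html


def serve_body(post_id: int, dynamic: Dict[str, Callable[[], str]]) -> str:
    """Fill each marker by calling its renderer; unknown markers become empty."""

    def fill(match: "re.Match[str]") -> str:
        render = dynamic.get(match.group(1))
        return render() if render else ""

    return MARKER.sub(fill, _cache[post_id])
```

Only the marked slots are computed per request; the rest of the body is served straight from the cache.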

Final Thoughts and Disclaimers

I am the primary author of a page builder called Blockade, which uses many of the same concepts that the content blocks team are pursuing. Obviously there could be some conflict of interest involved in my re-steering of project objectives, but that isn’t really the case. I built Blockade to be less of a visual builder than the competition, but I have always known that it is a hack, and isn’t capable of being the polished solution I want. That is not to put it behind other builders. They are all hacks, and so is the Gutenberg plan. Separation of Concern is the only way forward that doesn’t tie developers’ hands in one way or another. If Gutenberg becomes part of core, I will simply turn Blockade into a set of enhancements for that system. If the editor abstraction came into play, I could actually cut loose and build the ideal page builder I’m dreaming of. Either way, there is a path forward for me… but I worry about the path forward for all the websites that have chosen other enhancements to the editor experience, who may not have a path forward with the current plan.

2 Comments

Christian Zumbrunnen says:

February 12, 2017 at 12:02 pm

Hi Gregory
very interesting and you’re absolutely right. One of the frustrating problems that new WordPress users experience is the lack of “design” or “layout” options for posts or pages.
It’s just a title and a content area and no way to style or layout elements in any other way than just choosing the right heading or in the best case aligning an image left or right. But already with this alignment there is a lot of frustration, since there is often no “clearing” and of course a user without HTML experience doesn’t know what to do.

Content “Blocks” is a concept that probably helps a lot. (That’s why pagebuilders are so popular, I guess.) But it has to be done right. And I think it should go further than just inside post_content. I totally agree.

Some themes (e.g. Twenty Seventeen, Edin, Sequential…) started to offer a way to include other pages on the frontpage. I like this concept, but I think it could be extended to be a way to include blocks of any other page inside any page (not only the frontpage).

I imagine something like this:

I have an ordinary post, or more likely, an individual post type like testimonials, events, clients, team members…

By default there is an archive page anyway. (But as for now an archive page can not become the static front page, because it’s not a “physical” page that can be selected in the Customizer/Reading Settings).

An “archive” page should be possible to be selected as “Front Page Section n Content” (as it’s called in Twenty Seventeen) into any other page.

The display would follow the template hierarchy or special rules for included pages.

It’s not a perfect solution but it would allow to create pages with blocks that contain elements from other “pages” and that can be used on different pages.

I’m not sure, if I was able to express myself. English is not my mother tongue.

But what do you think of this?

- gschoppe says:
  
  March 5, 2017 at 8:44 am
  
  I think that’s a reasonable feature, but I’m not sure whether it should be in core.
  The issue there is that there’s a lot of logic surrounding what content to pull into what area, and it differs on a case-by-case basis. For example: with events, the user normally wants to see the events ordered by start date, as opposed to publication date. There might be a logical use for a “loop block” that does nothing but render a custom WordPress query, but it still seems like the role might be better filled with a widgets block and a custom widget. I really don’t like the idea of muddying the template hierarchy further by reusing templates. I’d much rather see a more formalized use of template parts, if such a suggestion was implemented.
  
  In general, a few people have proposed the idea of making parts of a page’s layout as separate pages, or custom post types, but the idea has a few flaws. One is that the query necessary to grab multiple posts and piece them together is much slower than the current database queries. Another is that it splits the content into many separate areas, which is unintuitive for users. There are even SEO concerns related to including the same part in multiple pages’ content. My primary objection is just that a post is a single, cohesive concept, and as such, the data should be structured in a cohesive manner.

Greg Schoppe