What’s so bad about HTML Comments as structure?

I’ve been getting a lot of traffic recently, due to my detailed critiques of some of the choices being made by the developers of WordPress’s new Gutenberg editor. One point I keep mentioning is the problem with storing post structure as HTML comments. It’s been brought to my attention that I often gloss over this issue with a general dismissal, without detailing why I am so dead set against it. To me, a lot of these issues seem obvious, but to others they might not. I’ve got a unique blend of formal Computer Science training and in-the-trenches work on both Enterprise and OSS projects, which may give me a different viewpoint than most.

To that end, I wanted to put out a hyper-focused post, to explain all of the issues I see with the new WP Post Grammar structure.

Track Record

This one is a relatively basic issue. WP Post Grammar is new, and while it is a formal grammar, it hasn’t undergone real-world testing beyond what is supported by the currently limited API. As soon as Gutenberg is released, people will start abusing it in ways the core team hasn’t expected. With bespoke systems like wpautop and shortcodes, this kind of abuse led to exposing serious flaws in the design, so much so that core has been vocal about their disdain of shortcodes as a syntax for years.

At this stage in WordPress’s life, it is important to not make another such misstep. Choosing a trusted, well-known and battle tested storage method like JSON would ensure that future use cases fit the structure.

 

Compatibility & Universality

WordPress posts are written to by many clients, tools, and plugins that are NOT the Gutenberg editor. By using a post grammar that is based in plain HTML, this encourages third party tools to write raw, unstructured data into a structure with a formal grammar.

Also, since WP Post Grammar only exists in WordPress’s PHP and JS, makers of third party tools in other languages will have to completely re-implement WP Post Grammar to write structured data (even if there are API endpoints for the parsed structure in the future, it won’t fit all use cases). Meanwhile, JSON is supported out of the box in pretty much every modern programming language, and there are popular classes to implement it in non-modern languages.

Furthermore, with custom blocks fully defined in JS, and storing to a mix of HTML and Comments, for a third-party editor to present the user with blocks, they would have to blindly re-implement blocks, not knowing what custom blocks are supported by the platform.

Structure vs Presentation

Ideologically, blocks are structured data, not HTML. That gives blocks an amazing amount of flexibility, because presentation is not opinionated. For example, in the current system, an image is an img tag, wrapped in an a tag, with a p tag for caption, all wrapped in a div, with a custom class. In 4.4, source-set attributes were added to the img tags. Since they were already stored in the post_content of a million posts as flat HTML, a regex had to be used for this, which has known issues.

Blocks stand to fix this issue, by storing data in a structure that is wholly separate from HTML, meaning that when it comes time to output an image tag, WordPress, plugins, and the theme can decide together what HTML to output. Is it an img in a div? A picture in a figure? An inline SVG? Has the image file been modified or deleted since the post was created? From a structured format, you can make that decision on the fly.

Furthermore, with a structured format, you can make that decision based on context. The same post that uses modern markup to display a set of columns in browsers can output table-based markup for email clients, or plain text for feeds. For site search purposes, you could even just render a list of words, with no HTML or formatting at all.
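To make the idea concrete, here is a minimal sketch of context-based rendering. It is written in Python purely for illustration (WordPress itself would do this in PHP), and every name in it (image_block, render_web, render_email, render_text, RENDERERS) is hypothetical rather than part of WordPress or Gutenberg. The point is only that one stored structure can feed several presentations.

```python
from html import escape

# Hypothetical structured block: pure data, no markup stored with it.
image_block = {
    "type": "image",
    "attrs": {
        "src": "http://gutenberg.dev/wp-content/uploads/2017/09/test_image.png",
        "alt": "",
        "caption": "test caption",
    },
}


def _img_tag(attrs):
    """Build an escaped img tag from the block's attributes."""
    src = escape(attrs["src"], quote=True)
    alt = escape(attrs["alt"], quote=True)
    return f'<img src="{src}" alt="{alt}" />'


def render_web(block):
    """Modern markup for browsers."""
    a = block["attrs"]
    caption = escape(a["caption"])
    return (
        '<figure class="wp-block-image">'
        + _img_tag(a)
        + f"<figcaption>{caption}</figcaption></figure>"
    )


def render_email(block):
    """Table-based markup for email clients."""
    a = block["attrs"]
    caption = escape(a["caption"])
    return (
        "<table><tr><td>" + _img_tag(a) + "</td></tr>"
        + f"<tr><td>{caption}</td></tr></table>"
    )


def render_text(block):
    """Plain text for feeds or search indexing."""
    return block["attrs"]["caption"]


# The output context picks the renderer; the stored data never changes.
RENDERERS = {"web": render_web, "email": render_email, "text": render_text}


def render(block, context):
    return RENDERERS[context](block)
```

A theme or plugin could swap any entry in RENDERERS without touching a single stored post, which is exactly what flat HTML in post_content prevents.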

The WP Post Grammar, on the other hand, mixes structure and presentation. Let’s look at an image, stored in the Gutenberg data structure:

<!-- wp:core/image {"id":21} -->
<figure class="wp-block-image"><img src="http://gutenberg.dev/wp-content/uploads/2017/09/test_image.png" alt="" />
<figcaption>test caption</figcaption>
</figure>
<!-- /wp:core/image -->

As you can see, the data-structure itself presents a very opinionated choice of presentation. This was presented as a positive, to allow the post to take advantage of “free rendering” for many blocks… however, it means that changing presentation will either require converting every post to use a new block type on theme change, or it means that posts will have to be parsed from HTML to a structure, just to parse back to HTML on every page load. This double parsing goes against the idea of post_content as a single source of truth, as not all data from the original HTML format can be preserved across this transformation, and it will require a flat page cache to maintain any sort of performance at scale.
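The double parsing is easier to see in code. The following is a rough Python sketch that pulls one block apart with a simplified regular expression. It is not the real WP Post Grammar (which is a PEG grammar and handles nesting); BLOCK_RE and parse_block are names I made up for the illustration, and the pattern assumes a single, non-nested block.

```python
import json
import re

# Simplified pattern for ONE non-nested block. This is an assumption made
# for illustration, not the actual WP Post Grammar.
BLOCK_RE = re.compile(
    r"<!--\s+wp:(?P<name>[a-z][a-z0-9/-]*)\s+(?P<attrs>\{.*?\})?\s*-->"
    r"(?P<html>.*?)"
    r"<!--\s+/wp:(?P=name)\s+-->",
    re.DOTALL,
)


def parse_block(source):
    """Split a stored block into its name, JSON attributes, and inner HTML."""
    match = BLOCK_RE.search(source)
    if match is None:
        return None
    raw_attrs = match.group("attrs")
    return {
        "name": match.group("name"),
        "attrs": json.loads(raw_attrs) if raw_attrs else {},
        "html": match.group("html").strip(),
    }


stored = """<!-- wp:core/image {"id":21} -->
<figure class="wp-block-image"><img src="http://gutenberg.dev/wp-content/uploads/2017/09/test_image.png" alt="" />
<figcaption>test caption</figcaption>
</figure>
<!-- /wp:core/image -->"""

block = parse_block(stored)
```

After parsing, block["attrs"] holds only the attachment id. The caption and the image URL exist solely inside block["html"], so any theme that wants different markup has to parse that HTML a second time to get at them.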

This particular example is actually a step backwards from the functionality of the current editor, which employs a filterable caption shortcode, so that themes and plugins can easily override the default opinionated content that WordPress creates.

One of the clear places to see this issue is with columns. An issue was opened about columns back on March 9th that shows the issue perfectly. The conversation quickly devolved into which of the many competing column standards WordPress should officially support. The obvious answer: Neither! Presentation is the theme’s job, and in different contexts, different presentations will be necessary. But Gutenberg’s format makes a single choice necessary.

Of course, columns are a problem that the core team argues will be addressed after Gutenberg’s release in WordPress 5.0, meaning that the format will already be in place on millions of sites, before one of the biggest problems with it becomes clear to everyone.

Common Point of Entry

The biggest reason for the Gutenberg team to build their comment-based structure is ostensibly to maintain the ability for posts to be edited as plain text, whether by the WordPress editor’s text mode or by third-party editors.

The issue this creates is that while Gutenberg may create a formal post grammar, the text editor (and any third-party editor) is not bound to this grammar. So, posts may lose data in the transition from HTML+Comments to Gutenberg and back, if the post is edited in text mode.

Instead, text mode and third-party editors should be forced to pass data through a validation function before saving. That way we can ensure that the post structure hasn’t been corrupted in the editing process. Of course, if any sort of validation was inserted, it would require a change to third-party editors to support Gutenberg, at which point there is no more reason to save the post in a plain-text compatible format, as it would be just as simple to convert a JSON-stored post to a plain-text compatible format for the editor to work with, then validate it and convert it back on save.

Instead, issues with the structured data that are introduced by plain-text editors may go completely unnoticed, until the user switches back to the block editor, at which point data may be inexplicably lost.
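As a sketch of what such a validation function might look like, here is a small Python example that checks whether block comments are balanced before a save is accepted. TOKEN_RE and validate_blocks are hypothetical names, the pattern is a simplification of the real grammar, and a production validator would need to check far more than delimiter balance.

```python
import re

# Matches an opening or closing block comment. Simplified for illustration:
# it assumes the block name is always followed by whitespace.
TOKEN_RE = re.compile(
    r"<!--\s+(?P<close>/)?wp:(?P<name>[a-z][a-z0-9/-]*)\s.*?-->",
    re.DOTALL,
)


def validate_blocks(content):
    """Return a list of problems; an empty list means delimiters balance."""
    problems = []
    open_blocks = []  # stack of block names that have been opened
    for match in TOKEN_RE.finditer(content):
        name = match.group("name")
        if not match.group("close"):
            open_blocks.append(name)
        elif not open_blocks:
            problems.append(f"closing comment for {name} has no opener")
        else:
            expected = open_blocks.pop()
            if expected != name:
                problems.append(f"expected close of {expected}, found {name}")
    problems.extend(f"block {name} is never closed" for name in open_blocks)
    return problems


good = "<!-- wp:core/paragraph -->\n<p>Hello</p>\n<!-- /wp:core/paragraph -->"
bad = (
    "<p>Hello</p>\n"
    "<!-- wp:core/paragraph -->\n<!-- /wp:core/paragraph -->\n"
    '<!-- wp:core/image {"id":21} -->'
)
```

Here validate_blocks(good) comes back empty, while validate_blocks(bad) reports that core/image is never closed, so a save from text mode or a third-party editor could be rejected instead of silently corrupting the post.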

Fail Open Security

Many membership or restricted content plugins already fall victim to this flaw. They use a shortcode to wrap data that should not be visible to all users. As such, if any plugin conflict or error occurs, the default output of the page contains the private data. A lot of WordPress sites have leaked private data because of this issue. The reason for the issue is that WordPress considers the post_content the canonical output of the page, and only runs it through a filter to process identified shortcodes.

This is clearly an issue with plugin design, to some extent, but at the same time, it’s hard to argue with the flexibility gained by being able to insert any number of secure content areas in any position, that each have the full capabilities of the entire editor. There is no reason to expect that secure content plugins will not continue to be designed this way with blocks.

If the content was, instead, output totally based on context, only printing blocks that are recognized and properly parsed, any failed restricted content block would not be output, converting post_content to a much more secure “fail closed” system.
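A minimal Python sketch of that "fail closed" behavior follows. The block types, the registry, and the user dictionary are all hypothetical stand-ins chosen for the example, not anything WordPress or a membership plugin actually exposes.

```python
from html import escape


def render_paragraph(attrs, user):
    return f"<p>{escape(attrs['text'])}</p>"


def render_members_only(attrs, user):
    # Hypothetical restricted-content block: only members see its text.
    if user.get("is_member"):
        return f'<div class="members-only">{escape(attrs["text"])}</div>'
    return ""


def render_post(blocks, registry, user):
    """Render only blocks with a registered, working renderer."""
    output = []
    for block in blocks:
        renderer = registry.get(block["type"])
        if renderer is None:
            continue  # fail closed: an unrecognized block prints nothing
        try:
            output.append(renderer(block["attrs"], user))
        except Exception:
            continue  # fail closed: a crashing renderer prints nothing
    return "".join(output)


post = [
    {"type": "paragraph", "attrs": {"text": "Public intro"}},
    {"type": "members_only", "attrs": {"text": "Secret"}},
]
full_registry = {"paragraph": render_paragraph, "members_only": render_members_only}
# Simulates the membership plugin being deactivated or failing to load.
broken_registry = {"paragraph": render_paragraph}
```

With broken_registry, the restricted block simply disappears from the page. Compare that with a shortcode-wrapped secret, where a missing plugin leaves the private text sitting in post_content and printed for everyone.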

Back Compatibility

The Gutenberg team has been extremely vocal that preserving back-compatibility is the primary reason that the WP Post Grammar uses HTML Comments. So I figured moving content between the traditional post editor (tinyMCE) and Gutenberg must work pretty well. I decided to test that compatibility.

I created two identical posts. One was made with Gutenberg 1.1, and the other was made in TinyMCE. As you can see, they look pretty different in text mode (I copied them to a text editor, so I could test them without risking corruption):

Now that each version has a backup, let’s see how compatible they are. Before we even get to our tests, take a quick look at the two plaintext renderings of our posts. Can you imagine hand-writing the post on the left? It’s pretty clear that plain-text tools are not gonna work well editing Gutenberg posts, so I’m gonna focus on visual tools that emulate the interface of TinyMCE.

Test 1 – Open a Gutenberg Post in TinyMCE and Save, then Re-Open in Gutenberg

This test is to verify the most basic compatibility with existing third-party editors. As TinyMCE is the editor all others have needed to be feature-compatible with for a decade, it is an excellent comparison to other third-party tools. Remember, in this test, literally no changes were made to the post. It was only opened and saved.

This is not what I expected to see in a back-compatible format.

UPDATE: Gutenberg 1.2.0 addresses this issue by modifying the traditional TinyMCE editor and the_content to disable wpautop when Gutenberg blocks are detected in post_content. This is not a viable solution to preserve back compatibility or third-party editor support, as older editor versions and third party editors do not have arbitrary checks to determine if wpautop should be enabled or disabled, so this change alone serves to invalidate those arguments against using a sane JSON format.

Test 2 – Open Gutenberg post in TinyMCE, modify a paragraph, and save, then Re-open in Gutenberg

In this test, I opened the Gutenberg post in TinyMCE, deleted the paragraph break in the first section, re-added it, and hit save. For the purposes of showing the result in Gutenberg, I clicked through to convert the resulting blocks to “classic text”.

The relevant issue to notice is that the text wasn’t just merged into one block, as you might imagine would happen. Because TinyMCE doesn’t recognize the block format, it made a best guess about what to do with the comment opening a Gutenberg block, when I backspaced over it, and rather than deleting it, it was moved further down in the content, creating a second, empty block. In this case, the result was relatively harmless to the rest of the post, but in situations of cutting and pasting or rearranging content, it’s anyone’s guess what might happen to the block comments, or how Gutenberg would handle attempting to recover from such a corrupted state.

This illustrates a key problem with using comments as the block delineation. They aren’t rendered in WYSIWYG editors, so their behavior can’t be predicted by users. This was actually touted as a feature in favor of comments by the Gutenberg team. For the life of me, I can’t figure out why.

Test 3 – Open the Gutenberg Post in TinyMCE and Change an Image Size

So clearly, we can’t edit Gutenberg posts in both third-party tools (or the traditional editor) and Gutenberg interchangeably, but I know we were told that Gutenberg posts would work in the traditional editor, if we chose to downgrade. So let’s do that. We’re gonna try updating the size of that image in TinyMCE.

Not only does TinyMCE not know about the alignment or caption set on the image, it also has no idea that the image is from the media library, and so doesn’t offer any of the normal thumbnail sizes or options to change image. If I had more than a couple of images to modify, it would be extremely annoying to have to completely replace them, just to change a size setting.

Test 4 – Edit a Post made in TinyMCE in Gutenberg

Despite all the annoyances and frustrations of all the other use cases, I don’t use third-party editors very often, and most people probably won’t want to downgrade Gutenberg, so the issues with third-party and back compatibility will probably only ruin a few lives. Let’s instead focus on the main use case: the moment of upgrade! This is the wow moment when someone installs WordPress 5.0 on their site, and gets to open all their existing posts and supercharge them with Blocks. This is the moment they can take their existing posts and make an image full-width, or insert a new block-quote or text column half-way through. Many sites have thousands of old posts that could use the polish and control offered by Gutenberg… In short, this experience should blow my socks off!

Wait, let me make sure I get this straight… existing posts are going to be loaded into a “Classic Block”, that doesn’t get access to inserting any form of block at all? This means that to take advantage of one of the new block features, I need to go back through my posts, one at a time, converting them into Gutenberg posts, manually.

Imagine having to break up a post this long, just to add a full-width image. Now, imagine having to do it to thousands of posts. This is a terrible way to introduce Gutenberg to the users.

Conclusions

Well, it’s clear that back-compatibility is a bit of a misleading feature in Gutenberg, and forwards compatibility involves some sort of manual conversion step. This doesn’t seem like a very good solution for those with a lot of old content, and therefore is a pretty weak argument for building the WP Post Grammar format.

Let me present an alternative process, based in JSON. WordPress 5.0 could add a field to the wp_posts table, called ‘content_format’ (a trivial change, using the dbDelta function). The default format would be the TinyMCE we all know and, well… know. But different formats could be set and used instead. This hypothetical Gutenberg could start users off in the traditional editor for their existing posts, with a big old “update” banner, that would automatically convert the post to the new JSON format. They could then preview the change in the editor, and it would only be converted in the database on save. Users who post via third-party apps or incompatible plugins that write posts would have no trouble, as their implementation wouldn’t set the content_format flag, so the default TinyMCE implementation would be used.
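Here is a rough Python sketch of how that per-post format flag could drive rendering. The format names ("classic", "blocks_json"), the renderer functions, and the dictionary standing in for a wp_posts row are all assumptions made for the example. Nothing here reflects an existing WordPress API.

```python
import json
from html import escape

DEFAULT_FORMAT = "classic"  # hypothetical name for the legacy TinyMCE format


def render_classic(content):
    # Stand-in for the existing pipeline (wpautop, shortcodes, and so on).
    return content


def render_blocks_json(content):
    # Hypothetical JSON format: a list of block objects.
    blocks = json.loads(content)
    return "".join(
        f"<p>{escape(block['text'])}</p>"
        for block in blocks
        if block.get("type") == "paragraph"
    )


FORMAT_RENDERERS = {
    "classic": render_classic,
    "blocks_json": render_blocks_json,
}


def render_post(row):
    """Pick a renderer from the row's content_format, defaulting to classic."""
    content_format = row.get("content_format") or DEFAULT_FORMAT
    return FORMAT_RENDERERS[content_format](row["post_content"])


# A post written by a tool that has never heard of content_format.
legacy_row = {"post_content": "<p>Old post</p>"}
# A post converted and saved by the hypothetical new editor.
new_row = {
    "content_format": "blocks_json",
    "post_content": json.dumps([{"type": "paragraph", "text": "Hello"}]),
}
```

Because a missing flag falls through to the classic renderer, anything that writes posts today keeps working untouched, and only posts that were explicitly converted ever take the JSON path.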

In this alternative version, you could expose PHP functions to parse one format to the other, and vice versa, using a PEG-parsed shortcode grammar and wpautop paragraphs (not comments). That way, plugins and third-party editors would see a format that they already know how to handle, and that doesn’t hide the delineations between blocks from users, but could easily save to the new JSON format with very minimal changes to their codebase. Furthermore, since the functions could validate the resulting JSON, we could provide linting for third-party tools, to make sure they don’t save invalid code. Of course, this would just be a legacy solution to allow these tools to take the time they need to implement the full JSON format, themselves, giving them feature parity with the Gutenberg editor.

I fail to see how this hypothetical version of Gutenberg would be any less back-compatible than what we have now… but it would offer all the other advantages of a real JSON storage structure.

Acknowledging and Resolving Technical Debt

The existing post_content format was originally created to post text, exactly as written, which is why enters are converted to paragraphs on save. Later, basic tools for bolding and italics were added, muddying the waters by mixing HTML support with wpautop. Then special <!--nextpage--> and <!--more--> comments were added, and co-opted to perform tasks in WordPress when rendering. Then, image support was added, creating issues of linking to other content in WordPress with a hardcoded URL. Next came the visual editor, layering a complicated transformation from and to actual HTML from wpautop. Then we got shortcodes, parsed with Regular Expressions in HTML. Then capital_P_dangit, which for some reason performs a single spellcheck function in PHP, rather than as an actual spelling-suggestion. Then RICG, which filters the already hacky image tags with another Regular Expression. And somewhere in there, functions like wp_kses were added for security.

The overall pile of incompatible, barely functional technologies involved in post manipulation is extremely complicated to navigate for developers, or even just for users who want to be able to use HTML in their posts. A rehash of the editor needs to start by recognizing that this mess needs to be fixed, and unfortunately, Gutenberg doesn’t. Instead, it simply adds conditionals into the mix, where sometimes developers will need to deal with wpautop, and sometimes they won’t, or sometimes images will need to be saved/parsed in one format, and sometimes in another. It also adds an entirely new hack of a language into the mix that still contains shortcodes and un-prefaced comments.

It’s like taking a house made out of sand, and adding a second story. It might hold or it might not, but the underlying structure is left far more unstable.

Addendum – Learn from 4.8.2

While I was editing this post, WordPress released version 4.8.2, which included a breaking change to an undocumented functionality of the $wpdb->prepare() function that broke several extremely popular plugins, including Yoast SEO, and affected thousands of sites. Fundamentally, 4.8.2’s necessity was a direct result of a failure to adopt industry standards that was never corrected, and even with 4.8.2 remains an issue that will certainly rear its head again.

$wpdb->prepare() was introduced ten years ago today, as a response to theme and plugin authors’ inability to properly escape data being entered into MySQL queries. At the time, WordPress was getting pressure to support bound and prepared queries, a feature that makes SQL injection practically impossible, and which has been supported by the MySQLi and PDO database libraries since PHP 5.3. Instead of dropping support for PHP 5.2, or even introducing functions that were compatible with bound and prepared queries when available and that fell back to less secure options when not available, WordPress introduced $wpdb->prepare(). Prepare is simply a wrapper around mysql_real_escape_string(), with a fallback to addslashes(), and an extra query to try to filter out multi-byte attacks. It isn’t compatible with the format necessary to bind and execute in PDO or MySQLi, so it cannot provide the actual benefits of those libraries, instead choosing to be simultaneously compatible with two different string formatting functions, creating another security risk that was identified in 4.8.2.

As it stands, $wpdb->prepare() is not provably secure or stable, and never will be, because it wasn’t built following industry standards, but rather with a hacky roll-your-own solution. At the same time, the level of adoption of the secure PDO or MySQLi Bind->Prepare->Execute functionality is almost universal with other platforms.
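For readers who haven’t used bound parameters, here is a small illustration of the difference. It uses Python’s built-in sqlite3 module rather than PDO or MySQLi, so treat it as a sketch of the principle and not of WordPress’s database layer. The table and the hostile input are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (title) VALUES (?)",
    [("Hello",), ("World",)],
)

# Attacker-controlled input that tries to widen the WHERE clause.
hostile = "Hello' OR '1'='1"

# String formatting: the input becomes part of the SQL text itself, so the
# injected OR clause is executed and every row matches.
unsafe_sql = "SELECT title FROM posts WHERE title = '%s'" % hostile
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Bound parameter: the query text is fixed and the input is passed purely
# as data, so it is only ever compared against the title column.
safe_rows = conn.execute(
    "SELECT title FROM posts WHERE title = ?",
    (hostile,),
).fetchall()
```

The formatted query returns both rows, while the bound query returns none, because no post has that literal title. No amount of clever escaping is involved in the second case, which is why binding is considered the standard approach.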

Gutenberg’s WP Post Grammar format is another such tipping point. The world is embracing JSON structure. Ghost is already using it, and Medium likely is too. It is well beyond time for WordPress to be using and pushing the adoption of universal data standards, not rolling their own monstrosity.

6 Comments

  1. William Patton

    Hey,

    This kind of storage system is one of my primary concerns with Gutenberg. I mentioned in an early review of Gutenberg that I tried to write posts in text editor mode and found it far too complex of a structure to write fluidly. I still use Gutenberg every week and the issues of switching back and forth between visual and text still exist (but block validation and fallback has somewhat eased this… a teeny little bit…).

    I passively mentioned the notion of moving to a JSON based model in that review although I had not put in a great deal of thought about it. You have explored the idea a lot more than I – and I see how it solves many of my concerns. I do think that unforeseen pitfalls will present themselves in any chosen system so it’s not fair of me to say that a switch like this would significantly simplify writing posts or be a magic fix for my other gripes though. It sounds closer to what I would want though.

    I’m of the opinion that Gutenberg does blocks wrong. The syntax feels cluttered and overly verbose. I believe an approach of splitting the content from its settings and storing those in JSON would be a better choice. I worry that, at this stage of development, changes like this will be left to sit too long until making such a change becomes completely unviable 🙁

    • gschoppe

      Now that React 16 has been re-licensed under the MIT license, I think Gutenberg has an excellent opportunity to fix the storage method by using the time that was earmarked for the switch to redesign their storage method.

  2. Gary

    Thanks for writing this great testing post, Greg!

    You cover a lot of ground in the post, I’ll try to clarify everything, but please let me know if there’s something I missed. 🙂

    Track Record

    WP Post Grammar is new, and while it is a formal grammar, it hasn’t undergone real-world testing beyond what is supported by the currently limited API.

    Do you have some examples of where the grammar can be broken? Certainly no-one is claiming it’s complete, but as I’m sure you’re aware, part of the point of a formal grammar is that it can be defined very simply, yet allow for very complex structures. I would be significantly more concerned if the grammar were already super complex.

    As soon as Gutenberg is released, people will start abusing it in ways the core team hasn’t expected.

    I dearly hope that people do abuse it! 😁 If Gutenberg is being pushed beyond whatever has currently been considered, that’s a good indicator that people are putting a lot of thought and effort into seeing what they can create with Gutenberg.

    With bespoke systems like wpautop and shortcodes, this kind of abuse led to exposing serious flaws in the design, so much so that core has been vocal about their disdain of shortcodes as a syntax for years.

    You’re kind of conflating three very different situations here.

    wpautop is a relatively simple bit of code that has a somewhat unearned reputation of constantly breaking. There’s no denying that it struggles with some edge cases, but it just works for 99.999% of usage.

    Shortcodes are a complex, but exceedingly powerful system, which have served WordPress well for many years. If anything, blocks are simply a more user friendly evolution of shortcodes, and I fully expect the WordPress world to come up with a myriad of interesting uses for blocks, in much the same way they did for shortcodes.

    As one of the handful of core people who work on wpautop and shortcodes, I’m aware of their weaknesses, but please don’t confuse that for disdain. They’re useful tools Gutenberg will learn from and build upon.

    The WP Post Grammar, as you mentioned, is a formal grammar, which addresses many of wpautop’s and shortcode’s shortcomings. With WP Post Grammar as a precedent, I think it’s quite reasonable for us to implement a shortcode grammar in the future, however.

    Compatibility & Universality

    By using a post grammar that is based in plain HTML, this encourages third party tools to write raw, unstructured data into a structure with a formal grammar.

    This is a very important consideration, and one that we deliberately want to allow. It’s not just third party tools, however, it’s the thousands of plugins, and countless snippets of code across the web that interface directly with post_content, and expect it to contain HTML. Breaking backwards compatibility with them causes needless pain for millions of site owners.

    Furthermore, with custom blocks fully defined in JS, and storing to a mix of HTML and Comments, for a third-party editor to present the user with blocks, they would have to blindly re-implement blocks, not knowing what custom blocks are supported by the platform.

    There’s some interesting work on Block manifests, which would be available to any third party editor. A third party editor could easily generate UI based on the manifest, and have custom UIs for the blocks they want to give special treatment. This is no different to any other solution that allows for the definition of arbitrary blocks.

    Structure vs Presentation

    in 4.4, source-set attributes were added to the img tags. Since they were already stored in the post_content of a million posts as flat HTML, a regex had to be used for this, which has known issues.

    You’ve brought up srcset as an example several times, but that’s actually a great example of using regular expressions to parse a limited subset of HTML in a predictable way – it limits the HTML being operated on to a known subset, and methodically breaks down the img tag, to ensure the correct data is being retrieved. Regular expressions are a wonderful tool when used correctly, and this is an excellent example of breaking an irregular language down into regular parts, so they can be processed accurately.

    The WP Post Grammar, on the other hand, mixes structure and presentation. Let’s look at an image, stored in the Gutenberg data structure

    This is totally intentional. A client that knows nothing about Gutenberg can take that HTML and render it without modifying anything. This is critical for all sorts of third party clients to not need to update for Gutenberg support.

    However, that doesn’t need to be the case. A block can also store the relevant data to re-create the HTML within the block comment. The image block example you gave currently doesn’t do that, but that’s easy enough to change. This gives all the advantages of the JSON-based structures you’ve mentioned, but with 100% backwards compatibility for presentation.

    If it helps, you can think of the HTML within the block less of a data source, and more of a cache of the most common presentation format.

    One of the clear places to see this issue is with columns. An issue was opened about columns back on March 9th that shows the issue perfectly. The conversation quickly devolved into which of the many competing column standards WordPress should officially support. The obvious answer: Neither!

    I strongly disagree, WordPress should absolutely officially support a column standard. Leaving this up to themes puts unnecessary strain on theme developers, and will inevitably result in hundreds of different implementations of the same thing.

    Of course, if a theme wanted to use a different column standard, it can ignore the HTML that Gutenberg has cached for it within the block, and output its own HTML based on the block data.

    Common Point of Entry

    The biggest reason for the Gutenberg team to build their comment-based structure is ostensibly to maintain the ability for posts to be edited as plain text, whether by the WordPress editor’s text mode or by third-party editors.

    I don’t know that it’s the biggest, but it’s certainly up there. 😉

    The issue this creates is that while Gutenberg may create a formal post grammar, the text editor (and any third-party editor) is not bound to this grammar. So, posts may lose data in the transition from HTML+Comments to Gutenberg and back, if the post is edited in text mode.

    That’s absolutely a risk to consider, and something we’ll likely need to do some extra work on to ensure backwards compatibility. Much like with meta boxes, this is a problem to tackle later in the process, rather than earlier – it’s important to have the grammar locked down before we get too into the nitty-gritty of converting data back and forth.

    Back Compatibility

    So, after all that, we can get to the fun part of this post – the testing!

    First up, I agree that the Gutenberg plain text version is clearly more verbose than the TinyMCE version. Whether or not that’s a bad thing is up for interpretation, but I don’t expect people to be handwriting Gutenberg blocks. The text view is to let them make tweaks when the WYSIWYG editor gets things wrong. It’s also an important funnel for onboarding new contributors to WordPress – many contributors got their start by switching to the text view, or editing a plugin.

    Secondly, and I hate to labour a point I’m sure you’re aware of, but these tests would really be better on the Gutenberg issue tracker, where they can be properly triaged. If you’d like some help, I’m more than happy to work with you to move them across. 🙂

    Test 1

    Hey, that’s fun! It’s awesome to see that TinyMCE can read and save the post in a format that Gutenberg recognises, it just needs a bit of tweaking to allow for minor changes. Gutenberg is currently very cautious about unexpected changes to the HTML, but we can definitely become more flexible as the grammar solidifies.

    Test 2

    Oh, that’s pretty cool. 😂 It seems like we could add a bit of sanity checking to remove empty paragraph blocks, and to re-split multiple paragraphs in a single block.

    Test 3

    That’s an interesting problem, there’s a ticket for Gutenberg to handle the caption shortcode, we could definitely explore whether it’s viable to use a format that’s Gutenberg and TinyMCE friendly.

    Test 4

    You’re right that this particular case isn’t ideal, but a lot of the work to make an existing post convert to Gutenberg blocks is already done! The paste handlers are where the bulk of this work is being done, which gives a lot more interesting cases – instead of just handling old posts, it can handle data from Google Docs, Word, Markdown, or any other source you’d like to add support for. Wiring up converting old posts from the database is a relatively simple task that can be left until later.

    In many ways, this approach is an extension of the early discussions on the data format – data portability was always the highest priority, we can connect up all of the possible sources later on.

    Conclusions

    Unfortunately, I can’t agree with your conclusions.

    Backwards compatibility has always been a critical part of WordPress’ success, to drop that from the actual content, the posts, can’t be something that’s done on a whim, and using JSON instead of a data format that works with existing data is definitely a whim. It seems like most of our disagreement is over what sort of delimiter we should be using when concatenating strings. 😉

    One of WordPress’ strengths is that we evolve the user experience first, and the tech will follow as it needs to. Once we have a v1, we can evolve that into a v2, and so on. Gutenberg is no different, the focus is on making something that evolves that user experience. What the user sees, whether it be in the editor or on their site, is the thing that matters – how we arrange the strings in the background doesn’t really have that great a bearing on the outcome.

    On a more personal note, I’d love for you to take some time to put aside replacing the WP Post grammar, and dive into it , really trying to make it work. An important (if informal) principle of how WordPress Core is built is the idea of “Disagree and Commit“. In short, we can vehemently disagree on the best way to go about something, but once a decision is made (in this case, Matías as the Gutenberg tech lead makes that decision), we all commit to making it work, even if it’s the opposite of what we think the right decision was. I’ve been on both sides of really big decisions, and I know how rough it can be when you’re certain that it’s all wrong, but I promise it’s worth the effort to look past that to what’s next.

    This way, the focus can remain on building the best editing experience we can, which is what Gutenberg is all about. 🙂

    • gschoppe's profile image.

      Do you have some examples of where the grammar can be broken? Certainly no-one is claiming it’s complete

      I mention a few potential ways to break things later in my post, with text-mode editing, but the primary issue isn’t which problems I can identify immediately. It’s that a bespoke system can contain systemic issues for the future that are not present in a storage format that has been put through the wringer by millions of sites, such as JSON.

      but it just works for 99.999% of usage.

      Respectfully, it doesn’t. wpautop fails in any situation where the user adds block level elements into the editor, because you can’t separate them with whitespace without inserting a paragraph, br, or nbsp, but the naive way the excerpt strips tags means that leaving them next to each other causes words to run together in the excerpt. This may be in that 0.001% for you, but if so, it’s only because WordPress doesn’t support it, so the userbase has stopped trying.
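To make the run-together excerpt concrete, here is a minimal sketch in Python. The `naive_strip_tags` helper is a hypothetical stand-in for tag stripping in general, not WordPress's actual excerpt code. It only shows why adjacent block-level elements lose their separation once the tags are removed.

```python
import re


def naive_strip_tags(html):
    """Remove anything that looks like a tag, inserting nothing in its place."""
    return re.sub(r"<[^>]+>", "", html)


# Two block-level elements placed directly next to each other,
# with no whitespace between them.
content = "<h2>Summary</h2><div>Details follow</div>"

excerpt = naive_strip_tags(content)
print(excerpt)  # SummaryDetails follow
```

The words on either side of the tag boundary merge because the only thing separating them was the markup itself.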

      I think it’s quite reasonable for us to implement a shortcode grammar in the future, however.

      If that is the case, implement a shortcode grammar instead of a comment grammar. As a formal grammar it would be equally trustworthy, but would use a format WordPress editors already know how to speak. I understand the argument that comments are invisible, but that is a BAD thing for any editor that doesn’t support blocks right out of the box. They will have invisible delineations that they don’t know how to handle, instead of visible delineations that fit a standard they already know about. It’s not a good solution, but it’s better than adding a completely new format.

      expect it to contain HTML. Breaking backwards compatibility with them causes needless pain for millions of site owners.

      Yes, I understand this is a point where we completely disagree ideologically. I believe in mitigating that pain to produce a stronger foundation, rather than trying to cause no pain, which I believe to be an impossibility. But when you use words like “needless” or “whim” to describe using an industry standard, even when you have just read a massive diatribe showing all the myriad benefits, it doesn’t come across as listening. To be honest, this is part of why a large chunk of the community feels unheard right now. I made a lot of points in this post, and using words like “needless” or “whim” just acts to minimize those points and turn this less into a discussion than an explanation. I understand why the decision was made, but that doesn’t necessarily make the decision a good one. At the same time that I am putting in the effort to understand why one option was chosen, these responses totally ignore the benefits posed by the alternative. It leaves the community members who raise these concerns feeling unheard.

      but that’s actually a great example of using regular expressions to parse a limited subset of HTML in a predictable way

      Except, because it doesn’t actually know the meaning of the IMG tag or any meaningful data about what it references, it makes naive assumptions. Here is one example of how it can fail in a rather spectacular way (keep in mind, this is an example, not a canonical list of all bugs):

      1. A user places an image on a page, using the visual editor.
      2. Later, a second user, who prefers text mode, replaces the src attribute for that image, to instead link to an externally hosted image with the same filename (not all that uncommon if the image is titled ‘screenshot.jpg’ or ‘image.jpg’ or ‘download.jpg’ or one of the many other non-descriptive filenames people commonly use)
      3. Because the replace function sees the wp-image-## class on the image tag, it adds source sets for the image with that attachment ID. It checks the base name of the image src, but because basename matches, it assumes the image is the same.
      4. The user who previews their edits, depending on their browser or screen size, may see no error whatsoever, and if they look at the image tag in the editor, it is correctly formatted HTML
      5. Viewers, depending on browser and screen size, will be presented with the wrong image when viewing the page, and debugging will be totally non-intuitive to the user who created the content, or to the original user, as the correct image will render in the editor for each of them.
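The steps above can be sketched in a few lines of Python. This is an illustration of the check as described here, not WordPress's PHP implementation. The `ATTACHMENTS` table, the function names, and the example URLs are all hypothetical.

```python
import posixpath
import re
from urllib.parse import urlparse

# Hypothetical attachment table: attachment ID -> URL of the uploaded file.
ATTACHMENTS = {42: "https://example.com/wp-content/uploads/screenshot.jpg"}


def basename(url):
    """Return only the filename portion of a URL's path."""
    return posixpath.basename(urlparse(url).path)


def srcset_attachment_for(img_tag):
    """Return the attachment ID whose srcset would be added, or None.

    Mirrors the naive logic described above: trust the wp-image-## class,
    and compare only the filename of the src, not the full URL.
    """
    class_match = re.search(r"wp-image-(\d+)", img_tag)
    src_match = re.search(r'\bsrc="([^"]+)"', img_tag)
    if not class_match or not src_match:
        return None
    attachment_id = int(class_match.group(1))
    attachment_url = ATTACHMENTS.get(attachment_id)
    if attachment_url is None:
        return None
    if basename(src_match.group(1)) == basename(attachment_url):
        return attachment_id
    return None


# Step 2: a text-mode edit points src at an external image with the same filename.
edited = '<img class="alignnone wp-image-42" src="https://cdn.example.org/images/screenshot.jpg">'
print(srcset_attachment_for(edited))  # 42
```

The external image still matches attachment 42, so the source set for the wrong image is attached. Comparing the full URL would catch this particular case, but the markup alone still carries no reliable statement of which attachment the author meant.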

      WordPress should absolutely officially support a column standard. Leaving this up to themes puts unnecessary strain on theme developers, and will inevitably result in hundreds of different implementations of the same thing.

      I vehemently disagree. WordPress could provide a fallback implementation, but it should absolutely not push one presentation over another by linking it with the data structure, because it locks them into a choice that may swiftly become dated. And then, that choice becomes hard to abandon, because of back-compatibility. In a JSON structure, there is no cost whatsoever to changing out the method Gutenberg uses to render the columns in either the front end or the back end. In the muddled structure/presentation format we have now, either the data structure has to change, or all pages need to do an additional double parse to fix the original presentation, making sites that adopt modern standards perform slower than sites that use legacy structures.
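As a rough sketch of what separating structure from presentation could look like, the Python below stores a columns block as JSON and renders it two different ways. The block schema and both renderers are hypothetical, invented for illustration. They are not a Gutenberg format or API.

```python
import json
from html import escape

# Hypothetical stored structure: it records only that there are columns
# and what they contain, with no markup or layout technique baked in.
stored = json.dumps({
    "type": "columns",
    "columns": [{"text": "Left"}, {"text": "Right & centre"}],
})


def render_float(block):
    """Older presentation: floated divs with percentage widths."""
    width = 100 // len(block["columns"])
    cells = "".join(
        '<div style="float:left;width:{}%">{}</div>'.format(width, escape(col["text"]))
        for col in block["columns"]
    )
    return '<div class="columns">' + cells + "</div>"


def render_flex(block):
    """Newer presentation: a flexbox container."""
    cells = "".join(
        '<div style="flex:1">{}</div>'.format(escape(col["text"]))
        for col in block["columns"]
    )
    return '<div style="display:flex">' + cells + "</div>"


block = json.loads(stored)
legacy_html = render_float(block)
modern_html = render_flex(block)
```

Swapping `render_float` for `render_flex` changes the output without touching `stored`, which is the property argued for above.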

      it’s important to have the grammar locked down before we get too into the nitty-gritty of converting data back and forth.

      I don’t really understand how this is ‘nitty-gritty’ and not a high level decision… the text editor doesn’t support structured editing, and even if it did, third party editors wouldn’t. So presenting structured content in the editor provides no protection for the structure.

      So, after all that, we can get to the fun part of this post – the testing!

      Do you not see how this is just a teensy bit dismissive?

      these tests would really be better on the Gutenberg issue tracker, where they can be properly triaged

      I believe you are misunderstanding the nature of these issues. I am not listing a finite set of bugs that can be dealt with and everything will be fine. I’m listing examples of the infinite number of bugs created by trying to support both the existing format and a new structured format simultaneously. Posting these to the issue tracker and seeing them resolved in a spot manner just leads to submitting more and more of that infinite set, and producing more and more convoluted spot-fixes for what is fundamentally an issue of approach.

      we can definitely become more flexible as the grammar solidifies.

      You do realize that Matt slated the merge for the end of this year (before the React fiasco hit the fan), right? Late September isn’t really the time for grammar to still be unsolidified.

      Oh, that’s pretty cool. 😂 It seems like we could add a bit of sanity checking to remove empty paragraph blocks, and to re-split multiple paragraphs in a single blocks.

      I think you might have missed the implications of this test. This particular system was a minimal change to show the smallest version of the issue. Cutting and pasting sections of content around a post in TinyMCE is certain to yield much weirder and less recoverable issues.

      to make an existing post convert to Gutenberg blocks

      If a conversion is necessary, then you aren’t really supporting those third party tools that write directly to the post_content, because the results will need conversion… So why can’t that conversion create a JSON-serialized object, rather than a custom format? You don’t even need to store it in post_content. You could put it in a field called post_structure and just use post_content as a legacy fallback.
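A minimal sketch of that fallback, in Python rather than WordPress's PHP. The post is modelled as a plain dictionary, and the `load_blocks` helper and the `legacy_html` block type are hypothetical names chosen for illustration.

```python
import json


def load_blocks(post):
    """Prefer the structured field; fall back to the legacy HTML blob.

    `post` is a dict standing in for a database row. `post_structure` is
    the proposed JSON field, and `post_content` is the existing HTML field.
    """
    raw_structure = post.get("post_structure")
    if raw_structure:
        return json.loads(raw_structure)
    # No structure stored: wrap the legacy content in a single opaque block.
    return [{"type": "legacy_html", "html": post.get("post_content", "")}]


structured_post = {
    "post_content": "<p>Hello</p>",
    "post_structure": json.dumps([{"type": "paragraph", "text": "Hello"}]),
}
legacy_post = {"post_content": "<p>Written by a third party tool</p>"}

print(load_blocks(structured_post))  # [{'type': 'paragraph', 'text': 'Hello'}]
print(load_blocks(legacy_post))
```

Tools that only know about post_content keep working, and nothing has to guess at structure by parsing that HTML.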

      using JSON instead of a data format that works with existing data is definitely a whim

      It is a shame that you see adopting industry standards as a whim, but it clearly isn’t, and I’ve outlined several reasons in this post.

      What the user sees, whether it be in the editor or on their site, is the thing that matters

      Can I please quote you on that, because to me, that is the most damning statement against the development processes in place that I’ve ever heard! WordPress is supposedly the “Operating System of the Web”. To be that effectively, the underlying technology cannot be an afterthought.

      Disagree and Commit

      Once Gutenberg is a part of core, I will have to assess whether WordPress remains a good platform for my clients. I have been poring over the docs, and trying to fit their needs into the structures in place, but it is very hard to even imagine, with the skeleton as sparse as it currently is, and the proposed merge in three dev months. In the short-term, after the 5.0 launch, I will absolutely have to strip Gutenberg from all my installs, until they can be tested and modified to ensure compatibility. That’s an obligation I have to my clients. However, it is crucial to note that Gutenberg is not core yet, and if there are systemic, pervasive problems with it, it is my obligation to bring them forward as publicly and loudly as possible. When it comes right down to it, none of these decisions should be considered set in stone until the merge request is accepted.

      I’ve been on both sides of really big decisions, and I know how rough it can be when you’re certain that it’s all wrong, but I promise it’s worth the effort to look past that to what’s next.

      This is a core structural decision that will be set in stone with Gutenberg’s release, and will plague WordPress users for the foreseeable future. What’s next is hair-pulling and an exodus of devs, if this isn’t addressed.

  3. Josh's profile image.

    Thanks for stating what seems to me to be so painfully clear. I cannot believe that things have taken the road they have. I sincerely believe that if Gutenberg goes forward in its current state, or anything close to it, they (WP core devs) are either going to have to spend two years getting back to where they were (2018 spent trying to pretend things are great and justifying their horror, 2019 celebrating their own return to sanity), or worse, that this will be the Bane that breaks Batman’s back, and WP will never get the chance to slowly morph itself into a spry modern CMS, but will be remembered as a horrific mess of hooked spaghetti code and auto-inserted emoji JS.
