Extendible BBCode Parser in JavaScript

Photo By Dean Terry

I decided to try my hand at implementing a BBCode parser in JavaScript. You can play around with it online here, and download the source here.

I had looked around a little bit and noticed that the existing JavaScript BBCode parsers had at least a few of the following issues:

  • They didn’t report errors on misaligned tags (e.g., [b][u]test[/b][/u]).
  • They couldn’t handle tags of the same type that were nested within each other (e.g., [color=red]red[color=blue]blue[/color]red again[/color]). This happens because their regex will look for the first closing tag it can find.
  • They couldn’t handle BBCode’s list format (e.g., [list][*]item 1[*]item 2[/list]).
  • They didn’t report errors on incorrect parent-child relationships (e.g., [list][td]item 1?[/td][/list]).
  • They weren’t easily extendible.
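
To make the nested-tag failure concrete, here is a minimal sketch of the single-regex approach. The function name naiveColor and its regex are made up for this illustration and are not taken from any particular parser:

```javascript
// Hypothetical naive converter: one regex pass, no awareness of nesting.
function naiveColor(bbcode) {
    // The lazy (.*?) stops at the first [/color] it can find.
    return bbcode.replace(/\[color=(\w+)\](.*?)\[\/color\]/gi,
        function (match, color, content) {
            return '<span style="color:' + color + '">' + content + '</span>';
        });
}

var nested = "[color=red]red[color=blue]blue[/color]red again[/color]";
var naiveResult = naiveColor(nested);
console.log(naiveResult);
// <span style="color:red">red[color=blue]blue</span>red again[/color]
```

The outer opening tag gets paired with the inner closing tag, so the inner [color=blue] and the final [/color] are left behind as plain text.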

I naively thought it’d be easy to quickly whip up a parser, and at first it was. Most BBCode tags can be implemented with a simple find and replace. However, I quickly ran into the issues of dealing with nested tags of the same type, the noparse tag, and the list tag’s annoying [*] tag (which doesn’t have a closing tag). Luckily, I came across a neat blog post on finding nested patterns in JavaScript, which came in handy for isolating tag pairs, from the inner-most on up. Taking the idea from that post, one can do something like this to process the inner tags first and avoid the nested tag problem:

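var str = "[list][list]test[/list][/list]",
    // Match [tag]...[/tag]; \1 requires the closing name to equal the opening name.
    re = /\[([^\]]*?)\](.*?)\[\/\1\]/gi;
// Keep replacing until a pass leaves the string unchanged.
while (str !== (str = str.replace(re, function(strMatch, subM1, subM2) {
    return "" + subM2 + "";
})));
// str = "test"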

That idea works well, though you can’t implement a noparse tag if you process the inner-most tags first. So I decided to pre-process the BBCode with something similar to the idea above and add nested-depth information to each open and close tag. Once all of the tags had that, I could parse the processed code with a regex that could easily match up the correct open and close tags.
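
As a rough sketch of the depth idea (not the code from xbbcode.js), the pass below stamps every tag with a per-tag nesting depth. The [name:depth] notation, the addDepth function, and the pairRe regex are all assumptions made for this example:

```javascript
// Hypothetical pre-processing pass: stamp each open/close tag with its nesting depth.
function addDepth(bbcode) {
    var depthByTag = Object.create(null); // how many tags of each name are currently open
    return bbcode.replace(/\[(\/?)([a-z*]+)([^\]]*)\]/gi, function (match, slash, name, params) {
        var key = name.toLowerCase();
        var open = depthByTag[key] || 0;
        if (slash) {
            if (open === 0) {
                return match; // stray closing tag: leave it untouched
            }
            depthByTag[key] = open - 1;
            return "[/" + name + ":" + (open - 1) + "]";
        }
        depthByTag[key] = open + 1;
        return "[" + name + ":" + open + params + "]";
    });
}

// Matches an annotated pair: the closing tag must carry the same name and depth.
var pairRe = /\[([a-z*]+):(\d+)([^\]]*)\]([\s\S]*?)\[\/\1:\2\]/i;

var annotated = addDepth("[color=red]red[color=blue]blue[/color]red again[/color]");
console.log(annotated);
// [color:0=red]red[color:1=blue]blue[/color:1]red again[/color:0]

var outerPair = pairRe.exec(annotated);
console.log(outerPair[4]);
// red[color:1=blue]blue[/color:1]red again
```

Because the closing tag has to repeat both the name and the depth, the outer [color:0] can no longer be paired with the inner [/color:1].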

To get around the issue of the [*] tag having no closing tag, I wrote code that inserted [/*] tags where they were supposed to go during the pre-processing period. I won’t go into the algorithm here, but you can dig into the code if you’re interested.
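
One possible way to insert the missing [/*] tags is sketched below. This is an illustrative guess at the general approach, not the algorithm used in xbbcode.js, and the closeListItems name is hypothetical. It closes an open item whenever the next [*] or the enclosing [/list] is reached:

```javascript
// Hypothetical helper: add the [/*] closing tags that BBCode lists leave out.
function closeListItems(bbcode) {
    var out = "";
    var openItem = []; // one flag per open [list]: is a [*] item currently open in it?
    var tagRe = /\[(\/?list|\*)(?:=[^\]]*)?\]/gi;
    var last = 0;
    var m;
    while ((m = tagRe.exec(bbcode)) !== null) {
        out += bbcode.slice(last, m.index); // copy the text before this tag
        last = tagRe.lastIndex;
        var kind = m[1].toLowerCase();
        var top = openItem.length - 1;
        if (kind === "list") {
            openItem.push(false);
        } else if (kind === "*") {
            if (top >= 0) {
                if (openItem[top]) {
                    out += "[/*]"; // close the previous item in this list
                }
                openItem[top] = true;
            }
        } else if (top >= 0) { // "/list"
            if (openItem[top]) {
                out += "[/*]"; // close the last item before the list ends
            }
            openItem.pop();
        }
        out += m[0];
    }
    return out + bbcode.slice(last);
}

var closedList = closeListItems("[list][*]item 1[*]item 2[/list]");
console.log(closedList);
// [list][*]item 1[/*][*]item 2[/*][/list]
```

The stack of flags keeps nested lists separate, so an item in an inner list is closed before the inner [/list] and the outer item is closed afterwards.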

Also, I should note that the fact that JavaScript allows you to use a function as the second parameter to the replace method makes processing the tags really easy. Once you match a set of tags, you can recursively call the parse function on that tag’s contents from inside of the function you passed to replace.
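
Here is a small sketch of that recursive pattern. The sketchTags map and the parseSketch function are invented for this example, and it only handles three simple tags without same-type nesting:

```javascript
// Hypothetical tag map for the sketch: BBCode name -> HTML element name.
var sketchTags = { b: "strong", i: "em", u: "u" };

function parseSketch(text) {
    var pairRe = /\[(b|i|u)\]([\s\S]*?)\[\/\1\]/gi;
    // The callback runs once per matched pair and recurses into the pair's contents.
    return text.replace(pairRe, function (match, name, content) {
        var el = sketchTags[name.toLowerCase()];
        return "<" + el + ">" + parseSketch(content) + "</" + el + ">";
    });
}

var sketchHtml = parseSketch("[b]bold [i]both[/i][/b] plain");
console.log(sketchHtml);
// <strong>bold <em>both</em></strong> plain
```

Each call only deals with one level of tags and hands the inner text back to the same function.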

Using the parser

To use the parser, include the xbbcode.js and xbbcode.css files (both contained in the zip file linked above) somewhere on your page, and then call the XBBCODE object from your JavaScript:

var result = XBBCODE.process({
    text: "Some bbcode to process here",
    removeMisalignedTags: false,
    addInLineBreaks: false
});
console.log("Errors: " + result.error);
console.dir(result.errorQueue);
console.log(result.html); // the HTML form of your BBCode

Adding new tags

To add a new tag to your BBCode, add properties to the “tags” object inside of the XBBCODE object. For example, say you wanted to add a tag called [googleit] which would turn its contents into a link to the Google search results for those contents. You’d implement that by adding this to the tags object:

"googleit": {
    openTag: function(params,content) {
        // Encode the tag's contents so spaces and symbols still form a valid URL
        var website = '"http://www.google.com/#q=' + encodeURIComponent(content) + '"';
        return '<a href=' + website + '>';
    },
    closeTag: function(params,content) {
        return '</a>';
    }
}

Then you could have BBCode like this: “[googleit]ta-da![/googleit]” which would be transformed into this: “<a href="http://www.google.com/#q=ta-da!">ta-da!</a>”

If you have any suggestions or find any bugs let me know.


26 Responses to Extendible BBCode Parser in JavaScript

  1. jelbourn says:

    This is quite the excellent piece of code you have written. Kudos.

  2. balping says:

    Hi!

    I tried your BBCode Previewer. Please change this (in the end):

    ret.html = ret.html.replace("&#91;", "["); // put ['s back in
    ret.html = ret.html.replace("&#93;", "]"); // put ]'s back in

    to this:

    ret.html = ret.html.replace(/\&\#91\;/gm, "["); // put ['s back in
    ret.html = ret.html.replace(/\&\#93\;/gm, "]"); // put ]'s back in

    Your original version sometimes causes errors.

    Some bbcodes (for example [b] and [u]) don’t have to be the child of another tag pair. Browsers can interpret this HTML code: “He<b>ll<u>o Wor</b>ld!</u>”

  3. patorjk says:

    Hi balping,

    Thank you for the feedback! I did a little poking around, but I’m not sure how the first part sometimes leads to errors. Can you give me an example?

    As for browsers parsing misaligned tags, I decided it was best to just restrict users to using the correct syntax. HTML parsers will actually try to guess what you meant if you have misaligned tags like the example you presented. You can try it out by running this code in Firebug (or in its own webpage):

    var test = document.createElement("div");
    test.innerHTML = "He<b>ll<u>o Wor</b>ld!</u>";
    console.dir( test );

    If you look at the childNodes property, then look at the childNodes property of the “b” tag, you’ll see it has a “u” element inside of it. The HTML parser basically interpreted that string like this:

    He<b>ll<u>o Wor</u></b><u>ld!</u>

    On the surface this seems like it might be easy to implement, but it can become pretty nasty pretty quickly.

  4. balping says:

    I interpret text in two steps:

    1. I have base64 encoded data between [todecode][/todecode] tags. So in the first step I decode the data and leave everything outside these tags untouched.
    2. I interpret the whole decoded text.

    I don’t exactly know why, but if I use
    ret.html = ret.html.replace(/\&\#91\;/gm, "["); // put ['s back in
    ret.html = ret.html.replace(/\&\#93\;/gm, "]"); // put ]'s back in
    the code works; if I don’t, it creates “[todecode<” tags.
    Your original code replaces only the first match, but mine replaces all of them.

  5. balping says:

    Sorry, it creates “[todecode<]” tags…

  6. balping says:

    Ahh it creates “[todecode>]” tags

  7. Art says:

    This is a useful tool… but why do you set “me = this” at the very beginning? That essentially sets “me = window”, which is a WHOLE LOT of useless overhead in your case as all you want to return is an object with the process() method.

    Just a thought…

  8. patorjk says:

    Art – I just reviewed the code and you’re right, that is kind of useless. What I meant to do was write:

    me = {};

    That way the me object would be enclosed in the anonymous function yet still have access to the variables defined within. I think I changed from using the module pattern mid-way through development and forgot to update the me variable. Tonight I’ll update this, and then test and review to make sure everything’s in order. Thank you for letting me know!

  9. You may want to test the [PHP] tag. I tried to add it to the tags object, but the replace in the parse function didn’t seem to pick it up.

  10. patorjk says:

    Mitchell – How does the PHP tag behave? I looked it up real quick and read that it was similar to the code tag, so I added it to the previewer and it seemed to work ok:

    http://patorjk.com/bbcode-previewer/

    Edit: As a side note, right now its use is just set to be [php]php code here[/php]

  11. Should be identical to code. Only difference is some systems will do syntax highlighting but that doesn’t matter. If I have simple stuff between the php tags it works but if I have a lot of text in there I get misaligned tags warning from the previewer. This string I have has many [] (valid for JavaScript) in the string and tags and stuff. What would you expect the output of this:

    [CODE][][/CODE]

    That throws an error

  12. patorjk says:

    Try:

    [code][][/code]

    It looks like there’s a bug when using upper case letters for tag names. I’m going to see if I can figure out what’s happening real quick.

  13. that would be correct… I did modify it to lower case the tagName.

    I still need to see why the long code in [PHP] is breaking it

  14. patorjk says:

    I just updated http://patorjk.com/bbcode-previewer/ and I think the upper-case issue is fixed. I updated 2 lines:

    409: childTag = (matchingTags[ii].match(reTagNamesParts))[2].toLowerCase();
    461: tagName = tagName.toLowerCase(); // line added

    If you want to send me the sample PHP code I could help you troubleshoot the long code issue.

  15. That would be great! Where would you like the code? it’s quite long for a comment IMO

  16. patorjk says:

    Send it to patorjk at gmail dot com.

  17. mrfishie says:

    Now that… that is awesome. Really helps, thanks! I have been looking for one of these for a while, and this is definitely what I need.

  18. patorjk says:

    mrfishie – Glad it could help :) .

  19. steve says:

    It would be great if you could put this on GitHub. That way people could patch their own branches and send you pull requests.

  20. Bernard says:

    Hi Patorjk

    first of all thank you for the code.
    I had tried using
    [img]img/something.png[/img]
    but it is always returning

    please advise :)

  21. Daniel says:

    Thank you! Really like the script.

    Have you noticed the double line spacing inside of [code] when addInLineBreaks equals 'true'? Perhaps I'm not thinking about it correctly, but I'd like to see anything inside the [code] block treated as pre-formatted, and not have tags inserted.
