Extendible BBCode Parser in JavaScript

Photo By Dean Terry

I decided to try my hand at implementing a BBCode parser in JavaScript. You can play around with it online here, and download the source here.

I had looked around a little bit and noticed that the existing JavaScript BBCode parsers had at least a few of the following issues:

  • They didn’t report errors on misaligned tags (e.g., [b][u]test[/b][/u]).
  • They couldn’t handle tags of the same type that were nested within each other (e.g., [color=red]red[color=blue]blue[/color]red again[/color]). This happens because their regex will look for the first closing tag it can find.
  • They couldn’t handle BBCode’s list format (e.g., [list][*]item 1[*]item 2[/list]).
  • They didn’t report errors on incorrect parent-child relationships (e.g., [list][td]item 1?[/td][/list]).
  • They weren’t easily extendible.

I naively thought it’d be easy to quickly whip up a parser, and at first it was. Most BBCode tags can be implemented with a simple find and replace. However, I quickly ran into the issues of dealing with nested tags of the same type, the noparse tag, and the list tag’s annoying [*] tag (which doesn’t have a closing tag). Luckily, I came across a neat blog post on finding nested patterns in JavaScript, which came in handy for isolating tag pairs, from the inner-most on up. Taking the idea from that post, one can do something like this to process the inner tags first and avoid the nested tag problem:

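var str = "[list][list]test[/list][/list]",
    re = /\[([^\]]*?)\](.*?)\[\/\1\]/gi;
// Keep replacing matched [tag]...[/tag] pairs until a pass changes nothing.
while (str !== (str = str.replace(re, function(strMatch, subM1, subM2) {
    return "<" + subM1 + ">" + subM2 + "</" + subM1 + ">";
})));
// str = "<list><list>test</list></list>"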

That idea works well, though you can’t implement a noparse tag if you process the inner-most tags first. So I decided to pre-process the BBCode with something similar to the idea above and add nesting-depth information to each open and close tag. Once all of the tags had that, I could parse the processed code with a regex that easily matches up the correct open and close tags.
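A minimal sketch of that pre-processing idea follows. The `[name~depth]` marker format and the function names `annotateDepth` and `renderPairs` are hypothetical. They are not what xbbcode.js actually uses, and the sketch only handles simple alphabetic tag names. Once each tag carries its depth, an open tag and its partner share a unique key. That lets the outermost pair be matched first, and a noparse handler could simply skip the recursive call on its contents.

```javascript
// Hypothetical sketch: mark each tag with its nesting depth, e.g.
// "[b]x[/b]" -> "[b~0]x[/b~0]". The "~depth" marker format is an assumption.
function annotateDepth(text) {
  const stack = [];
  return text.replace(/\[(\/?)([a-z]+)(=[^\]]*)?\]/gi, (match, slash, name, param) => {
    name = name.toLowerCase();
    if (!slash) {
      stack.push(name);
      return "[" + name + "~" + (stack.length - 1) + (param || "") + "]";
    }
    // A close tag is only annotated when it matches the innermost open tag.
    if (stack.length > 0 && stack[stack.length - 1] === name) {
      stack.pop();
      return "[/" + name + "~" + stack.length + "]";
    }
    return match; // misaligned or stray close tag: left as-is
  });
}

// With depths attached, an open tag and its partner share the same
// "name~depth" key, so one lazy match plus a backreference pairs them.
function renderPairs(annotated) {
  const pairRe = /\[([a-z]+~\d+)(?:=[^\]]*)?\]([\s\S]*?)\[\/\1\]/gi;
  return annotated.replace(pairRe, (match, key, inner) => {
    const name = key.split("~")[0];
    return "<" + name + ">" + renderPairs(inner) + "</" + name + ">";
  });
}
```

For example, `renderPairs(annotateDepth("[b]a[b]b[/b]c[/b]"))` yields `<b>a<b>b</b>c</b>`.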

To get around the issue of the [*] tag having no closing tag, I wrote code that inserts [/*] tags where they belong during the pre-processing step. I won’t go into the algorithm here, but you can dig into the code if you’re interested.
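The actual algorithm lives in the source, so the following is only an illustrative sketch under stated assumptions. `closeListItems` is a hypothetical name. Lists are assumed to be well-formed, and the input is assumed not to contain any `[/*]` tags already. The sketch keeps one flag per open `[list]` recording whether an item is open. It then emits `[/*]` before the next `[*]` or before the closing `[/list]`.

```javascript
// Hypothetical sketch: add the missing [/*] closing tags inside lists.
// An item ends where the next [*] starts or where its [/list] begins.
function closeListItems(text) {
  const tokenRe = /\[\*\]|\[list(?:=[^\]]*)?\]|\[\/list\]/gi;
  const itemOpen = []; // one flag per open [list]: is an item currently open?
  return text.replace(tokenRe, (tok) => {
    const lower = tok.toLowerCase();
    if (lower.startsWith("[/list")) {
      const hadItem = itemOpen.pop();
      return (hadItem ? "[/*]" : "") + tok; // close the last item first
    }
    if (lower.startsWith("[list")) {
      itemOpen.push(false); // new list, no item open yet
      return tok;
    }
    // tok is [*]
    if (itemOpen.length === 0) return tok; // [*] outside any list: leave it
    const prefix = itemOpen[itemOpen.length - 1] ? "[/*]" : "";
    itemOpen[itemOpen.length - 1] = true;
    return prefix + tok;
  });
}
```

Given `[list][*]item 1[*]item 2[/list]`, it returns `[list][*]item 1[/*][*]item 2[/*][/list]`. After that, `[*]` can be treated like any other paired tag.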

Also, it’s worth noting that JavaScript lets you pass a function as the second parameter to the replace method, which makes processing the tags really easy. Once you match a pair of tags, you can recursively call the parse function on that tag’s contents from inside the function you passed to replace.
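Here is a small sketch of that recursive pattern. The two-entry `tags` table mimics the `openTag`/`closeTag` shape shown under “Adding new tags”. However, the table, the `pairRe` regex, and `parse` are illustrative assumptions. The sketch also assumes tags of the same name are not nested inside each other, which is what the depth pre-processing is for.

```javascript
// Hypothetical tags table in the openTag/closeTag shape; two tags only.
const tags = {
  b: { openTag: () => "<strong>", closeTag: () => "</strong>" },
  i: { openTag: () => "<em>", closeTag: () => "</em>" },
};

// Matches a [name]...[/name] pair; assumes no same-name nesting.
const pairRe = /\[(b|i)\]([\s\S]*?)\[\/\1\]/g;

function parse(text) {
  return text.replace(pairRe, (match, name, content) => {
    const tag = tags[name];
    // Recurse on the tag's contents from inside the replace callback,
    // so inner tags are converted before being wrapped.
    return tag.openTag("", content) + parse(content) + tag.closeTag("", content);
  });
}
```

For instance, `parse("[b]bold [i]both[/i][/b] plain")` produces `<strong>bold <em>both</em></strong> plain`.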

Using the parser

To use the parser, simply include the xbbcode.js and xbbcode.css files somewhere on your page (both are contained in the zip file linked above), and then call XBBCODE.process from your JavaScript:

var result = XBBCODE.process({
    text: "Some bbcode to process here",
    removeMisalignedTags: false,
    addInLineBreaks: false
});
console.log("Errors: " + result.error);
console.dir(result.errorQueue);
console.log(result.html); // the HTML form of your BBCode

Adding new tags

To add a new tag to the parser, add a property to the “tags” object inside the XBBCODE object. For example, say you wanted to add a tag called [googleit] that turns its contents into a link to the Google search results for those contents. You’d implement that by adding this to the tags object:

"googleit": {
    openTag: function(params, content) {
        // URL-encode the contents so characters like & or # can't break the link
        var website = "http://www.google.com/#q=" + encodeURIComponent(content);
        return '<a href="' + website + '">';
    },
    closeTag: function(params, content) {
        return '</a>';
    }
}

Then you could have BBCode like this: “[googleit]ta-da![/googleit]”, which would be transformed into this: <a href="http://www.google.com/#q=ta-da!">ta-da!</a>

If you have any suggestions or find any bugs let me know.

45 thoughts on “Extendible BBCode Parser in JavaScript”

  1. Hi!

    I tried your BBCode Previewer. Please change this (in the end):

    ret.html = ret.html.replace("&#91;", "["); // put ['s back in
    ret.html = ret.html.replace("&#93;", "]"); // put ]'s back in

    to this:

    ret.html = ret.html.replace(/\&\#91\;/gm, "["); // put ['s back in
    ret.html = ret.html.replace(/\&\#93\;/gm, "]"); // put ]'s back in

    Your original version sometimes causes errors.

    Some bbcodes (for example [b] and [u]) don’t have to be the child of another tag pair. Browsers can interpret this HTML code: “He<b>ll<u>o Wor</b>ld!</u>”

  2. Hi balping,

    Thank you for the feedback! I did a little poking around, but I’m not sure how the first part sometimes leads to errors. Can you give me an example?

    As for browsers parsing misaligned tags, I decided it was best to just restrict users to using the correct syntax. HTML parsers will actually try to guess what you meant if you have misaligned tags like the example you presented. You can try it out by running this code in Firebug (or in its own webpage):

    var test = document.createElement("div");
    test.innerHTML = "He<b>ll<u>o Wor</b>ld!</u>";
    console.dir( test );

    If you look at the childNodes property, then look at the childNodes property of the “b” tag, you’ll see it has a “u” element inside of it. The HTML parser basically interpreted that string like this:

    He<b>ll<u>o Wor</u></b><u>ld!</u>

    On the surface this seems like it might be easy to implement, but it can become pretty nasty pretty quickly.
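The reply above describes rejecting misaligned input. That can be sketched with a stack, where each close tag must match the most recently opened tag. `findMisalignedTags` is a hypothetical helper, not the error-reporting code in xbbcode.js. For brevity, it ignores `[*]` and `noparse`.

```javascript
// Hypothetical sketch: report misaligned tags such as [b][u]x[/b][/u]
// instead of letting the browser guess what was meant.
function findMisalignedTags(text) {
  const errors = [];
  const stack = [];
  const tagRe = /\[(\/?)([a-z]+)(?:=[^\]]*)?\]/gi;
  let m;
  while ((m = tagRe.exec(text)) !== null) {
    const name = m[2].toLowerCase();
    if (!m[1]) {
      stack.push(name); // open tag
    } else if (stack.length > 0 && stack[stack.length - 1] === name) {
      stack.pop(); // close tag matches the innermost open tag
    } else {
      errors.push("Misaligned [/" + name + "] at index " + m.index);
    }
  }
  stack.forEach((name) => errors.push("Unclosed [" + name + "]"));
  return errors;
}
```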

  3. I interpret text in two steps:

    1. I have base64-encoded data between [todecode][/todecode] tags. In the first step I decode that data and leave everything outside these tags alone.
    2. I interpret the whole decoded text.

    I don’t know exactly why, but if I use
    ret.html = ret.html.replace(/\&\#91\;/gm, "["); // put ['s back in
    ret.html = ret.html.replace(/\&\#93\;/gm, "]"); // put ]'s back in
    the code works; if I don’t, it creates “[todecode<” tags.
    Your original code replaces only the first match but my one replaces all of them.
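The behaviour described in this comment comes from `String.prototype.replace`. When the pattern is a string, only the first occurrence is replaced, whereas a regular expression with the `g` flag replaces every occurrence. The sketch below shows the difference on input containing several escaped brackets; the function names are illustrative. In modern engines, `replaceAll` is another option.

```javascript
// A string pattern makes replace() change only the FIRST match.
function unescapeBracketsFirstOnly(html) {
  return html.replace("&#91;", "[").replace("&#93;", "]");
}

// A regular expression with the g flag replaces EVERY match.
function unescapeBracketsAll(html) {
  return html.replace(/&#91;/g, "[").replace(/&#93;/g, "]");
}

const sample = "&#91;todecode&#93;abc&#91;/todecode&#93;";
console.log(unescapeBracketsFirstOnly(sample)); // [todecode]abc&#91;/todecode&#93;
console.log(unescapeBracketsAll(sample));       // [todecode]abc[/todecode]
```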

  4. This is a useful tool… but why do you set “me = this” at the very beginning? That essentially sets “me = window”, which is a WHOLE LOT of useless overhead in your case as all you want to return is an object with the process() method.

    Just a thought…

  5. Art – I just reviewed the code and you’re right, that is kind of useless. What I meant to do was write:

    me = {};

    That way the me object would be enclosed in the anonymous function yet still have access to the variables defined within. I think I changed from using the module pattern mid-way through development and forgot to update the me variable. Tonight I’ll update this, and then test and review to make sure everything’s in order. Thank you for letting me know!
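Here is a minimal sketch of the module pattern described in this reply. `XBBCODE_SKETCH` and its private `escapeHtml` helper are placeholders, not the real xbbcode.js internals. The result fields simply mirror the ones used in “Using the parser”. Because `me` is a fresh object rather than `this`, nothing leaks onto `window`, while `me.process` can still reach the private helper through the closure.

```javascript
const XBBCODE_SKETCH = (function () {
  const me = {}; // a plain object, not `this` (which is `window` in a non-strict browser script)

  // Private helper: visible to me.process through the closure, invisible outside.
  function escapeHtml(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }

  // Public API: the only thing exported.
  me.process = function (config) {
    return { html: escapeHtml(config.text), error: false, errorQueue: [] };
  };

  return me;
})();
```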

    It should be identical to code. The only difference is that some systems do syntax highlighting, but that doesn’t matter. If I have simple stuff between the php tags it works, but if I have a lot of text in there I get a misaligned-tags warning from the previewer. My string contains many [] (valid in JavaScript) along with tags and other markup. What would you expect the output of this to be:

    [CODE][][/CODE]

    That throws an error.

  7. Try:

    [code][][/code]

    It looks like there’s a bug when using upper case letters for tag names. I’m going to see if I can figure out what’s happening real quick.

  8. I just updated http://patorjk.com/bbcode-previewer/ and I think the upper-case issue is fixed. I updated 2 lines:

    409: childTag = (matchingTags[ii].match(reTagNamesParts))[2].toLowerCase();
    461: tagName = tagName.toLowerCase(); // line added

    If you want to send me the sample PHP code, I could help you troubleshoot the long-code issue.

  9. It would be great if you could put this on GitHub. That way people could patch their own branches and send you pull requests.

  10. Thank you! Really like the script.

    Have you noticed the double line spacing inside [code] when addInLineBreaks is set to true? Perhaps I’m not thinking about it correctly, but I’d like to see anything inside the [code] block treated as pre-formatted, without line-break tags being inserted.

        1. Referring to Bernard’s post, I believe he’s saying if you leave off the http://, the result will be blank. For instance, [img]web.scott.k12.va.us/martha2/dmbtest.gif[/img] or
          [img]www.web.scott.k12.va.us/martha2/dmbtest.gif[/img]
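Here is one way to get the behaviour requested in the [code] line-spacing question above, sketched under assumptions. `addLineBreaksOutsideCode` is a hypothetical helper that runs on the raw BBCode before parsing. It emits `<br>`, though xbbcode.js may use a different form. Splitting with a capturing group keeps each `[code]` block as its own array element, so only the text between blocks gets line breaks.

```javascript
// Hypothetical sketch: convert newlines to <br> everywhere EXCEPT inside
// [code]...[/code], whose contents are left as pre-formatted text.
function addLineBreaksOutsideCode(text) {
  // split() with a capturing group keeps the [code] blocks in the array,
  // at odd indexes; even indexes hold the text between them.
  const parts = text.split(/(\[code\][\s\S]*?\[\/code\])/i);
  return parts
    .map((part, i) => (i % 2 === 1 ? part : part.replace(/\r?\n/g, "<br>")))
    .join("");
}
```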

  11. Thank you for making this code available, Patrick. I imagine there is a good reason why you used styled spans in your HTML conversions instead of tags such as <b>. However, it is worth noting that this can have a negative impact on SEO if the converted HTML is to be seen by Google & co.

    For some reason you appear to have missed out on providing handlers for a few common BBCodes:

    [font=fontName]text[/font]


    "font": {
        openTag: function(params, content) {
            params = params.substr(1, params.length - 1); // drop the leading "="
            params = '"' + params + '"';
            return '<font face=' + params + '>';
        },
        closeTag: function(params, content) { return '</font>'; },
        displayContent: true
    },

    [sub]text[/sub] and [sup]text[/sup]

    "sub": {
        openTag: function(params, content) {
            return '<sub>';
        },
        closeTag: function(params, content) {
            return '</sub>';
        }
    },
    "sup": {
        openTag: function(params, content) {
            return '<sup>';
        },
        closeTag: function(params, content) {
            return '</sup>';
        }
    }

    Finally, here is a slight mod that allows anchors with target='_blank' and/or rel='nofollow'. The params should take the form url|blank|nofollow, where blank and nofollow are 0 (false) or 1 (true).


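    "url": {
        openTag: function(params, content) {
            // params arrives as "=url|blank|nofollow"
            params = params.substring(1).split('|');
            var blank = (parseInt(params[1], 10) === 1) ? " target='_blank'" : "";
            var nofollow = (parseInt(params[2], 10) === 1) ? " rel='nofollow'" : "";
            urlPattern.lastIndex = 0;
            var href = urlPattern.test(params[0]) ? params[0] : "#"; // reject invalid URLs
            return '<a href="' + href + '"' + blank + nofollow + '>';
        },
        closeTag: function(params, content) {
            return '</a>';
        }
    },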

    1. Thanks for the suggestions! I’ve gone ahead and added sup and sub to the github repo. Your font tag implementation could possibly have a security hole in it since you’re trusting that the user isn’t passing along any nefarious parameters (ex: adding an onclick field – not sure if that’d work for the font tag, but you have to be careful).

      Spans were used for some of the tags since you can’t be sure if someone hasn’t overridden the styling for tags like code (some CSS frameworks do this).
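To address the security concern in the reply above, a [font] handler could accept only an allow-list of font names. This is a hypothetical sketch. The `allowedFonts` list is arbitrary, and the handler emits a styled span, in line with the spans the reply mentions. Anything not on the list renders without styling, so a crafted parameter such as `=x" onclick="alert(1)` never reaches the output.

```javascript
// Hypothetical allow-list version of the [font=...] handler.
const allowedFonts = ["arial", "courier new", "georgia", "times new roman", "verdana"];

const fontTag = {
  openTag: function (params, content) {
    // params is assumed to arrive as "=FontName"
    const name = (params || "").replace(/^=/, "").trim().toLowerCase();
    if (allowedFonts.indexOf(name) === -1) {
      return "<span>"; // unknown font: render the text without styling
    }
    return '<span style="font-family: ' + name + '">';
  },
  closeTag: function (params, content) {
    return "</span>";
  },
};
```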

  12. I am using your code along with my own take on BBCode. Among other things, I am also allowing for text alignment in paragraphs. Here is the code for that:


    "para": {
        openTag: function(params, content) {
            // params arrives as "=N"; strings have no reverse(), so strip the "=" instead
            var align = parseInt((params || "").substr(1), 10);
            var al = 'left';
            switch (align) {
                case 1: al = 'right'; break;
                case 2: al = 'center'; break;
                case 3: al = 'justify'; break;
            }
            return '<p style="text-align:' + al + '">';
        },
        closeTag: function(params, content) {
            return '</p>';
        }
    },

  13. Hey, I think your website might be having browser compatibility issues.
    When I look at your blog in Firefox, it looks fine, but when I open it in Internet Explorer,
    some elements overlap. I just wanted to give you a quick heads-up!
    Other than that, very good blog!

  14. First of all: thanks for your nice work; it works quite well!

    FYI: I needed to update the urlPattern to support semicolons as well. Have there been any reasons why it’s not supported in the current version?

    urlPattern = /^(?:https?|file|c):(?:\/{1,3}|\\{1})[-a-zA-Z0-9:;@#%&()~_?\+=\/\\\.]*$/,
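Here is a quick check of the modified pattern; the two test URLs are made-up examples. The pattern still requires a scheme (`http`, `https`, `file`, or `c`). That is likely why the scheme-less `[img]` URLs mentioned in an earlier reply render blank, assuming `[img]` is validated against the same pattern.

```javascript
// The pattern from the comment above, with ";" added to the character class.
const urlPattern = /^(?:https?|file|c):(?:\/{1,3}|\\{1})[-a-zA-Z0-9:;@#%&()~_?\+=\/\\\.]*$/;

// URLs containing semicolons (e.g. ";jsessionid=") now pass.
console.log(urlPattern.test("http://example.com/page;jsessionid=abc")); // true
// A scheme is still required, so a bare "www." address is rejected.
console.log(urlPattern.test("www.example.com/x.gif")); // false
```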

  15. Is there a way to get rid of the unneeded line breaks (inside lists or tables)?
    [table][tr][td]1[/td]
    [td]2[/td]
    [/tr]
    [tr][td]3[/td]
    [td]4[/td]
    [/tr]
    [/table]

    This will output as
    \n\n\n\n\n\ntable...

      1. Yes, I know I can edit the code, but I thought someone might have done this already. I haven’t found where to do the trimming.
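One possible place to do the trimming is a pre-processing pass before line breaks are added. The sketch below is hypothetical: `trimStructuralWhitespace` and the list of structural tags are assumptions. It removes whitespace only when it sits between two table or list tags, so text inside cells is untouched.

```javascript
// Hypothetical pre-processing step: drop whitespace (including newlines)
// between structural tags so it never becomes <br> elements.
const structural = "table|tr|td|th|list|\\*"; // assumed list; extend as needed
const gapRe = new RegExp(
  "(\\[\\/?(?:" + structural + ")\\])\\s+(?=\\[\\/?(?:" + structural + ")\\])",
  "gi"
);

function trimStructuralWhitespace(text) {
  return text.replace(gapRe, "$1"); // keep the tag, drop the whitespace after it
}
```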

      2. Do you know of a good way to handle smilies? In the instance where I’m using this, there is an absurd number of smilies: 151. Obviously, creating a tag for every single one isn’t the best option. Any ideas?
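For a large set of smilies, a common approach is a lookup table plus one generated regular expression, rather than one BBCode tag per smiley. In this sketch, the four codes, the file names, and the `/smilies/` path are placeholders. Sorting codes longest-first keeps a longer code from being split by a shorter prefix. Each code is regex-escaped, because characters like `)` are special. The sketch assumes it is applied to plain text content.

```javascript
// Hypothetical smiley table: code -> image file name.
const smilies = {
  ":)": "smile.gif",
  ":(": "sad.gif",
  ":D": "grin.gif",
  ";)": "wink.gif",
};

// Build ONE regex from all codes: longest first, special characters escaped.
const smileyRe = new RegExp(
  Object.keys(smilies)
    .sort((a, b) => b.length - a.length)
    .map((code) => code.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("|"),
  "g"
);

function replaceSmilies(text) {
  return text.replace(smileyRe, (code) =>
    '<img src="/smilies/' + smilies[code] + '" alt="' + code + '">'
  );
}
```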

  16. I was able to get it working, but I wanted to allow HTML tags, since HTML escaping for my script is done server-side.
    Removing your initial escaping of HTML tags caused issues because of the parser’s logic:
    1. change all []’s to angle brackets temporarily
    2. find valid tags and return them to []’s
    3. change all leftover angle brackets to escape chars

    I struggled with this for a long time but put together a regex that skips these steps by escaping all unsupported bbcodes:
    this.nonTags = new RegExp("(\\[)((?!/?" + this.tagList.join("(?:\\]|\\b|=)|/?") + "(?:\\]|\\b|=)).*?)(\\])", "g");
    Then I can replace lines 720 to 730 with this:

    // Capture all bbcode tags that are not part of the tag list, and escape them
    config.text = config.text.replace(nonTags, function(matchStr, openB, contents, closeB) {
        return '&#91;' + contents + '&#93;';
    });

    Works for me so far.
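Here is a self-contained variant of the commenter’s approach for experimenting outside the parser. `escapeUnknownTags` is a hypothetical name. It uses the `g` flag so every unsupported tag is escaped, not just the first. It also regex-escapes the tag names, so entries like `*` are safe. Finally, it writes the brackets as the `&#91;`/`&#93;` entities used elsewhere in this thread.

```javascript
// Hypothetical helper: escape every [..] tag whose name is not in tagList.
function escapeUnknownTags(text, tagList) {
  // Escape regex metacharacters in tag names (e.g. "*" for list items).
  const names = tagList.map((n) => n.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  // A known tag name must be followed by "]", a word boundary, or "=".
  const known = names.map((n) => "/?" + n + "(?:\\]|\\b|=)").join("|");
  const nonTags = new RegExp("\\[((?!" + known + ").*?)\\]", "g");
  return text.replace(nonTags, (match, contents) => "&#91;" + contents + "&#93;");
}
```

For example, with `["b", "i"]` as the tag list, `[b]hi[/b] [foo]x[/foo]` becomes `[b]hi[/b] &#91;foo&#93;x&#91;/foo&#93;`.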

  17. I pretty much love this! It’s so easy to use, extend, and modify!
    I’m curious, though, why you match specific color names instead of any whole word (\w+).
    Also, if you want to add matching for RGB and RGBA:
    colorRgbPattern = /(^rgb\((\d+),\s*(\d+),\s*(\d+)\)$)|(^rgba\((\d+),\s*(\d+),\s*(\d+)(,\s*\d+\.\d+)*\)$)/
    Then just update the if-statement.

    1. It’s been a long time since I’ve looked at the code. I think I was just trying to model it after some BBCode implementations I had seen on message boards.

      If you’ve got something working for an rgb/rgba setup, you can send me a pull request on github and I’ll check it out.
