Extendible BBCode Parser in JavaScript

May 7, 2011 patorjk 45 Comments

I decided to try my hand at implementing a BBCode parser in JavaScript. You can play around with it online here, and download the source here.

I had looked around a little bit and noticed that the existing JavaScript BBCode parsers had at least a few of the following issues:

They didn’t report errors on misaligned tags (e.g., [b][u]test[/b][/u]).
They couldn’t handle tags of the same type that were nested within each other (e.g., [color=red]red[color=blue]blue[/color]red again[/color]). This happens because their regex will look for the first closing tag it can find.
They couldn’t handle BBCode’s list format (e.g., [list][*]item 1[*]item 2[/list]).
They didn’t report errors on incorrect parent-child relationships (e.g., [list][td]item 1?[/td][/list]).
They weren’t easily extendible.
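The nested-tag problem from the list above is easy to reproduce. The snippet below is only an illustration of the naive regex approach, not code taken from any particular parser; the names `naiveColor` and `naiveParse` are made up for this example.

```javascript
// Naive approach: one non-greedy regex per tag, pairing each opening tag
// with the first closing tag that follows it.
var naiveColor = /\[color=(\w+)\](.*?)\[\/color\]/g;

function naiveParse(text) {
    return text.replace(naiveColor, '<span style="color:$1">$2</span>');
}

var nested = "[color=red]red[color=blue]blue[/color]red again[/color]";
var naiveResult = naiveParse(nested);

// The outer [color=red] gets paired with the inner [/color], so the inner
// opening tag and the outer closing tag are left behind as plain text.
console.log(naiveResult);
```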

I naively thought it’d be easy to quickly whip up a parser, and at first it was. Most BBCode tags can be implemented with a simple find and replace. However, I quickly ran into the issues of dealing with nested tags of the same type, the noparse tag, and the list tag’s annoying [*] tag (which doesn’t have a closing tag). Luckily, I came across a neat blog post on finding nested patterns in JavaScript, which came in handy for isolating tag pairs, from the inner-most on up. Taking the idea from that post, one can do something like this to process the inner tags first and avoid the nested tag problem:

var str = "[list][list]test[/list][/list]",
    re = /\[([^\]]*?)\](.*?)\[\/\1\]/gi;
while (str !== (str = str.replace(re, function(strMatch, subM1, subM2) {
    return "" + subM2 + "";
})));
// str = "test"

That idea works well, though you can’t implement a noparse tag if you process the inner-most tags first. So I decided to pre-process the BBCode with something similar to the idea above and add in nested-depth information to each open and close tag. Once all of the tags had that, I could parse the processed code with a regex that could easily match-up the correct open and close tags.
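To make the depth idea a little more concrete, here is a minimal sketch. It is not the code from xbbcode.js: the `addDepth` helper, the `[tag:N]` notation, and the `depthPairRe` pattern are all assumptions chosen for illustration.

```javascript
// Hypothetical helper: label every open and close tag with its nesting depth
// among tags of the same name, e.g. [list]...[/list] -> [list:0]...[/list:0].
function addDepth(text) {
    var depth = Object.create(null); // number of currently open tags, per tag name
    return text.replace(/\[(\/?)(\w+)([^\]]*)\]/g, function (match, slash, name, params) {
        name = name.toLowerCase();
        if (!depth[name]) { depth[name] = 0; }
        if (slash) {
            // a closing tag pairs with the most recently opened tag of that name
            depth[name] = Math.max(depth[name] - 1, 0);
            return "[/" + name + ":" + depth[name] + "]";
        }
        var d = depth[name]++;
        return "[" + name + ":" + d + params + "]";
    });
}

// With depths in place, a backreference on both the name and the depth
// pairs each opening tag with its own closing tag.
var depthPairRe = /\[(\w+):(\d+)([^\]]*)\]([\s\S]*?)\[\/\1:\2\]/;

var annotated = addDepth("[color=red]red[color=blue]blue[/color]red again[/color]");
var outerPair = depthPairRe.exec(annotated);
console.log(annotated);
console.log(outerPair[4]); // contents of the outer color tag
```

Because the outer-most pair is matched first, a tag like noparse can decide to leave its contents alone before anything inside it is touched.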

To get around the issue of the [*] tag having no closing tag, I wrote code that inserted [/*] tags where they were supposed to go during the pre-processing period. I won’t go into the algorithm here, but you can dig into the code if you’re interested.
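For readers who want a feel for the problem without opening the source, one possible approach is sketched below. This is not the algorithm used in xbbcode.js; `closeListItems` is a hypothetical helper, and it assumes plain [list] tags without parameters.

```javascript
// Hypothetical sketch: insert [/*] before the next [*] at the same list level,
// or before the [/list] that ends the list.
function closeListItems(text) {
    var openItem = []; // one flag per open [list]: is a [*] item currently open?
    return text.replace(/\[(list|\/list|\*)\]/gi, function (match, tag) {
        tag = tag.toLowerCase();
        var top = openItem.length - 1;
        if (tag === "list") {
            openItem.push(false);
            return match;
        }
        if (top < 0) { return match; } // stray [*] or [/list] outside any list
        if (tag === "*") {
            var prefix = openItem[top] ? "[/*]" : "";
            openItem[top] = true;
            return prefix + match;
        }
        // tag === "/list": close the last item of this list, if there is one
        return (openItem.pop() ? "[/*]" : "") + match;
    });
}

var flatList = closeListItems("[list][*]item 1[*]item 2[/list]");
var nestedList = closeListItems("[list][*]a[list][*]b[/list][/list]");
console.log(flatList);
console.log(nestedList);
```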

Also, I should note that the fact that JavaScript allows you to use a function as the second parameter to the replace method makes processing the tags really easy. Once you match a set of tags, you can recursively call the parse function on that tag’s contents from inside of the function you passed to replace.
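A stripped-down sketch of that recursion might look like the following. The `sketchTags` table and `parseSketch` function are invented for this example and are much simpler than the real parser (no parameters, no depth information, no error reporting).

```javascript
// Hypothetical tag table: each tag maps to the HTML that opens and closes it.
var sketchTags = {
    b: { open: "<b>", close: "</b>" },
    i: { open: "<i>", close: "</i>" }
};

function parseSketch(text) {
    // A fresh regex object per call, so recursive calls never share state.
    var pairRe = /\[(\w+)\]([\s\S]*?)\[\/\1\]/g;
    return text.replace(pairRe, function (match, name, content) {
        name = name.toLowerCase();
        if (!Object.prototype.hasOwnProperty.call(sketchTags, name)) {
            return match; // unknown tag: leave it untouched
        }
        // Recurse on the tag's contents from inside the replace callback.
        return sketchTags[name].open + parseSketch(content) + sketchTags[name].close;
    });
}

var parsed = parseSketch("[b]bold [i]both[/i][/b] plain");
console.log(parsed);
```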

Using the parser

To use the parser, you’d simply include the xbbcode.js and xbbcode.css files somewhere on your page (both are contained in the zip file linked above), and then call the XBBCODE object from somewhere in your JavaScript:

var result = XBBCODE.process({
    text: "Some bbcode to process here",
    removeMisalignedTags: false,
    addInLineBreaks: false
});
console.log("Errors: " + result.error);
console.dir(result.errorQueue);
console.log(result.html);// the HTML form of your BBCode

Adding new tags

To add a new tag to your BBCode, add properties to the “tags” object inside of the XBBCODE object. For example, say you wanted to add a tag called [googleit] which would change its contents into a link of its google search results. You’d implement that by adding this to the tags object:

"googleit": {
    openTag: function(params,content) {
        var website = "\"http://www.google.com/#q=" + content + '"';
        return '<a href=' + website + '>';
    },
    closeTag: function(params,content) {
        return '</a>';
    }
}

Then you could have BBCode like this: “[googleit]ta-da![/googleit]” which would be transformed into this: “<a href="http://www.google.com/#q=ta-da!">ta-da!</a>”
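One thing to watch for with a tag like this is that the content ends up inside an attribute. A possible variant that URL-encodes the content first is sketched below. It only assumes the openTag(params, content) and closeTag(params, content) signatures shown above; the `googleitEncoded` name is made up, and the object is called directly here rather than through XBBCODE.

```javascript
// Hypothetical variant of the googleit tag that URL-encodes its content.
var googleitEncoded = {
    openTag: function (params, content) {
        // encodeURIComponent escapes spaces, double quotes, &, < and > so the
        // content cannot break out of the double-quoted href attribute
        var url = "http://www.google.com/#q=" + encodeURIComponent(content);
        return '<a href="' + url + '">';
    },
    closeTag: function (params, content) {
        return '</a>';
    }
};

var encodedLink = googleitEncoded.openTag("", 'cats "and" dogs');
console.log(encodedLink);
```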

If you have any suggestions or find any bugs let me know.

45 thoughts on “Extendible BBCode Parser in JavaScript”

jelbourn says:

May 17, 2011 at 4:52 pm

This is quite the excellent piece of code you have written. Kudos.
patorjk says:

May 17, 2011 at 5:51 pm

Thanks 🙂
balping says:

January 6, 2012 at 2:04 pm

Hi!

I tried your BBCode Previewer. Please change this (in the end):

ret.html = ret.html.replace("&#91;", "["); // put ['s back in
ret.html = ret.html.replace("&#93;", "]"); // put ]'s back in

to this:

ret.html = ret.html.replace(/\&\#91\;/gm, "["); // put ['s back in
ret.html = ret.html.replace(/\&\#93\;/gm, "]"); // put ]'s back in

Your original version sometimes causes errors.

Some bbcodes (for example [b] and [u]) don’t have to be the child of another tag pair. Browsers can interpret this HTML code: “<b><u>Hello World!</b></u>”
patorjk says:

January 6, 2012 at 2:45 pm

Hi balping,

Thank you for the feedback! I did a little poking around, but I’m not sure how the first part sometimes leads to errors. Can you give me an example?

As for browsers parsing misaligned tags, I decided it was best to just restrict users to using the correct syntax. HTML parsers will actually try to guess what you meant if you have misaligned tags like the example you presented. You can try it out by running this code in Firebug (or in its own webpage):

var test = document.createElement("div");
test.innerHTML = "<b><u>Hello World!</b></u>";
console.dir( test );

If you look at the childNodes property, then look at the childNodes property of the “b” tag, you’ll see it has a “u” element inside of it. The HTML parser basically interpreted that string like this:

<b><u>Hello World!</u></b>

On the surface this seems like it might be easy to implement, but it can become pretty nasty pretty quickly.
balping says:

January 6, 2012 at 3:01 pm

I interpret text in two steps:

1. I have base64 encoded data between [todecode][/todecode] tags. So in the first step I decode the data and leave everything outside these tags untouched.
2. I interpret the whole decoded text.

I don’t exactly know why, but if I use
ret.html = ret.html.replace(/\&\#91\;/gm, "["); // put ['s back in
ret.html = ret.html.replace(/\&\#93\;/gm, "]"); // put ]'s back in
the code works, if I don’t, it creates “[todecode<” tags.
Your original code replaces only the first match, but mine replaces all of them.
balping says:

January 6, 2012 at 3:03 pm

Sorry, it creates “[todecode<]” tags…
balping says:

January 6, 2012 at 3:03 pm

Ahh it creates “[todecode>]” tags
Art says:

February 7, 2012 at 4:28 pm

This is a useful tool… but why do you set “me = this” at the very beginning? That essentially sets “me = window”, which is a WHOLE LOT of useless overhead in your case as all you want to return is an object with the process() method.

Just a thought…
patorjk says:

February 7, 2012 at 4:35 pm

Art – I just reviewed the code and you’re right, that is kind of useless. What I meant to do was write:

me = {};

That way the me object would be enclosed in the anonymous function yet still have access to the variables defined within. I think I changed from using the module pattern mid-way through development and forgot to update the me variable. Tonight I’ll update this, and then test and review to make sure everything’s in order. Thank you for letting me know!
Mitchell Simoens says:

February 7, 2012 at 4:40 pm

You may want to test the [PHP] codes. I tried to add it to the tags object, but the replace in the parse function didn’t seem to pick it up.
patorjk says:

February 7, 2012 at 4:51 pm

Mitchell – How does the PHP tag behave? I looked it up real quick and read that it was similar to the code tag, so I added it to the previewer and it seemed to work ok:

http://patorjk.com/bbcode-previewer/

Edit: As a side note, right now its use is just set to be [php]php code here[/php]
Mitchell Simoens says:

February 7, 2012 at 5:04 pm

Should be identical to code. Only difference is some systems will do syntax highlighting but that doesn’t matter. If I have simple stuff between the php tags it works but if I have a lot of text in there I get misaligned tags warning from the previewer. This string I have has many [] (valid for JavaScript) in the string and tags and stuff. What would you expect the output of this:

[CODE][][/CODE]

That throws an error
patorjk says:

February 7, 2012 at 5:08 pm

Try:

[code][][/code]

It looks like there’s a bug when using upper case letters for tag names. I’m going to see if I can figure out what’s happening real quick.
Mitchell Simoens says:

February 7, 2012 at 5:13 pm

that would be correct… I did modify it to lower case the tagName.

I still need to see why the long code in [PHP] is breaking it
patorjk says:

February 7, 2012 at 5:16 pm

I just updated http://patorjk.com/bbcode-previewer/ and I think the upper-case issue is fixed. I updated 2 lines:

409: childTag = (matchingTags[ii].match(reTagNamesParts))[2].toLowerCase();
461: tagName = tagName.toLowerCase(); // line added

If you want to send me the sample PHP code I could help you troubleshoot the long code issue.
Mitchell Simoens says:

February 7, 2012 at 5:18 pm

That would be great! Where would you like the code? it’s quite long for a comment IMO
patorjk says:

February 7, 2012 at 5:19 pm

Send it to patorjk at gmail dot com.
mrfishie says:

June 12, 2012 at 3:16 am

Now that… that is awesome. Really helps, thanks! I have been looking for one of these for a while, and this is definitely what I need.
patorjk says:

June 14, 2012 at 9:39 pm

mrfishie – Glad it could help :).
steve says:

July 4, 2012 at 2:11 pm

It would be great if you could put this on GitHub. That way people could patch their own branches and send you pull requests.
1. patorjk says:
  
  July 5, 2012 at 1:41 pm
  
  Hi Steve,
  
  I have an old copy of it here:
  
  https://github.com/patorjk/Extendible-BBCode-Parser
  
  However, I haven’t updated it recently, and I’ve been procrastinating on learning the ins and outs of git (every time I use it I have to do a quick tutorial on it). I’ll shoot for updating it tonight though.
Bernard says:

December 12, 2012 at 10:19 am

Hi Patorjk

first of all thank you for the code.
I tried using
[img]img/something.png[/img]
but it always returns

please advise 🙂
1. patorjk says:
  
  December 13, 2012 at 5:55 pm
  
  Can you give me a fuller test case? I just tried: (at http://patorjk.com/bbcode-previewer/)
  
  [img]http://web.scott.k12.va.us/martha2/dmbtest.gif[/img]
  
  And it looks like it works fine.
Daniel says:

January 20, 2013 at 1:13 am

Thank you! Really like the script.

Have you noticed the double line spacing inside of [code] when addInLineBreaks equals ‘true’? Perhaps I’m not thinking about it correctly, but I’d like to see anything inside the [code] block treated as pre-formatted, and not have line-break tags inserted.
1. patorjk says:
  
  January 20, 2013 at 1:33 pm
  
  Hi Daniel,
  
  I agree, it shouldn’t add the formatting in for the code tag in that case. I’ll take a look at it later today.
  1. patorjk says:
    
    April 22, 2013 at 12:26 am
    
    Updated code to resolve issue.
    1. Dustin Perolio says:
      
      July 25, 2013 at 5:12 am
      
      Referring to Bernard’s post, I believe he’s saying if you leave off the http://, the result will be blank. For instance, [img]web.scott.k12.va.us/martha2/dmbtest.gif[/img] or
      [img]www.web.scott.k12.va.us/martha2/dmbtest.gif[/img]
Atul Vaidya says:

August 19, 2013 at 9:07 am

Thank you for making this code available, Patrick. I imagine there is a good reason why you have used styled spans in your HTML conversions as opposed to using tags such as etc. However, it is worth noting that this can have a negative impact on SEO if the converted HTML is to be seen by Google & co.

For some reason you appear to have missed out on providing handlers for a few common BBCodes

[font=fontName]text[/font]

"font": {
    openTag: function(params,content) {
        params = params.substr(1,params.length - 1);
        params = '"' + params + '"';
        return "";
    },
    closeTag: function(params,content) { return ''; },
    displayContent: true
},

[sub]text[/sub] and [sup]text[/sup]
"sub": {
    openTag: function(params,content) { return ''; },
    closeTag: function(params,content) { return ''; }
},
"sup": {
    openTag: function(params,content) { return ''; },
    closeTag: function(params,content) { return ''; }
}

Finally, here is a slight mod to allow for the creation of anchors that can have target=’_blank’ and/or rel=’nofollow’. The param should take the form url|blank|nofollow, where blank and nofollow can be 0 (false) or 1 (true).

"url": {
    openTag: function(params,content) {
        params = params.substring(1).split('|');
        var blank = (1 == parseInt(params[1])) ? " target='_blank'" : "";
        var nofollow = (1 == parseInt(params[2])) ? " rel='nofollow'" : '';
        urlPattern.lastIndex = 0;
        return '';
    },
    closeTag: function(params,content) { return ''; }
},
1. patorjk says:
 
 August 20, 2013 at 11:02 pm
 
 Thanks for the suggestions! I’ve gone ahead and added sup and sub to the github repo. Your font tag implementation could possibly have a security hole in it since you’re trusting that the user isn’t passing along any nefarious parameters (ex: adding an onclick field – not sure if that’d work for the font tag, but you have to be careful).
 
 Spans were used for some of the tags since you can’t be sure if someone hasn’t overridden the styling for tags like code (some CSS frameworks do this).
Atul Vaidya says:

August 21, 2013 at 9:43 am

I am using your code along with my own take on BBCode. Amongst other things I am also allowing for text alignment in paragraphs. Here is the code for that:

"para": {
    openTag: function(params,content) {
        params = parseInt(params.reverse());
        var al = 'left';
        switch (params) {
            case 1: al = 'right'; break;
            case 2: al = 'center'; break;
            case 3: al = 'justify'; break;
        }
        return '';
    },
    closeTag: function(params,content) { return '</>'; }
},
Deanne says:

September 25, 2013 at 4:17 am

Hey, I think your website might be having browser compatibility issues.
When I look at your blog in Firefox, it looks fine, but when opening it in Internet Explorer, it has some overlapping. I just wanted to give you a quick heads up! Other than that, very good blog!
Christopher says:

February 13, 2014 at 6:23 pm

First of all: Thanks for your nice job, it works quite well!

FYI: I needed to update the urlPattern to support semicolons as well. Have there been any reasons why it’s not supported in the current version?

urlPattern = /^(?:https?|file|c):(?:\/{1,3}|\\{1})[-a-zA-Z0-9:;@#%&()~_?\+=\/\\\.]*$/,
1. patorjk says:
  
  February 17, 2014 at 12:20 pm
  
  Thanks for the updated pattern! If you want you can send me a pull request via github, otherwise I’ll add it in later this month.
  1. Christopher says:
    
    February 17, 2014 at 1:40 pm
    
    Hi Patorjk, I just did the pull request.
    1. patorjk says:
      
      February 17, 2014 at 9:47 pm
      
      Awesome, thank you! Just merged it in.
KN4CK3R says:

March 17, 2014 at 3:02 am

Is there a way to get rid of the unneeded line breaks (inside lists or tables)?
[table][tr][td]1[/td] [td]2[/td] [/tr] [tr][td]3[/td] [td]4[/td] [/tr] [/table]

This will output as
\n\n\n\n\n\ntable...
1. KN4CK3R says:
  
  March 19, 2014 at 3:13 pm
  
  or the “Nested List Tags” example
2. patorjk says:
  
  March 21, 2014 at 11:09 am
  
  You’d have to take a look at the source and make some adjustments:
  
  https://github.com/patorjk/Extendible-BBCode-Parser
  
  If you want to make the changes and submit a pull request, I’ll add it in.
  1. KN4CK3R says:
    
    March 21, 2014 at 1:03 pm
    
    Yes, I know I can edit the code, I just thought someone had done this already. I haven’t found the place where the trimming should be done.
  2. Knight Yoshi says:
    
    October 13, 2014 at 12:29 pm
    
    Do you know of a good way to handle smilies? In the instance where I’m using this, there are an absurd amount of smilies, 151. Obviously creating a tag for every single one isn’t the best option. Any idea?
Philip N says:

April 2, 2014 at 1:53 pm

Very nice, thank you for this. If I can get it working with AJAX Chat ( https://github.com/Frug/AJAX-Chat/ ) a lot of people will enjoy nesting their colors and making rainbows.
Philip N says:

April 4, 2014 at 5:18 pm

I was able to get it working, but I wanted to allow the possibility of html tags, since html escaping for my script is done server-side.
Removing your initial escaping of html tags caused issues because of the parser’s logic:
change all []’s to angle brackets temporarily
find valid tags and return them to []’s
change all leftover angle brackets to escape chars

I struggled with this for a long time but put together a regex that skips these steps by escaping all unsupported bbcodes:
this.nonTags = new RegExp("(\\[)((?!/?" + this.tagList.join("(?:\\]|\\b|=)|/?") + "(?:\\]|\\b|=)).*?)(\\])");
Then I can replace lines 720 to 730 with this:
// Capture all bbcode tags that are not part of the tag list, and escape them
config.text = config.text.replace(nonTags, function(matchStr, openB, contents, closeB) {
    return '&#91;' + contents + '&#93;';
});
Works for me so far.
1. Philip N says:
  
  April 4, 2014 at 5:25 pm
  
  Er, that’s hard to read. I have it up on github.
  https://github.com/Frug/Extendible-BBCode-Parser
Knight Yoshi says:

October 13, 2014 at 1:58 am

I pretty much love this! It’s so easy to use and to add to and modify!
Though I’m curious for the color names, why you’re matching specific names, instead of just a whole word \w+
Also if you want to add matching for RGB and RGBA
colorRgbPattern = /(^rgb\((\d+),\s*(\d+),\s*(\d+)\)$)|(^rgba\((\d+),\s*(\d+),\s*(\d+)(,\s*\d+\.\d+)*\)$)/
Then just update the if-statement.
1. patorjk says:
  
  October 13, 2014 at 8:15 pm
  
  It’s been a long time since I’ve looked at the code. I think I was just trying to model it after some BBCode implementations I had seen on message boards.
  
  If you’ve got something working for an rgb/rgba setup, you can send me a pull request on github and I’ll check it out.

Comments are closed.