Photo By Dean Terry
I decided to try my hand at implementing a BBCode parser in JavaScript. You can play around with it online here, and download the source here.
I had looked around a little bit and noticed that the existing JavaScript BBCode parsers had at least a few of the following issues:
- They didn’t report errors on misaligned tags (e.g., [b][u]test[/b][/u]).
- They couldn’t handle tags of the same type that were nested within each other (e.g., [color=red]red[color=blue]blue[/color]red again[/color]). This happens because their regex will look for the first closing tag it can find.
- They couldn’t handle BBCode’s list format (e.g., [list][*]item 1[*]item 2[/list]).
- They didn’t report errors on incorrect parent-child relationships (e.g., [list][td]item 1?[/td][/list]).
- They weren’t easily extendible.
I naively thought it’d be easy to quickly whip up a parser, and at first it was. Most BBCode tags can be implemented with a simple find and replace. However, I quickly ran into the issues of dealing with nested tags of the same type, the noparse tag, and the list tag’s annoying [*] tag (which doesn’t have a closing tag). Luckily, I came across a neat blog post on finding nested patterns in JavaScript, which came in handy for isolating tag pairs, from the inner-most on up. Taking the idea from that post, one can do something like this to process the inner tags first and avoid the nested tag problem:
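var str = "[list][list]test[/list][/list]",
    // \1 forces the close tag's name to match the open tag's name
    re = /\[([^\]]*?)\](.*?)\[\/\1\]/gi;
// Keep replacing until a pass leaves the string unchanged
while (str !== (str = str.replace(re, function(strMatch, subM1, subM2) {
    return "" + subM2 + " ";
})));
// str = "test  " (each pass strips one tag pair and appends a space)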
That idea works well, though you can’t implement a noparse tag if you process the inner-most tags first. So I decided to pre-process the BBCode with something similar to the idea above and add nested-depth information to each open and close tag. Once all of the tags had that, I could parse the processed code with a regex that could easily match up the correct open and close tags.
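To make the depth idea concrete, here is a minimal sketch. It is not the code from the parser: the function name annotateDepth and the ":N" suffix format are made up for illustration, and it uses a simple per-tag counter rather than the replace loop shown earlier. It also assumes the tags are correctly aligned.

```javascript
// Hypothetical helper: stamp every open/close tag with its nesting depth,
// counted separately for each tag name.
function annotateDepth(text) {
    var depths = {}; // number of currently open tags, per tag name
    var tagRe = /\[(\/?)([a-z*]+)((?:=[^\]]*)?)\]/gi;
    return text.replace(tagRe, function (match, slash, name, param) {
        var key = name.toLowerCase();
        if (slash) {
            if (!depths[key]) { return match; } // stray close tag: leave untouched
            depths[key] -= 1;
            return "[/" + name + ":" + depths[key] + "]";
        }
        var depth = depths[key] || 0;
        depths[key] = depth + 1;
        return "[" + name + ":" + depth + param + "]";
    });
}

var annotated = annotateDepth("[color=red]red[color=blue]blue[/color]red again[/color]");
// annotated === "[color:0=red]red[color:1=blue]blue[/color:1]red again[/color:0]"

// With depths attached, backreferences pair the right open and close tags:
var pairRe = /\[([a-z*]+):(\d+)(?:=[^\]]*)?\]([\s\S]*?)\[\/\1:\2\]/i;
var outerContent = pairRe.exec(annotated)[3];
// outerContent === "red[color:1=blue]blue[/color:1]red again"
```

Because the outer pair is matched first, a tag like noparse could decide not to look at its contents at all, which is what processing the inner-most tags first rules out.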
To get around the issue of the [*] tag having no closing tag, I wrote code that inserted [/*] tags where they were supposed to go during the pre-processing period. I won’t go into the algorithm here, but you can dig into the code if you’re interested.
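For the curious, one way this kind of insertion could work is sketched below. This is an assumption about the general approach, not the algorithm in the parser: closeListItems is a hypothetical name, and the sketch ignores noparse and malformed input beyond a couple of basic guards.

```javascript
// Hypothetical sketch: close each [*] item before the next [*] at the same
// list level, or before the [/list] that ends its list.
function closeListItems(text) {
    var openItem = []; // one flag per open [list]: is a [*] item currently open?
    var tagRe = /\[(list(?:=[^\]]*)?|\/list|\*)\]/gi;
    return text.replace(tagRe, function (tag, name) {
        var kind = name.toLowerCase();
        var top = openItem.length - 1;
        if (kind === "*") {
            if (top < 0) { return tag; }      // [*] outside any list: leave as-is
            var prefix = openItem[top] ? "[/*]" : "";
            openItem[top] = true;
            return prefix + tag;
        }
        if (kind === "/list") {
            if (top < 0) { return tag; }      // stray [/list]: leave as-is
            var wasOpen = openItem.pop();
            return (wasOpen ? "[/*]" : "") + tag;
        }
        openItem.push(false);                 // a new [list] starts with no open item
        return tag;
    });
}

var closed = closeListItems("[list][*]item 1[*]item 2[/list]");
// closed === "[list][*]item 1[/*][*]item 2[/*][/list]"
```

The stack of flags is what keeps nested lists straight: an inner [/list] only closes the inner list's last item, and the outer item stays open until the next outer [*] or the outer [/list].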
Also, I should note that the fact that JavaScript allows you to use a function as the second parameter to the replace method makes processing the tags really easy. Once you match a set of tags, you can recursively call the parse function on that tag’s contents from inside of the function you passed to replace.
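Here is a small sketch of that recursive replace idea. It assumes the input has already been pre-processed into depth-annotated tags of the form [b:0]...[/b:0], which is an illustrative format and not necessarily the one the parser uses. The htmlNames table and the parse function here are likewise made up for the example.

```javascript
// Illustrative subset of tags, not the parser's real tag table.
var htmlNames = { b: "b", u: "u", i: "i" };

// Hypothetical mini-parser over depth-annotated tags such as [b:0]...[/b:0].
function parse(text) {
    var re = /\[([a-z]+):(\d+)\]([\s\S]*?)\[\/\1:\2\]/gi;
    return text.replace(re, function (match, name, depth, content) {
        var el = htmlNames[name.toLowerCase()];
        if (!el) { return match; } // unknown tag: leave it alone
        // Recurse into the matched tag's contents from inside the callback.
        return "<" + el + ">" + parse(content) + "</" + el + ">";
    });
}

var html = parse("[b:0]bold [u:0]both [b:1]inner[/b:1][/u:0][/b:0]");
// html === "<b>bold <u>both <b>inner</b></u></b>"
```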
Using the parser
To use the parser, you’d simply include the xbbcode.js and xbbcode.css files somewhere on your page (both are contained in the zip file linked above), and then call the XBBCODE object from somewhere in your JavaScript:
var result = XBBCODE.process({
    text: "Some bbcode to process here",
    removeMisalignedTags: false,
    addInLineBreaks: false
});
console.log("Errors: " + result.error);
console.dir(result.errorQueue);
console.log(result.html); // the HTML form of your BBCode
Adding new tags
To add a new tag to your BBCode, add properties to the “tags” object inside of the XBBCODE object. For example, say you wanted to add a tag called [googleit] which would turn its contents into a link to the Google search results for that text. You’d implement that by adding this to the tags object:
"googleit": {
    openTag: function(params,content) {
        var website = "\"http://www.google.com/#q=" + content + '"';
        return '<a href=' + website + '>';
    },
    closeTag: function(params,content) {
        return '</a>';
    }
}
Then you could have BBCode like this: “[googleit]ta-da![/googleit]”, which would be transformed into this: “<a href="http://www.google.com/#q=ta-da!">ta-da!</a>”
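One caveat with a tag like this is that the contents go into the URL as-is, so a quote or an ampersand in the text could break the link. Below is a hedged variant that runs the contents through the standard encodeURIComponent function first. The standalone googleitTag variable is only there so the snippet runs on its own; in practice the object would go into the tags object as shown above.

```javascript
// Hypothetical variant of the [googleit] tag definition with the query encoded.
var googleitTag = {
    openTag: function (params, content) {
        var url = "http://www.google.com/#q=" + encodeURIComponent(content);
        return '<a href="' + url + '">';
    },
    closeTag: function (params, content) {
        return "</a>";
    }
};

var openHtml = googleitTag.openTag(undefined, 'a "b" & c');
// openHtml === '<a href="http://www.google.com/#q=a%20%22b%22%20%26%20c">'
```

Text such as “ta-da!” passes through unchanged, since encodeURIComponent leaves letters, digits, and characters like - and ! alone.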
If you have any suggestions or find any bugs let me know.
This is quite the excellent piece of code you have written. Kudos.
Thanks
Hi!
I tried your BBCode Previewer. Please change this (in the end):
ret.html = ret.html.replace("&#91;", "["); // put ['s back in
ret.html = ret.html.replace("&#93;", "]"); // put ['s back in
to this:
ret.html = ret.html.replace(/\&\#91\;/gm, "["); // put ['s back in
ret.html = ret.html.replace(/\&\#93\;/gm, "]"); // put ['s back in
Your original version sometimes causes errors.
Some bbcodes (for example [b] and [u]) don’t have to be the child of another tag pair. Browsers can interpret this HTML code: “He<b>ll<u>o Wor</b>ld!</u>”
Hi balping,
Thank you for the feedback! I did a little poking around, but I’m not sure how the first part sometimes leads to errors. Can you give me an example?
As for browsers parsing misaligned tags, I decided it was best to just restrict users to using the correct syntax. HTML parsers will actually try to guess what you meant if you have misaligned tags like the example you presented. You can try it out by running this code in Firebug (or in its own webpage):
var test = document.createElement("div");
test.innerHTML = "He<b>ll<u>o Wor</b>ld!</u>";
console.dir( test );
If you look at the childNodes property, then look at the childNodes property of the “b” tag, you’ll see it has a “u” element inside of it. The HTML parser basically interpreted that string like this:
He<b>ll<u>o Wor</u></b><u>ld!</u>
On the surface this seems like it might be easy to implement, but it can become pretty nasty pretty quickly.
I interpret text in two steps:
1. I have base64 encoded data between [todecode][/todecode] tags. So in the first step I decode the data and leave everything outside these tags untouched.
2. I interpret the whole decoded text.
I don’t exactly know why, but if I use
ret.html = ret.html.replace(/\&\#91\;/gm, "["); // put ['s back in
ret.html = ret.html.replace(/\&\#93\;/gm, "]"); // put ['s back in
the code works; if I don’t, it creates “[todecode<” tags.
Your original code replaces only the first match, but mine replaces all of them.
Sorry, it creates “[todecode<]” tags…
Ahh, it creates “[todecode>]” tags
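The first-match behavior mentioned in this comment is easy to check in a console. When replace is given a plain string as its pattern, only the first occurrence is swapped; a regex with the g flag replaces every occurrence. The sample string below is made up for the demonstration.

```javascript
var encoded = "&#91;b&#93;x&#91;/b&#93;";

// String pattern: only the first "&#91;" is replaced.
var firstOnly = encoded.replace("&#91;", "[");
// firstOnly === "[b&#93;x&#91;/b&#93;"

// Regex with the g flag: every occurrence is replaced.
var all = encoded.replace(/&#91;/g, "[").replace(/&#93;/g, "]");
// all === "[b]x[/b]"
```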
This is a useful tool… but why do you set “me = this” at the very beginning? That essentially sets “me = window”, which is a WHOLE LOT of useless overhead in your case as all you want to return is an object with the process() method.
Just a thought…
Art – I just reviewed the code and you’re right, that is kind of useless. What I meant to do was write:
me = {};
That way the me object would be enclosed in the anonymous function yet still have access to the variables defined within. I think I changed from using the module pattern mid-way through development and forgot to update the me variable. Tonight I’ll update this, and then test and review to make sure everything’s in order. Thank you for letting me know!
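A stripped-down sketch of that pattern is below. Only the names me and process come from the discussion; EXAMPLE, callCount, and the body of process are placeholders made up for illustration, and the real parser obviously does much more.

```javascript
// Hypothetical sketch: an anonymous function that builds and returns `me`.
var EXAMPLE = (function () {
    var me = {};       // public object, instead of `me = this`
    var callCount = 0; // private: only reachable through the closure

    me.process = function (config) {
        callCount += 1;
        return { html: String(config.text), calls: callCount };
    };

    return me;
}());

var first = EXAMPLE.process({ text: "hi" });
// first.html === "hi", and EXAMPLE.callCount is undefined from the outside
```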
You may want to test the [PHP] codes. I tried to add it to the tags object, but it doesn’t seem that the replace in the parse function picks it up.
Mitchell – How does the PHP tag behave? I looked it up real quick and read that it was similar to the code tag, so I added it to the previewer and it seemed to work ok:
http://patorjk.com/bbcode-previewer/
Edit: As a side note, right now its use is just set to be [php]php code here[/php]
Should be identical to code. Only difference is some systems will do syntax highlighting but that doesn’t matter. If I have simple stuff between the php tags it works, but if I have a lot of text in there I get a misaligned tags warning from the previewer. This string I have has many [] (valid for JavaScript) in the string and tags and stuff. What would you expect the output of this:
[CODE][][/CODE]
That throws an error
Try:
[code][][/code]
It looks like there’s a bug when using upper case letters for tag names. I’m going to see if I can figure out what’s happening real quick.
that would be correct… I did modify it to lower case the tagName.
I still need to see why the long code in [PHP] is breaking it
I just updated http://patorjk.com/bbcode-previewer/ and I think the upper-case issue is fixed. I updated 2 lines:
409: childTag = (matchingTags[ii].match(reTagNamesParts))[2].toLowerCase();
461: tagName = tagName.toLowerCase(); // line added
If you want to send me the sample PHP code I could help you troubleshoot the long code issue.
That would be great! Where would you like the code? it’s quite long for a comment IMO
Send it to patorjk at gmail dot com.
Now that… that is awesome. Really helps, thanks! I have been looking for one of these for a while, and this is definitely what I need.
mrfishie – Glad it could help
It would be great if you could put this on GitHub. That way people could patch their own branches and send you pull requests.
Hi Steve,
I have an old copy of it here:
https://github.com/patorjk/Extendible-BBCode-Parser
However, I haven’t updated it recently, and I’ve been procrastinating on learning the ins and outs of git (every time I use it I have to do a quick tutorial on it). I’ll shoot for updating it tonight though.
Hi Patorjk
first of all thank you for the code.
I had tried using
[img]img/something.png[/img] but it is always returning
please advise
Can you give me a fuller test case? I just tried: (at http://patorjk.com/bbcode-previewer/)
[img]http://web.scott.k12.va.us/martha2/dmbtest.gif[/img]
And it looks like it works fine.
Thank you! Really like the script.
Have you noticed the double line spacing inside of [code] when addInLineBreaks equals 'true'? Perhaps I'm not thinking about it correctly, but I'd like to see anything inside the [code] block treated as pre-formatted, and not have line-break tags inserted.
Hi Daniel,
I agree, it shouldn’t add the formatting in for the code tag in that case. I’ll take a look at it later today.
Updated code to resolve issue.
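For anyone wondering what a fix along these lines might look like, here is a minimal sketch. It is not the change that was made to the parser: addLineBreaksOutsideCode is a hypothetical helper, and it assumes [code] blocks are not nested.

```javascript
// Hypothetical helper: turn newlines into <br> tags, but leave [code] blocks alone.
function addLineBreaksOutsideCode(text) {
    // The capturing group makes split() keep the [code] blocks, at odd indexes.
    var parts = text.split(/(\[code\][\s\S]*?\[\/code\])/i);
    for (var i = 0; i < parts.length; i += 2) {
        parts[i] = parts[i].replace(/\r?\n/g, "<br>");
    }
    return parts.join("");
}

var broken = addLineBreaksOutsideCode("a\nb[code]x\ny[/code]c\nd");
// broken === "a<br>b[code]x\ny[/code]c<br>d"
```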