I decided to try my hand at implementing a BBCode parser in JavaScript. You can play around with it online here, and download the source here.
I had looked around a little bit and noticed that the existing JavaScript BBCode parsers had at least a few of the following issues:
- They didn’t report errors on misaligned tags (e.g., [b][u]test[/b][/u]).
- They couldn’t handle tags of the same type that were nested within each other (e.g., [color=red]red[color=blue]blue[/color]red again[/color]). This happens because their regex will look for the first closing tag it can find.
- They couldn’t handle BBCode’s list format (e.g., [list][*]item 1[*]item 2[/list]).
- They didn’t report errors on incorrect parent-child relationships (e.g., [list][td]item 1?[/td][/list]).
- They weren’t easily extendible.
I naively thought it’d be easy to quickly whip up a parser, and at first it was. Most BBCode tags can be implemented with a simple find and replace. However, I quickly ran into the issues of dealing with nested tags of the same type, the noparse tag, and the list tag’s annoying [*] tag (which doesn’t have a closing tag). Luckily, I came across a neat blog post on finding nested patterns in JavaScript, which came in handy for isolating tag pairs, from the inner-most on up. Taking the idea from that post, one can do something like this to process the inner tags first and avoid the nested tag problem:
var str = "[list][list]test[/list][/list]",
    re = /\[([^\]]*?)\](.*?)\[\/\1\]/gi;
while (str !== (str = str.replace(re, function(strMatch, subM1, subM2) {
    return "" + subM2 + " ";
})));
// str = "" test
That idea works well, though you can’t implement a noparse tag if you process the inner-most tags first. So I decided to pre-process the BBCode with something similar to the idea above and add in nested-depth information to each open and close tag. Once all of the tags had that, I could parse the processed code with a regex that could easily match-up the correct open and close tags.
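As a rough illustration of the depth-annotation idea (a minimal sketch, not the parser's actual code), something like the following could work. The helper name `addDepth` and the `name:depth` marker format are assumptions made up for this example.

```javascript
// Hypothetical helper: stamps every open/close tag with its nesting depth,
// e.g. [color=red] -> [color:0=red] and its partner [/color] -> [/color:0].
// The ":depth" marker format is an assumption for illustration only.
function addDepth(text) {
  const depths = {}; // current nesting depth, tracked per tag name
  return text.replace(/\[(\/?)([a-z*]+)([^\]]*)\]/gi, function (match, slash, name, params) {
    const key = name.toLowerCase();
    if (slash) {
      // A closing tag closes the most recently opened tag of the same name.
      if (!depths[key]) { return match; } // unmatched close tag: leave untouched
      depths[key] -= 1;
      return "[/" + key + ":" + depths[key] + "]";
    }
    const depth = depths[key] || 0;
    depths[key] = depth + 1;
    return "[" + key + ":" + depth + params + "]";
  });
}

const annotated = addDepth("[color=red]red[color=blue]blue[/color]red again[/color]");

// With depths attached, a backreference pairs each open tag with its own
// close tag instead of the first close tag of the same name.
const pairRe = /\[([a-z*]+:\d+)(?!\d)([^\]]*)\]([\s\S]*?)\[\/\1\]/i;
const outer = annotated.match(pairRe);
console.log(annotated);
console.log(outer[3]); // contents of the outermost [color] pair
```

Because the outer pair is matched first, a tag like noparse can decide to leave its contents alone before any inner tags are touched.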
To get around the issue of the [*] tag having no closing tag, I wrote code that inserted [/*] tags where they were supposed to go during the pre-processing period. I won’t go into the algorithm here, but you can dig into the code if you’re interested.
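For readers who want the general shape of such a step without reading the source, here is one possible approach. It is a sketch under my own assumptions, not the algorithm the parser uses, and `closeListItems` is a hypothetical name.

```javascript
// Hypothetical pre-processing step: inserts [/*] before the next [*] or the
// enclosing [/list], so every list item ends up with a closing tag.
function closeListItems(text) {
  const openItem = []; // one flag per open [list]: is a [*] item currently open?
  return text.replace(/\[(list(?:=[^\]]*)?|\/list|\*)\]/gi, function (match, tag) {
    const t = tag.toLowerCase();
    const top = openItem.length - 1;
    if (t === "*") {
      if (top < 0) { return match; } // [*] outside of any list: leave as-is
      const prefix = openItem[top] ? "[/*]" : "";
      openItem[top] = true;
      return prefix + match;
    }
    if (t === "/list") {
      if (top < 0) { return match; } // stray [/list]: leave as-is
      const wasOpen = openItem.pop();
      return (wasOpen ? "[/*]" : "") + match;
    }
    // An opening [list] (with or without a parameter) starts a new level.
    openItem.push(false);
    return match;
  });
}

const closedFlat = closeListItems("[list][*]item 1[*]item 2[/list]");
const closedNested = closeListItems("[list][*]a[list][*]b[/list][*]c[/list]");
console.log(closedFlat);
console.log(closedNested);
```

Keeping one flag per open list is what lets a nested list sit inside its parent item without closing that item early.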
Also, I should note that the fact that JavaScript allows you to use a function as the second parameter to the replace method makes processing the tags really easy. Once you match a set of tags, you can recursively call the parse function on that tag’s contents from inside of the function you passed to replace.
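To make the recursion concrete, here is a deliberately tiny sketch that handles only [b] and [i]. The `parse` function, the `htmlFor` table, and the choice of output elements are assumptions for illustration and do not reflect how XBBCODE renders these tags.

```javascript
// Hypothetical mini-parser: only [b] and [i], mapped to plain HTML elements.
const htmlFor = { b: "strong", i: "em" };

function parse(text) {
  // A fresh regex per call keeps the recursive calls independent of each other.
  const tagRe = /\[(b|i)\]([\s\S]*?)\[\/\1\]/gi;
  return text.replace(tagRe, function (match, name, content) {
    const el = htmlFor[name.toLowerCase()];
    // Recurse from inside the replace callback so tags nested within this
    // pair are converted as well.
    return "<" + el + ">" + parse(content) + "</" + el + ">";
  });
}

const parsed = parse("[b]bold and [i]italic[/i][/b]");
console.log(parsed);
```

This version would still trip over nested tags of the same type, which is the problem the depth information described earlier is meant to solve.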
Using the parser
To use the parser, you’d simply include the xbbcode.js and xbbcode.css files somewhere on your page (both are contained in the zip file linked above), and then call the XBBCODE object from somewhere in your JavaScript:
var result = XBBCODE.process({
    text: "Some bbcode to process here",
    removeMisalignedTags: false,
    addInLineBreaks: false
});
console.log("Errors: " + result.error);
console.dir(result.errorQueue);
console.log(result.html); // the HTML form of your BBCode
Adding new tags
To add a new tag to your BBCode, add properties to the “tags” object inside of the XBBCODE object. For example, say you wanted to add a tag called [googleit] which would change its contents into a link of its google search results. You’d implement that by adding this to the tags object:
"googleit": {
    openTag: function(params,content) {
        var website = "\"http://www.google.com/#q=" + content + '"';
        return '<a href=' + website + '>';
    },
    closeTag: function(params,content) {
        return '</a>';
    }
}
Then you could have BBCode like this: “[googleit]ta-da![/googleit]” which would be transformed into this: “<a href="http://www.google.com/#q=ta-da!">ta-da!</a>”
If you have any suggestions or find any bugs let me know.
This is quite the excellent piece of code you have written. Kudos.
Thanks 🙂
Hi!
I tried your BBCode Previewer. Please change this (in the end):
ret.html = ret.html.replace("&#91;", "["); // put ['s back in
ret.html = ret.html.replace("&#93;", "]"); // put ]'s back in
to this:
ret.html = ret.html.replace(/\&\#91\;/gm, "["); // put ['s back in
ret.html = ret.html.replace(/\&\#93\;/gm, "]"); // put ]'s back in
Your original version sometimes causes errors.
Some bbcodes (for example [b] and [u]) don’t have to be the child of another tag pair. Browsers can interpret this HTML code: “He<b>ll<u>o Wor</b>ld!</u>”
Hi balping,
Thank you for the feedback! I did a little poking around, but I’m not sure how the first part sometimes leads to errors. Can you give me an example?
As for browsers parsing misaligned tags, I decided it was best to just restrict users to using the correct syntax. HTML parsers will actually try to guess what you meant if you have misaligned tags like the example you presented. You can try it out by running this code in Firebug (or in its own webpage):
var test = document.createElement("div");
test.innerHTML = "He<b>ll<u>o Wor</b>ld!</u>";
console.dir( test );
If you look at the childNodes property, then look at the childNodes property of the “b” tag, you’ll see it has a “u” element inside of it. The HTML parser basically interpreted that string like this:
He<b>ll<u>o Wor</u></b><u>ld!</u>
On the surface this seems like it might be easy to implement, but it can become pretty nasty pretty quickly.
I interpret text in two steps:
1. I have base64-encoded data between [todecode][/todecode] tags. In the first step I decode that data and leave everything outside these tags untouched.
2. I interpret the whole decoded text.
I don’t exactly know why, but if I use
ret.html = ret.html.replace(/\&\#91\;/gm, "["); // put ['s back in
ret.html = ret.html.replace(/\&\#93\;/gm, "]"); // put ]'s back in
the code works, if I don’t, it creates “[todecode<” tags.
Your original code replaces only the first match, but mine replaces all of them.
Sorry, it creates “[todecode<]” tags…
Ahh it creates “[todecode>]” tags
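The first-match-only behaviour mentioned above is standard for `String.prototype.replace` when the pattern is a plain string. A small check, using a made-up sample string rather than real parser output:

```javascript
// Sample input is hypothetical; it just contains several escaped brackets.
const escaped = "&#91;b&#93;bold&#91;/b&#93;";

// A string pattern replaces only the first occurrence.
const firstOnly = escaped.replace("&#91;", "[");

// A regex with the g flag replaces every occurrence.
const all = escaped.replace(/&#91;/g, "[");

console.log(firstOnly);
console.log(all);
```

Any input with more than one escaped bracket would therefore keep its later `&#91;` and `&#93;` entities under the string-pattern version.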
This is a useful tool… but why do you set “me = this” at the very beginning? That essentially sets “me = window”, which is a WHOLE LOT of useless overhead in your case as all you want to return is an object with the process() method.
Just a thought…
Art – I just reviewed the code and you’re right, that is kind of useless. What I meant to do was write:
me = {};
That way the me object would be enclosed in the anonymous function yet still have access to the variables defined within. I think I changed from using the module pattern mid-way through development and forgot to update the me variable. Tonight I’ll update this, and then test and review to make sure everything’s in order. Thank you for letting me know!
You may want to test the [PHP] codes. I tried to add it to the tags object, but the replace in the parse function didn’t seem to pick it up.
Mitchell – How does the PHP tag behave? I looked it up real quick and read that it was similar to the code tag, so I added it to the previewer and it seemed to work ok:
http://patorjk.com/bbcode-previewer/
Edit: As a side note, right now its use is just set to be [php]php code here[/php]
Should be identical to code. Only difference is some systems will do syntax highlighting but that doesn’t matter. If I have simple stuff between the php tags it works but if I have a lot of text in there I get misaligned tags warning from the previewer. This string I have has many [] (valid for JavaScript) in the string and tags and stuff. What would you expect the output of this:
[CODE][][/CODE]
That throws an error
Try:
[code][][/code]
It looks like there’s a bug when using upper case letters for tag names. I’m going to see if I can figure out what’s happening real quick.
that would be correct… I did modify it to lower case the tagName.
I still need to see why the long code in [PHP] is breaking it
I just updated http://patorjk.com/bbcode-previewer/ and I think the upper-case issue is fixed. I updated 2 lines:
409: childTag = (matchingTags[ii].match(reTagNamesParts))[2].toLowerCase();
461: tagName = tagName.toLowerCase(); // line added
If you want to send me the sample PHP code I could help you troubleshoot the long code issue.
That would be great! Where would you like the code? It’s quite long for a comment IMO
Send it to patorjk at gmail dot com.
Now that… that is awesome. Really helps, thanks! I have been looking for one of these for a while, and this is definitely what I need.
mrfishie – Glad it could help :).
It would be great if you could put this on GitHub. That way people could patch their own branches and send you pull requests.
Hi Steve,
I have an old copy of it here:
https://github.com/patorjk/Extendible-BBCode-Parser
However, I haven’t updated it recently, and I’ve been procrastinating on learning the ins and outs of git (every time I use it I have to do a quick tutorial on it). I’ll shoot for updating it tonight though.
Hi Patorjk
first of all thank you for the code.
I had tried using
[img]img/something.png[/img]
but it always returning
please advise 🙂
Can you give me a fuller test case? I just tried: (at http://patorjk.com/bbcode-previewer/)
[img]http://web.scott.k12.va.us/martha2/dmbtest.gif[/img]
And it looks like it works fine.
Thank you! Really like the script.
Have you noticed the double line spacing inside of [code] when addInLineBreaks equals ‘true’? Perhaps I’m not thinking about it correctly, but I’d like to see anything inside the [code] block treated as pre-formatted, and not have tags inserted.
Hi Daniel,
I agree, it shouldn’t add the formatting in for the code tag in that case. I’ll take a look at it later today.
Updated code to resolve issue.
Referring to Bernard’s post, I believe he’s saying if you leave off the http://, the result will be blank. For instance, [img]web.scott.k12.va.us/martha2/dmbtest.gif[/img] or
[img]www.web.scott.k12.va.us/martha2/dmbtest.gif[/img]
Thank you for making this code available, Patrick. I imagine there is a good reason why you have used styled spans in your HTML conversions as opposed to using tags such as <b> etc. However, it is worth noting that this can have a negative impact on SEO if the converted HTML is to be seen by Google & co.
For some reason you appear to have missed out on providing handlers for a few common BBCodes
[font=fontName]text[/font]
"font":{openTag:function(params,content)
{
params = params.substr(1,params.length - 1);
params = '"' + params + '"';
return "";
},
closeTag:function(params,content){return '';},
displayContent:true},
[sub]text[/sub] and [sup]text[/sup]
},
"sub": {
openTag: function(params,content) {
return '';
},
closeTag: function(params,content) {
return '';
}
},
"sup": {
openTag: function(params,content) {
return '';
},
closeTag: function(params,content) {
return '';
}
}
Finally, here is a slight mod to allow for the creation of anchors that can have target=’_blank’ and/or rel=’nofollow’. param should bear the form url|blank|nofollow where blank and nofollow can be 0 (false) or 1 (true).
"url": {
openTag: function(params,content) {
params = params.substring(1).split('|');
var blank = (1 == parseInt(params[1]))?" target='_blank'":"";
var nofollow = (1 == parseInt(params[2]))?" rel='nofollow'":'';
urlPattern.lastIndex = 0;
return '';
},
closeTag: function(params,content) {
return '';
}
},
Thanks for the suggestions! I’ve gone ahead and added sup and sub to the github repo. Your font tag implementation could possibly have a security hole in it since you’re trusting that the user isn’t passing along any nefarious parameters (ex: adding an onclick field – not sure if that’d work for the font tag, but you have to be careful).
Spans were used for some of the tags since you can’t be sure if someone hasn’t overridden the styling for tags like code (some CSS frameworks do this).
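One way to address the concern about untrusted font parameters is to check the value against a whitelist pattern before it goes anywhere near the HTML. This is only a sketch: `fontPattern`, `fontOpenTag`, and the span markup it returns are assumptions for illustration, not what the repo does.

```javascript
// Hypothetical whitelist: a font name may contain only letters, digits,
// spaces, and hyphens, and must start with a letter.
const fontPattern = /^[a-z][a-z0-9 -]*$/i;

// Hypothetical openTag body for a [font=...] tag. params is assumed to
// arrive as "=Font Name", matching the other examples in this thread.
function fontOpenTag(params) {
  const name = (params || "").slice(1); // drop the leading "="
  if (!fontPattern.test(name)) {
    return "<span>"; // reject anything suspicious and fall back to no styling
  }
  return "<span style=\"font-family:'" + name + "'\">";
}

const safeTag = fontOpenTag("=Comic Sans MS");
const rejectedTag = fontOpenTag("=x\" onclick=\"alert(1)");
console.log(safeTag);
console.log(rejectedTag);
```

Rejecting quotes and angle brackets outright is simpler to reason about than trying to escape them after the fact.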
I am using your code along with my own take on BBCode. Amongst other things I am also allowing for text alignment in paragraphs. Here is the code for that
"para": {
openTag: function(params,content)
{
// params arrives as "=N"; drop the leading "=" before parsing the number
params = parseInt(params.substring(1), 10);
var al = 'left';
switch (params)
{
case 1:al = 'right';break;
case 2:al = 'center';break;
case 3:al='justify';break;
}
return '<p style="text-align:' + al + '">';
},
closeTag: function(params,content) {
return '</p>';
}
},
Hey, I think your website might be having browser compatibility issues.
When I look at your blog in Firefox, it looks fine, but when opening it in Internet Explorer, it has some overlapping. I just wanted to give you a quick heads up! Other than that, very good blog!
First of all: Thanks for your nice job, it works quite well!
FYI: I needed to update the urlPattern to support semicolons as well. Is there a reason why they aren’t supported in the current version?
urlPattern = /^(?:https?|file|c):(?:\/{1,3}|\\{1})[-a-zA-Z0-9:;@#%&()~_?\+=\/\\\.]*$/,
Thanks for the updated pattern! If you want you can send me a pull request via github, otherwise I’ll add it in later this month.
Hi Patorjk, I just did the pull request.
Awesome, thank you! Just merged it in.
Is there a way to get rid of the unneeded line breaks (inside lists or tables)?
[table][tr][td]1[/td]
[td]2[/td]
[/tr]
[tr][td]3[/td]
[td]4[/td]
[/tr]
[/table]
This will output as
\n\n\n\n\n\ntable...
or the “Nested List Tags” example
You’d have to take a look at the source and make some adjustments:
https://github.com/patorjk/Extendible-BBCode-Parser
If you want to make the changes and submit a pull request, I’ll add it in.
Yes, I know I can edit the code; I just thought someone might have done this already. I haven’t found the place where the trimming should be done.
Do you know of a good way to handle smilies? In the instance where I’m using this, there are an absurd amount of smilies, 151. Obviously creating a tag for every single one isn’t the best option. Any idea?
Very nice, thank you for this. If I can get it working with AJAX Chat ( https://github.com/Frug/AJAX-Chat/ ) a lot of people will enjoy nesting their colors and making rainbows.
I was able to get it working, but I wanted to allow the possibility of html tags, since html escaping for my script is done server-side.
Removing your initial escaping of html tags caused issues because of the parser’s logic:
change all []’s to angle brackets temporarily
find valid tags and return them to []’s
change all leftover angle brackets to escape chars
I struggled with this for a long time but put together a regex that skips these steps by escaping all unsupported bbcodes:
this.nonTags = new RegExp("(\\[)((?!/?" + this.tagList.join("(?:\\]|\\b|=)|/?") + "(?:\\]|\\b|=)).*?)(\\])");
Then I can replace lines 720 to 730 with this:
// Capture all bbcode tags that are not part of the tag list, and escape them
config.text = config.text.replace(nonTags, function(matchStr, openB, contents, closeB) {
return '&#91;' + contents + '&#93;';
});
Works for me so far.
Er, that’s hard to read. I have it up on github.
https://github.com/Frug/Extendible-BBCode-Parser
I pretty much love this! It’s so easy to use and to add to and modify!
Though I’m curious why you’re matching specific color names instead of just any whole word (\w+).
Also if you want to add matching for RGB and RGBA
colorRgbPattern = /(^rgb\((\d+),\s*(\d+),\s*(\d+)\)$)|(^rgba\((\d+),\s*(\d+),\s*(\d+)(,\s*\d+\.\d+)*\)$)/
Then just update the if-statement.
It’s been a long time since I’ve looked at the code. I think I was just trying to model it after some BBCode implementations I had seen on message boards.
If you’ve got something working for an rgb/rgba setup, you can send me a pull request on github and I’ll check it out.