Category Archives: JavaScript

Trying Out Dojo For A Stylometry Web App

So I decided to try to build a full-blown web app using the various tools Dojo provides. I wanted to try out the feed reading data store, the grid control, some of the form dijits (Dojo widgets), and the tab control. After some thought, I ended up deciding to write an application that would allow you to compare the writing styles of different blogs and see if they had the same author. This idea had been on my “backburner list” since around 2003, when I read an interesting article on the subject called “Bookish Math”.

The field of study that deals with examining artistic works to determine authorship is called Stylometry. The theory is that each of us has a unique and identifiable way of doing something. For writing, many of the newer techniques involve examining how often we use common words. From “Bookish Math”:

“People’s unconscious use of everyday words comes out with a certain stamp,” says David Holmes, a stylometrist at the College of New Jersey in Ewing. Precisely because writers use these function words without thinking about them, they may offer more reliable fingerprints of a writer’s style than unusual words do.

“Rare words are noticeable words, which someone else might pick up or echo unconsciously,” Burrows says. “It’s much harder for someone to imitate my frequency pattern of ‘but’ and ‘in’.”

[Two images, captioned “Not Really Written By L. Frank Baum” and “Written By L. Frank Baum?”]

The article goes on to describe how frequency analysis of certain words in the “Federalist Papers” supported the idea that Madison, rather than Hamilton, wrote the disputed papers; how an analysis of the 15th Wizard of Oz book (billed as L. Frank Baum’s last book) revealed that it wasn’t really written by him; and how various other works can be clearly distinguished by an analysis of common words.

Since counting up common words is rather trivial, I decided to see if I could read in some blog feeds, find the frequencies of their common words, and then compare these frequencies to other blogs to see if I could determine authorship. Unfortunately, this rather naive approach didn’t come out as well as I hoped. After the app was tested, none of the numbers seemed to really stand out.
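For a sense of what “counting up common words” involves, here is a minimal sketch in plain JavaScript. It is not the code from my app: the function names `tokenize` and `wordFrequencies` are made up for illustration, and the tokenizing rule (lowercase letters plus apostrophes) is an assumption that a real feed reader would probably need to refine.

```javascript
// Hypothetical helper: split text into lowercase words.
// Assumes English text; letters and inner apostrophes count as part of a word.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+(?:['’][a-z]+)*/g) || [];
}

// Return the `limit` most common words, each with its frequency expressed
// as a percentage of all the words in the text.
function wordFrequencies(text, limit) {
  var words = tokenize(text);
  var counts = Object.create(null); // no prototype, so "constructor" is safe as a key
  words.forEach(function (w) {
    counts[w] = (counts[w] || 0) + 1;
  });
  return Object.keys(counts)
    .map(function (w) {
      return { word: w, percent: 100 * counts[w] / words.length };
    })
    .sort(function (a, b) {
      // Most frequent first; ties are broken alphabetically.
      return b.percent - a.percent || a.word.localeCompare(b.word);
    })
    .slice(0, limit);
}

// "the" is 3 of 8 words (37.5%), "and" is 2 of 8 words (25%).
var sample = wordFrequencies("the cat and the dog and the bird", 2);
```

Expressing each count as a percentage of the total, rather than as a raw count, is what makes a short blog comparable with a long one.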

For blogs that should be similar (like this one and my livejournal), I found that the common word frequencies varied quite a bit. I only had overlap on around 10-20% of the words, and I wasn’t sure if that was a statistical coincidence. I also tried one other person’s professional and personal blogs and found similar results. I then tried to do a little original research and implemented the following algorithm:

  • Find the frequencies of the 50 most common words in the blog’s first 1,200 words.
  • Find the frequencies of the 50 most common words in the whole document.
  • Compare the two lists and dub the words that have similar frequencies “pattern words” – words that the person seems to use with a consistent frequency.
  • Compare the “pattern words” in different blogs and see how well they overlap.
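The four steps above can be sketched roughly as follows. This is a reconstruction under assumptions, not the app's actual code: the function names are hypothetical, “similar frequencies” is interpreted as a relative tolerance (25% in the example options), and the overlap score is taken as the share of the smaller list that appears in both.

```javascript
// All names and thresholds here are hypothetical, not taken from the original app.

// Map each of the `limit` most common words to its share of all words, in percent.
function topFrequencies(words, limit) {
  var counts = Object.create(null);
  words.forEach(function (w) {
    counts[w] = (counts[w] || 0) + 1;
  });
  var top = Object.keys(counts)
    .sort(function (a, b) {
      return counts[b] - counts[a] || a.localeCompare(b);
    })
    .slice(0, limit);
  var freqs = Object.create(null);
  top.forEach(function (w) {
    freqs[w] = 100 * counts[w] / words.length;
  });
  return freqs;
}

// Steps 1-3: keep the words that appear in both top lists and whose frequency
// in the opening sample is within `tolerance` (relative) of the whole text.
function patternWords(words, options) {
  var sample = topFrequencies(words.slice(0, options.sampleSize), options.limit);
  var whole = topFrequencies(words, options.limit);
  return Object.keys(sample).filter(function (w) {
    return w in whole &&
      Math.abs(sample[w] - whole[w]) <= options.tolerance * whole[w];
  });
}

// Step 4: fraction of the smaller pattern-word list that both blogs share.
function overlap(a, b) {
  if (a.length === 0 || b.length === 0) return 0;
  var shared = a.filter(function (w) {
    return b.indexOf(w) !== -1;
  });
  return shared.length / Math.min(a.length, b.length);
}

// Example options matching the numbers in the list; the tolerance is a guess.
var options = { sampleSize: 1200, limit: 50, tolerance: 0.25 };
```

Given two arrays of lowercase words, `wordsA` and `wordsB`, calling `overlap(patternWords(wordsA, options), patternWords(wordsB, options))` gives a score between 0 and 1. The tolerance is the knob most worth experimenting with, since it decides what counts as a “consistent” frequency.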

That worked a bit better, but I still couldn’t get consistently accurate results, so the algorithm still needs a lot of work. Below you can see a small sampling of the frequency results from this blog vs. my old livejournal. A frequency of 1% means that the word makes up 1% of all the words in the text.

As for the Dojo side of things, I ended up really liking the slick look of the dijits. I also liked that I didn’t have to host any of the Dojo files myself; I could simply use the ones posted at the AOL Developer Network.

However, I wasn’t too happy that Dojo caused the page to take 3-4 seconds to load. This might be because I’m using a lot of Dojo tools and the Dojo library is 1.6MB gzipped. Not everything is downloaded, only what you use, but I ended up using quite a few of its tools. The sudden change from normal widgets into dijits in front of the user was also jarring, and I’m not sure if there’s a way to avoid that.

Other issues I ran into were:

  • There’s a bug in the grid control that affects IE7 users. The grid text doesn’t appear in IE7 if the div containing the grid has anything other than “left” for the “text-align” style property.
  • You can’t create Dojo grids in divs that have their “display” property set to “none”. This bothered me because I originally wanted the grid containing the frequencies to “fade-in” after the user hit the “Process Data!” button.

Despite the shortcomings of my algorithm and some of the app’s bloat, I decided to post it anyway. You can view it here: Blog Stylometry Tool (note: I commented out the analysis side of things, so all it does is spit back a table of word frequencies).

I’ll most likely end up slimming down the amount of Dojo that it uses to decrease the load time. Either that, or I’ll try to figure out a way to defer some of the loading. The majority of the loading time comes from setting up the grid.

“Learning Dojo” Book Review

So I’ve finished reading my copy of the book Learning Dojo by Peter Svensson. It was a pretty easy read and contained a lot of good info. Overall it’s an okay book, though parts of it seem a little rough around the edges; I’ll get to why I think that in a bit.

So what is Dojo? Dojo is a free JavaScript toolkit similar to YUI and jQuery, though it bills itself as being larger in terms of features and functionality than the existing JavaScript toolkits. Visually, the widgets it provides (which Dojo refers to as Dijits) are pretty slick and easy to use. You can see a demo of many of them here (this link is unrelated to the book).

Dojo is for people who know JavaScript and who want to add more gadgets to their development tool belt. Even though Dojo offers alternative ways to do certain things (like inheritance and finding elements by their ID), you can’t really use it unless you have a decent understanding of JavaScript. Well, you could, but you wouldn’t get very far.

Learning Dojo appears to be aimed at giving web developers a broad introduction to what Dojo can do. It starts off with two chapters introducing Dojo and follows them with chapters dedicated to particular areas of interest, like Forms, Dijits, Layouts, Ajax, etc. The author even provides the Layout chapter online as a preview for the book; it’s located here. That chapter gives a good sampling of what the author’s writing style is like.

Overall I thought the book was decent. It’s not something I’d rave about, but it’s a solid book that contains a lot of info and examples. I feel like it gave a nice introduction to what Dojo is and how it could be used in the real world. I found the chapter on “Data, Trees, and Grids” useful. I had been wondering about easy ways to do grids and to read RSS feeds (the book didn’t discuss RSS feed reading, but it mentioned a library that could do it), so I was pleasantly surprised to come across this info.

My only criticisms of the book were some formatting, editing, and spelling issues I ran into. Around 5% of the code examples had indentation issues, where the tabs were aligned incorrectly in certain parts of the example. I’m guessing it’s because a mixture of space and tab characters was used, though I don’t know for sure. These examples were still readable, so this was mostly just an annoyance. Also, at the very end of Chapter Two, the “JSON” section had 3 small examples where the code was not printed correctly and was unreadable. This was upsetting, though it was the only place where I encountered it.

Anyway, if you’re looking for a book on Dojo, Learning Dojo, while rough around the edges, does provide a decent introduction and contains some handy examples.

Also, like I mentioned in a previous post, I was sent this book for free by the publisher with the only condition being that I write a review of what I thought. I don’t get any money if you buy it.

And lastly, if you want to play around with Dojo, you can download it for free at their website.

The Dojo Toolkit… Hrm…

A few days after I posted my previous entry on the JavaScript books I was reading, I got an email from a book publishing company asking if I was interested in a free review copy of their latest JavaScript book. I was a little skeptical at first. However, after a brief chat, it turned out the only string attached was that I write a blog entry on what I thought of the book. This seemed like a fair deal, so I decided to take them up on it.

Since I’m getting something for free, I figured the honest thing to do would be to be open about it so people don’t think my opinions are biased, since I do tend to gush about stuff I like (YUI, Google Alerts, etc.). When I don’t like something, I usually just don’t write about it. However, if this book ends up sucking, I will be brutally honest about how much it sucks. Though based strictly on chapter 1, the book looks like it will be a pretty decent read.

Anyway, the topic of the book is The Dojo Toolkit. Dojo is a free open source JavaScript library that provides a number of widgets and utilities, much like jQuery and YUI. Right now I’m unsure of how it compares to these other frameworks, but it looks very promising based on what I’ve seen so far from various Dojo websites. The charting package looked particularly interesting.

Hopefully I’ll learn a lot of cool stuff about Dojo. I won’t pretend to be a JavaScript expert, so the review I’ll write will most likely be in the same style as my last review, though probably a bit more thorough. For those of you who are curious, the book is called “Learning Dojo” (that’s the book’s actual website; this is not a sponsored link and I don’t get anything if you buy the book. Not that you shouldn’t buy it, I just want to be clear since that page is mostly about buying the book).

After I finish this new book I’m going to get back to my JavaScript Design Patterns book (which is really good so far, though I’d only recommend it to hardcore JavaScript developers). And after that I’m going to get back to writing stuff for this site. Hopefully all of this reading will pay off with some nicer apps, tutorials, and programming examples.