Part of how I was able to bring up Venice to replace CommunityWare so quickly was that I had been responsible for one of the key CommunityWare components, the HTML Checker, which performed several different operations related to conferencing:

  • It limited acceptable HTML tags in messages to a known-good "whitelist." This whitelist could be narrowed to a more restricted set, so that a small amount of HTML could be allowed in topic names and post "pseuds" (the header line for a post, usually the poster's name).
  • It automatically "balanced" tags at the end of a post, to ensure that sloppy HTML wouldn't corrupt the page.
  • It resolved references to posts and users, which, like tags, were written inside angle brackets (users could also be written in parentheses). For example, <Commons.3.14> referred to post 14 in topic 3 of the Commons conference, <Playground.7> referred to topic 7 in the Playground conference, and <8.7-10> referred to posts 7 through 10 of topic 8 in the current conference. (This reference syntax dates back to The WELL's PicoSpan conferencing system, by way of WellEngaged, The WELL's commercial software product that the original Electric Minds used.)
  • It also detected and highlighted URLs and E-mail addresses, turning them into hyperlinks.
  • During preview, it could also act as a spellchecker, highlighting misspelled words in red.

CommunityWare needed this functionality to provide "full fidelity" emulation of the WellEngaged style of conferencing in a completely different environment (CommunityWare used ASP, VBScript, and SQL Server). That job fell to me, since everyone else was hurrying to hack on other code. I implemented the original HTML Checker as an ASP component in C++, using the Active Template Library (ATL) and the Standard Template Library (STL); I most likely built it with Visual C++. I found an inexpensive spellchecking library to serve as the dictionary for the spellchecker.
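To make the bracketed reference syntax concrete, here's a minimal Go sketch of parsing the text between the angle brackets. It assumes a simple rule: a non-numeric first component names a conference, and a numeric one names a topic in the current conference. The `Ref` type and `ParseRef` function are hypothetical names for illustration; user references in parentheses, and the database lookup that confirmed a reference actually exists, are left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Ref is a hypothetical parsed form of a bracketed post reference.
// An empty Conference means "the current conference".
type Ref struct {
	Conference string
	Topic      int
	HasPosts   bool // false when the reference names a whole topic
	FirstPost  int
	LastPost   int
}

// ParseRef parses the text between the angle brackets, such as
// "Commons.3.14", "Playground.7", or "8.7-10".
func ParseRef(s string) (Ref, bool) {
	var r Ref
	parts := strings.Split(s, ".")
	// A non-numeric first component is taken to be a conference name.
	if _, err := strconv.Atoi(parts[0]); err != nil {
		if parts[0] == "" {
			return Ref{}, false
		}
		r.Conference = parts[0]
		parts = parts[1:]
	}
	// What remains must be "topic" or "topic.posts".
	if len(parts) < 1 || len(parts) > 2 {
		return Ref{}, false
	}
	topic, err := strconv.Atoi(parts[0])
	if err != nil || topic <= 0 {
		return Ref{}, false
	}
	r.Topic = topic
	if len(parts) == 2 {
		// Posts are either a single number or a "first-last" range.
		lo, hi, isRange := strings.Cut(parts[1], "-")
		first, err := strconv.Atoi(lo)
		if err != nil || first < 0 {
			return Ref{}, false
		}
		last := first
		if isRange {
			if last, err = strconv.Atoi(hi); err != nil || last < first {
				return Ref{}, false
			}
		}
		r.HasPosts, r.FirstPost, r.LastPost = true, first, last
	}
	return r, true
}

func main() {
	for _, s := range []string{"Commons.3.14", "Playground.7", "8.7-10", "b"} {
		r, ok := ParseRef(s)
		fmt.Printf("%-14q ok=%v %+v\n", s, ok, r)
	}
}
```

In this sketch a bare word like `b` fails to parse, since a conference name with no topic isn't one of the forms listed above, so an ordinary tag such as <b> isn't mistaken for a reference.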

The HTML Checker object had a variety of properties to control its operation, as well as Begin/Append/End methods to control the parsing of incoming text. Additional read-only properties allowed retrieval of the parsed text, as well as the line count and number of spelling errors. This was implemented as a "dual" COM interface, allowing invocation by a direct interface as well as via IDispatch (needed for VBScript).
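The Begin/Append/End protocol is easier to picture in code. Here's a minimal Go sketch of what such an interface might look like; the `Checker` interface, its method signatures, and the toy `escapeChecker` (which escapes all markup rather than applying a whitelist, and does no spellchecking) are hypothetical, not the COM object's actual definition.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// Checker is a hypothetical Go rendering of the HTML Checker's
// Begin/Append/End protocol plus its read-only result properties.
type Checker interface {
	Begin()              // reset state for a new message
	Append(text string)  // feed the next chunk of raw input
	End()                // finish parsing: balance tags, finalize counts
	Value() string       // the processed text
	Lines() int          // number of lines in the processed text
	SpellingErrors() int // number of misspelled words flagged
}

// escapeChecker is a deliberately trivial stand-in: it escapes all
// markup instead of applying a whitelist, and does no spellchecking.
// A real checker would run each chunk through its state machine as it
// arrives; this one just buffers the input until End.
type escapeChecker struct {
	buf   strings.Builder
	out   string
	lines int
}

func (c *escapeChecker) Begin() {
	c.buf.Reset()
	c.out, c.lines = "", 0
}

func (c *escapeChecker) Append(text string) { c.buf.WriteString(text) }

func (c *escapeChecker) End() {
	c.out = html.EscapeString(c.buf.String())
	if c.out != "" {
		c.lines = strings.Count(c.out, "\n") + 1
	}
}

func (c *escapeChecker) Value() string       { return c.out }
func (c *escapeChecker) Lines() int          { return c.lines }
func (c *escapeChecker) SpellingErrors() int { return 0 }

func main() {
	var c Checker = &escapeChecker{}
	c.Begin()
	c.Append("Hello <scr") // a tag split across two chunks
	c.Append("ipt>\nworld")
	c.End()
	fmt.Println(c.Value(), c.Lines(), c.SpellingErrors())
}
```

The point of the three-call shape is that the caller can hand over text in whatever pieces it has, and only after `End` are the results (and any tag balancing) final.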

The heart of the HTML Checker was a finite state machine, used to recognize tags as well as elements such as quoted strings within a tag. Tag names were looked up in an internal hash table to determine their properties (whether they supported open/close syntax, whether they needed to be auto-balanced, whether they were whitelisted and in which contexts, and so on). Open tags were pushed onto a stack to track which ones still needed balancing, but the stack also needed an operation to remove the most recent instance of a given element from anywhere in the stack, because HTML tags in real-world posts aren't always strictly nested. (Browsers accept <b><i>text</b></i>, for example.) References to conferences, communities, and users required calls out to the database for lookups, done via OLE DB.
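The tag-handling part of that design can be sketched in Go. This is a minimal illustration, not the original C++ code: the tag table contents, the `tagInfo` fields, and the `handleTag`/`tagStack` names are hypothetical, and the state machine that extracts tag names from raw text is omitted. The sketch assumes it hands over lower-case names, with a leading "/" for closing tags.

```go
package main

import (
	"fmt"
	"strings"
)

// tagInfo holds a hypothetical subset of the per-tag properties
// kept in the checker's hash table.
type tagInfo struct {
	allowed  bool // on the whitelist for message bodies
	balanced bool // must be closed automatically if left open
}

// tagTable stands in for the internal hash table (a Go map here).
var tagTable = map[string]tagInfo{
	"b":  {allowed: true, balanced: true},
	"i":  {allowed: true, balanced: true},
	"em": {allowed: true, balanced: true},
	"br": {allowed: true, balanced: false},
}

// tagStack records open tags that still need balancing.
type tagStack []string

func (s *tagStack) push(name string) { *s = append(*s, name) }

// removeMostRecent deletes the topmost occurrence of name wherever it
// sits in the stack, so mis-nested input such as <b><i>x</b></i> still
// pairs each close tag with its own open tag. It reports whether a
// matching open tag was found.
func (s *tagStack) removeMostRecent(name string) bool {
	for i := len(*s) - 1; i >= 0; i-- {
		if (*s)[i] == name {
			*s = append((*s)[:i], (*s)[i+1:]...)
			return true
		}
	}
	return false
}

// closeAll emits closing tags for everything still open, innermost
// first, and empties the stack.
func (s *tagStack) closeAll() string {
	var b strings.Builder
	for i := len(*s) - 1; i >= 0; i-- {
		b.WriteString("</" + (*s)[i] + ">")
	}
	*s = (*s)[:0]
	return b.String()
}

// handleTag decides what to emit for one tag name produced by the
// tokenizer (the state machine, not shown).
func handleTag(name string, st *tagStack) string {
	base := strings.TrimPrefix(name, "/")
	closing := base != name
	info, ok := tagTable[base]
	if !ok || !info.allowed {
		// Not whitelisted: render it as literal text instead.
		return "&lt;" + name + "&gt;"
	}
	if !info.balanced {
		return "<" + name + ">"
	}
	if closing {
		if !st.removeMostRecent(base) {
			return "" // drop a close tag with no matching open tag
		}
	} else {
		st.push(base)
	}
	return "<" + name + ">"
}

func main() {
	var st tagStack
	out := ""
	for _, tag := range []string{"b", "i", "/b", "script", "/b", "em"} {
		out += handleTag(tag, &st)
	}
	out += st.closeAll() // auto-balance at the end of the post
	fmt.Println(out)
}
```

The `removeMostRecent` scan is what distinguishes this from a plain stack: a close tag can remove an entry that isn't on top, and whatever is still open at the end of the post gets closed by `closeAll`.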

When conferencing was released in CommunityWare, the existing community had no trouble at all with it, thanks in part to all my work.

Later, when I rewrote that piece of the code in Java for Venice, the HTML Checker became a Java object behind an interface, using JDBC through the application's database support to verify references against the database. The spellchecker was replaced, at first, with a naive implementation that just pulled a copy of /usr/dict/words into a HashMap; later, I implemented my own trie data structure. The resulting code gave Venice the "look and feel" of CommunityWare, which kept the community happy.
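For the spellchecker's dictionary, here's a minimal Go sketch of a trie, assuming one word per line in the /usr/dict/words format and case-insensitive matching. It isn't the Venice code; the `Trie` type and its `Add`, `Load`, and `Contains` methods are hypothetical names. Unlike a hash map of whole words, a trie stores each shared prefix once and makes it possible to walk every word under a given prefix, which is handy for spelling suggestions.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// trieNode is one node in a character trie; terminal marks the end
// of a dictionary word.
type trieNode struct {
	children map[rune]*trieNode
	terminal bool
}

// Trie is a minimal case-insensitive word dictionary.
type Trie struct{ root trieNode }

// Add inserts word, lower-cased.
func (t *Trie) Add(word string) {
	n := &t.root
	for _, r := range strings.ToLower(word) {
		if n.children == nil {
			n.children = make(map[rune]*trieNode)
		}
		child, ok := n.children[r]
		if !ok {
			child = &trieNode{}
			n.children[r] = child
		}
		n = child
	}
	n.terminal = true
}

// Load adds one word per line, the format of /usr/dict/words.
func (t *Trie) Load(r io.Reader) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if w := strings.TrimSpace(sc.Text()); w != "" {
			t.Add(w)
		}
	}
	return sc.Err()
}

// Contains reports whether word is in the dictionary. Indexing a nil
// children map is safe in Go and simply reports "not found".
func (t *Trie) Contains(word string) bool {
	n := &t.root
	for _, r := range strings.ToLower(word) {
		child, ok := n.children[r]
		if !ok {
			return false
		}
		n = child
	}
	return n.terminal
}

func main() {
	var dict Trie
	if err := dict.Load(strings.NewReader("apple\napply\nConference\n")); err != nil {
		fmt.Println("load failed:", err)
		return
	}
	for _, w := range []string{"apple", "APPLY", "app", "conferense"} {
		fmt.Printf("%s: %v\n", w, dict.Contains(w))
	}
}
```

Note that "app" is rejected even though it's a prefix of stored words: only nodes marked `terminal` count as complete words.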

Now, of course, for my next big project, I'm rewriting it again, in Go. The design is much the same, though specifics change. But it's really the "secret sauce" of the conferencing system, so it behooves me to get it right!

(I gave a presentation on the HTML Checker as part of my interview with Carbon Black. I relied on my notes for that presentation to put this post together.)
