Removing Excessive Quotes

I have learnt a little too much about quoting recently, specifically the excessive quoting that appears in e-mails and in responses to usenet posts.

[Figures: before quote reduction; after quote reduction]

Sleep Centre is DssW’s massive archive of usenet and other conversations about Mac energy saving subjects. The entire archive is available on this web site and every so often I turn my attention to improving its presentation and usefulness.

Last week I turned my attention to the excess quoting in the archive. Quotes are useful when the referenced text is not immediately available, but in our archive the original is immediately above the response and has just been skimmed over by the reader. Bulk quoting in this situation is unhelpful — needing to bulk quote at any time suggests your contribution could be better formed.

I continue to struggle with how far DssW can edit and reformat the Sleep Centre content. My goal is to improve the content’s usefulness. To that end some reformatting is justified. Wholesale editing is not.

With regard to excessive quotes, bulk or automated quote removal risks changing the meaning of the author’s message. Removing quotes by hand is not feasible; a smart automated method is a must — or the excess quotes must remain.

The Sleep Centre available today has the majority of quotes removed. In most cases the improvement is incredible.

Discussions are less cluttered and easier to browse. Each one reads more like a real conversation.

How I decided which quotes to remove will be covered shortly, but other recent changes also improved readability.

Visual clutter

The removal of excess quoting reduces the occurrence of colour changes and indented blocks.

Quoted text

Sleep Centre’s quotes are denoted with a change in text colour, a slight indent, and a thin line to the left of the quote. If the quote appears within another quote, as is common in discussions, all these attributes are repeated within the original quote. The result is a mass of visual cues that demands effort from the reader.

The author colour scheme was introduced to help readers gauge who was talking. The goal was to allow readers to quickly judge how many people were involved in a discussion and who the main contributors were.

It felt noisy.

The result was colourful, but ultimately added little to the content. It did not get readers an answer any faster.

What matters to the reader?

Denoting who is speaking is still important, but not critical to the casual reader. Sleep Centre’s audience are looking for solutions to problems with their Macs.

It does not matter if a message is written by Gregory Weston, Mike Rosenberg, or Jolly Roger. It does not matter if this is the fifth message left by a specific author. It does not matter if one author is dominating the conversation. It does not matter if the conversation involves two people or twenty. What matters is solving the reader’s problem.

With this in mind I adjusted the layout to improve the reader’s ability to skim the content.

The new layout highlights a technical tweak Sleep Centre gained a few months ago. Messages have been reformatted from fixed-width content into flexible paragraphs. Very few hard line breaks remain in Sleep Centre’s content.

The content resizes, wraps, and acts as simple blocks of text. While this does not sound impressive, it is, and the new thinner layout makes good use of this capability.
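As a rough illustration of what that reformatting involves, here is a minimal Python sketch. It is not the code Sleep Centre runs; the function name `reflow` and the rule that a blank line ends a paragraph are my assumptions for the example.

```python
def reflow(message: str) -> list[str]:
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line marks a paragraph boundary and every
    other line break is just fixed-width wrapping that can be dropped.
    """
    paragraphs: list[str] = []
    current: list[str] = []
    for line in message.splitlines():
        stripped = line.strip()
        if stripped:
            # Still inside a paragraph: collect the line.
            current.append(stripped)
        elif current:
            # Blank line: close the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Each returned string is a single unwrapped paragraph, so the page layout decides where lines wrap rather than the original poster's news client.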

Removing the quotes

So how did I decide which quotes to remove?

As a background task, I have been pondering the problem for many months. The problem felt overwhelming and so I flipped the question to something more manageable.

How to decide which quotes should remain?

This question felt easier and arrived at the same result. Removing the quotes is an easy computer science task. The process of deciding which quotes to leave in is not.
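To show why the removal itself is the easy half, here is a small Python sketch. It assumes the common usenet and e-mail convention that quoted lines begin with `>`; the helper names `split_blocks` and `strip_quotes` are invented for this example and are not Sleep Centre's code.

```python
def split_blocks(message: str) -> list[tuple[str, list[str]]]:
    """Group consecutive lines into ("quote", lines) or ("text", lines) blocks.

    Assumption: a line is quoted if, ignoring leading spaces, it starts with ">".
    """
    blocks: list[tuple[str, list[str]]] = []
    for line in message.splitlines():
        kind = "quote" if line.lstrip().startswith(">") else "text"
        if blocks and blocks[-1][0] == kind:
            # Same kind as the previous line: extend the current block.
            blocks[-1][1].append(line)
        else:
            blocks.append((kind, [line]))
    return blocks


def strip_quotes(message: str) -> str:
    """Drop every quote block, keeping only the author's own text."""
    kept = [
        line
        for kind, lines in split_blocks(message)
        if kind == "text"
        for line in lines
    ]
    return "\n".join(kept)
```

Stripping everything like this is exactly the blunt removal that risks changing an author's meaning, which is why the real work lies in choosing which blocks to leave in.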

In the end I based the solution on an approach taken by spam filters: apply a series of tests and total the weighted scores. If the final score falls below a set threshold, the quote is kept.

The tests include:

  • the position of the quote relative to the author’s contribution;
  • the length of the quote;
  • the number of quotes;
  • the proportion of quoted material.
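The scoring idea can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the `Quote` fields, the individual test functions, the weights, and the threshold are invented and are not the values Sleep Centre uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Quote:
    position: float         # 0.0 = top of the message, 1.0 = bottom (assumed encoding)
    length: int             # number of lines in this quote
    quote_count: int        # number of quotes in the whole message
    quoted_fraction: float  # quoted lines divided by total lines


# Each hypothetical test returns a value between 0.0 and 1.0;
# higher means "more likely to be excess quoting".
def leads_message(q: Quote) -> float:
    return 1.0 if q.position == 0.0 else 0.0

def is_long(q: Quote) -> float:
    return min(q.length / 10, 1.0)

def many_quotes(q: Quote) -> float:
    return min(q.quote_count / 5, 1.0)

def mostly_quoted(q: Quote) -> float:
    return q.quoted_fraction


# (test, weight) pairs; weights and threshold are illustrative guesses.
TESTS: list[tuple[Callable[[Quote], float], float]] = [
    (leads_message, 2.0),
    (is_long, 3.0),
    (many_quotes, 1.0),
    (mostly_quoted, 4.0),
]
THRESHOLD = 5.0


def score(q: Quote) -> float:
    """Total the weighted results of every test."""
    return sum(weight * test(q) for test, weight in TESTS)


def keep_quote(q: Quote) -> bool:
    """A quote is kept only when its score falls below the threshold."""
    return score(q) < THRESHOLD
```

With these made-up numbers, a two-line quote in the middle of a reply scores well under the threshold and stays, while a thirty-line quote at the top of a message that is ninety per cent quoted material scores above it and is removed. Adding a test means appending one more pair to `TESTS`.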

The outcome was pleasantly surprising. The algorithm needed far less tweaking than I initially expected.

The framework lets me add new tests and tweak the weight associated with each test. Thus I can incrementally improve the results as time allows.

I suspect further fine-tuning will be required, but for now the result is a leap forward in the usefulness of Sleep Centre.