Wordle

Wordle is essentially a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. The Wordle applet was created by Jonathan Feinberg while he was at IBM, which is why Wordle has licence restrictions.

A key part of Wordle is cue.language.jar (NB: this is also used by the WordCram library). cue.language is a small library of Java code and resources that provides the following basic natural-language processing capabilities:

  • Tokenizing natural language text into individual words
  • Tokenizing natural language text into sentences
  • Tokenizing natural language text into n-grams (sequences of 2 or more words that appear next to each other in a sentence)
  • Counting strings
  • Detecting which script (alphabet, writing system) is required to represent a text
  • Guessing what language a text is in
  • Customizable “stop word” detection for a variety of languages
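To give a feel for a few of the capabilities listed above, here is a minimal plain-Ruby sketch of word tokenizing, stop-word filtering, counting and bigram extraction. It does not use cue.language: the `STOP_WORDS` list and the helper names `tokenize`, `word_counts` and `bigrams` are made up for illustration, and the real library handles scripts and languages far more carefully.

```ruby
# Illustrative only: plain Ruby, NOT the cue.language API.
STOP_WORDS = %w[the a an of and to is in].freeze # hypothetical, tiny list

# Split text into lowercase words (letters and apostrophes only).
def tokenize(text)
  text.downcase.scan(/[[:alpha:]']+/)
end

# Count every word that is not in the stop-word list.
def word_counts(text)
  tokenize(text)
    .reject { |word| STOP_WORDS.include?(word) }
    .each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

# Pairs of adjacent words (2-grams) within a single sentence.
def bigrams(sentence)
  tokenize(sentence).each_cons(2).map { |pair| pair.join(' ') }
end

p word_counts('The cat and the hat and the cat')
p bigrams('to be or not')
```

Counts like these are the kind of word weights a word cloud uses to decide which words get the most prominence.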

The original cue.language is by Jonathan Feinberg, written while he was at IBM (hence the licence restrictions on cue.language.jar); see GitHub.

WordCram

WordCram is essentially a re-implementation of Wordle by Dan Bernier using Processing, and is hence somewhat more customisable than Wordle. It does the heavy lifting (text analysis, collision detection) for you, so you can focus on making your word clouds as beautiful, as revealing, or as silly as you like. It does use Jonathan Feinberg's cue.language.jar to analyze the text.

Ruby WordCram

To make it even easier to use the WordCram library in JRubyArt and propane, we have created a rubygem wrapper. I had thought to recompile the libraries, but currently I just extract the required jars from the WordCram distribution.

gem install ruby_wordcram

An example sketch

# This sketch shows how to make a WordCram from any webpage.
# It uses my blog
# Minya Nouvelle font available at http://www.1001fonts.com/font_details.html?font_id=59

require 'ruby_wordcram'

def settings
  size 800, 400
end

def setup
  sketch_title 'WordCram from Web Page'
  color_mode(HSB)
  background(255)
  WordCram.new(self)
          .from_web_page('https://ruby-processing.github.io/about/')
          .with_font(create_font(data_path('MINYN___.TTF'), 1))
          .with_colorer(Colorers.two_hues_random_sats_on_white(self))
          .sized_by_weight(7, 100)
          .draw_all
end

Output

More Sketches in ruby

Here we prefer File.readlines to loadStrings and use map to create an array of Word. The only unfortunate thing is that we need to cast the ruby array to a java array.

require 'ruby_wordcram'

def settings
  size(800, 600)
end

def setup
  sketch_title 'US Male and Female First Names'
  background 255
  names = File.readlines(data_path('names.txt')).map do |line|
    name, frequency, sex = line.split
    col = 'f' == sex ? color('#f36d91') : color('#476dd5')
    Word.new(name, frequency.to_f).set_color(col)
  end
  WordCram.new(self)
          .from_words(names.to_java(Java::Wordcram::Word)) # cast to java array
          .with_font(create_font(data_path('MINYN___.TTF'), 1))
          .sized_by_weight(12, 60)
          .draw_all
end
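
The line-parsing step in the sketch above can be tried outside Processing. This is a minimal sketch assuming each line of names.txt holds a name, a frequency and a sex flag separated by whitespace (which is what `line.split` implies). The sample rows, the `NameEntry` struct and `parse_names` are hypothetical stand-ins: a Struct replaces `Word`, and plain hex strings replace the `color` call.

```ruby
# Hypothetical stand-in for wordcram's Word, so this runs in plain Ruby.
NameEntry = Struct.new(:name, :weight, :hex)

# Assumed line format: "<name> <frequency> <f|m>"
def parse_names(lines)
  lines.map do |line|
    name, frequency, sex = line.split
    hex = sex == 'f' ? '#f36d91' : '#476dd5'
    NameEntry.new(name, frequency.to_f, hex)
  end
end

sample = ["Mary 2.629 f\n", "James 3.318 m\n"] # made-up sample rows
entries = parse_names(sample)
entries.each { |e| puts format('%-6s %.3f %s', e.name, e.weight, e.hex) }
```

In the real sketch the resulting ruby array still needs the `to_java(Java::Wordcram::Word)` cast before it is passed to `from_words`.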

Reset version

require 'ruby_wordcram'

attr_reader :names, :wc, :reset

def settings
  size(800, 600)
end

def setup
  sketch_title 'US Male and Female First Names Reset'
  @names = File.readlines(data_path('names.txt')).map do |line|
    name, frequency, sex = line.split
    col = 'f' == sex ? color('#f36d91') : color('#476dd5')
    Word.new(name, frequency.to_f).set_color(col)
  end
  @reset = true
  make_word_cram
end

def make_word_cram
  # NB: see cast to java array
  @wc = WordCram.new(self)
                .from_words(names.to_java(Java::Wordcram::Word))
                .with_font(create_font(data_path('MINYN___.TTF'), 1))
                .sized_by_weight(12, 60)
end

def draw
  background 255 if reset
  @reset = false
  if wc.has_more
    wc.draw_next
  else
    puts 'done'
    no_loop
  end
end

def mouse_clicked
  make_word_cram
  @reset = true
  loop
end