Crummy: The Site
We fight 'em until we can't.


News You Can Bruise


[Comments] (4) The Bot of Mormon: I don't usually do in-depth analyses of my bots, especially one that's probably not gonna break ten followers, but my most recent bot is very personal to me, and the making of it turned out to be much stranger than I expected. It's The Bot of Mormon, "the most correct bot", a text-generating process with a very niche audience but the niche audience includes me, so I'm happy. A few of my recent favorites:

And again I say unto you, and more especially the elephants and cureloms and cumoms.

— The Bot Of Mormon (@TheBotOfMormon) October 16, 2014

A large and tough businessman, I pray only that I might always be found as Abraham Lincoln said: "Die when I may, by a wild olive tree."

— The Bot Of Mormon (@TheBotOfMormon) October 16, 2014

"As we read in the Book of Mormon, but I will have him come to the phone."

— The Bot Of Mormon (@TheBotOfMormon) October 14, 2014

A note: In a bid for more followers, as well as not alienating all my relatives, I designed the Bot of Mormon to be a bit of harmless humor for believing LDS folk (early versions could be pretty offensive, and I chose not to go that route). However, Saints might take offense at this blog post about how and why I made the bot. So, fair warning. Here we go.


It's not much of an exaggeration to trace my interest in generative text back to my experience growing up in Mormonism. Mark Twain famously called the Book of Mormon "chloroform in print", and I believe the reason it's so boring is that it was produced by a process similar to automatic writing. It's full of stalling and retreats to stock phrases. But what starts with the Book of Mormon sure doesn't end there. When I was a kid, church every week was a three-hour festival of stock phrases and repetition.

See, in the LDS church the task of coming up with things to say every week rotates around the general membership. Topics are assigned, and there are only about fifty topics total. Since every acceptable topic has been covered a million times before, the simplest way to make a new talk is to remember bits of old talks and mash them together.

When I was a kid I experienced this from both ends, and writing the talks was especially intense for me because despite my best efforts, I didn't actually believe. My talks were literally constructed by assembling meaningless symbols into patterns that matched what I saw other people doing. Naturally, ever since I caught the botmaking bug I've wanted to recreate this experience with a bot. I registered @TheBotOfMormon quite a while ago. But I couldn't figure out what to do until recently, when I hit upon the idea of taking as my corpus not the Book of Mormon itself, but the General Conference talks.

General Conference is a big twice-yearly event in Salt Lake where the top brass show y'all how it's done. These guys used to be lawyers and corporate executives, and their talks are all vetted by committee, so the result is... well, sometimes someone will say something offensive, but even that I wouldn't call "interesting". What is interesting is that Conference is where Mormonism meets the twenty-first century. By which I mean that's where you can see the pros use nineteenth-century language and rhetoric to talk about same-sex marriage (undesirable!) and the Internet (a mixed bag!). That's the kind of juxtaposition I thought would make a good bot. As it turns out, I was right... sort of. Eventually.

To give you a picture of what goes on in General Conference, here's a table I made of the top ten topics by decade, according to the keywords in the <meta> tags for each talk.

Rank  1970s              1980s            1990s         2000s         2010s
1     obedience          Jesus Christ     Jesus Christ  faith         Jesus Christ
2     missionary work    missionary work  faith         Jesus Christ  service
3     spirituality       service          family        service       faith
4     testimony          obedience        priesthood    testimony     priesthood
5     Jesus Christ       priesthood       love          obedience     obedience
6     welfare            faith            service       family        adversity
7     priesthood         love             Holy Ghost    Holy Ghost    family
8     family             family           obedience     prayer        love
9     plan of salvation  spirituality     prayer        love          Holy Ghost
10    youth              adversity        Atonement     priesthood    Atonement
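A tally like this one can be sketched in a few lines of Python. This is a minimal sketch, not the script that produced the table. It assumes each talk is an HTML page whose topics sit in a `<meta name="keywords" content="...">` tag as a comma-separated list (the post only says "the <meta> tags"), and that you already know each talk's year. `KeywordParser`, `keywords_for`, and `top_topics_by_decade` are made-up names.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class KeywordParser(HTMLParser):
    """Collect the comma-separated entries of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: topics are stored under name="keywords".
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords.extend(
                k.strip() for k in content.split(",") if k.strip()
            )


def keywords_for(html):
    """Return the list of keywords found in one talk's HTML."""
    parser = KeywordParser()
    parser.feed(html)
    return parser.keywords


def top_topics_by_decade(talks, n=10):
    """talks is an iterable of (year, html) pairs.

    Returns a dict mapping a decade label such as "1970s" to its n most
    frequent keywords, most frequent first.
    """
    counts = defaultdict(Counter)
    for year, html in talks:
        decade = f"{year // 10 * 10}s"
        counts[decade].update(keywords_for(html))
    return {
        decade: [kw for kw, _ in counter.most_common(n)]
        for decade, counter in counts.items()
    }
```

Counting keywords per talk rather than per word keeps a long talk from outvoting a short one, which seems like the right call for a "top topics" list, but that's a design guess too.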

You can see the shape of the fifty acceptable topics there. Anyway, I downloaded the Conference talks and set about applying my usual bag of tricks to the corpus to come up with an interesting transformation. Imagine my surprise when none of my techniques worked!

The _ebooks algorithm, up to this point an unending generator of hilarity from any corpus, failed miserably. The word-frequency filter I used to find the interesting signs for Minecraft Signs also failed. Markov chains were useless, big surprise. I had a dim idea that the key to bot gold here was the subordinate clauses: the sentences that run on and on in a lawyerly way, embroidering themselves with their own Talmudic interpretations. I tried Queneau assembly of sentences at the clause level. This was good enough to get the bot launched, but it wasn't great. Each individual clause is very likely to be boring, its boringness has no relationship to word frequency, and combining clauses doesn't help. The corpus is fractally boring.

"Here you will find happiness, we know that the rejoicing, or anything else, they are in a state contrary to the nature of happiness."

— The Bot Of Mormon (@TheBotOfMormon) October 2, 2014
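For the curious, here's roughly what Queneau assembly at the clause level looks like. This is a minimal sketch and not the bot's actual code: it assumes clauses can be found by splitting at commas, semicolons, and colons, and it keeps clauses in pools according to where they appeared in their original sentence (first, middle, last) so that a new sentence opens with an opener and closes with a closer. All the function names are made up.

```python
import random
import re


def split_clauses(sentence):
    """Split a sentence into clauses after each comma, semicolon, or colon."""
    parts = re.split(r"(?<=[,;:])\s+", sentence.strip())
    return [p for p in parts if p]


def build_pools(sentences):
    """Sort clauses into pools by the position they held in their source."""
    pools = {"first": [], "middle": [], "last": []}
    for sentence in sentences:
        clauses = split_clauses(sentence)
        if len(clauses) < 2:
            continue  # a one-clause sentence has nothing to recombine
        pools["first"].append(clauses[0])
        pools["middle"].extend(clauses[1:-1])
        pools["last"].append(clauses[-1])
    return pools


def assemble(pools, rng, middle_count=1):
    """Build a new sentence: one opener, some middles, one closer."""
    middles = []
    if pools["middle"]:
        middles = [rng.choice(pools["middle"]) for _ in range(middle_count)]
    opener = rng.choice(pools["first"])
    closer = rng.choice(pools["last"])
    return " ".join([opener] + middles + [closer])
```

Call it as `assemble(build_pools(sentences), random.Random())`. Because every piece stays in the slot it came from, the output is usually grammatical, which is exactly why a corpus of dull clauses yields dull sentences.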

Okay, I thought, time to break out the big guns. I incorporated the Book of Mormon into my corpus, then the Doctrine & Covenants, and even the Pearl of Great Price, the bizarro crown jewel of the LDS canon. None of it helped. (The Pearl of Great Price helped a little, because it's really weird, but it's also very short.)

Behold, and began to put heavy burdens upon their backs, and prayers of faith.

— The Bot Of Mormon (@TheBotOfMormon) October 6, 2014

But legend told of a secret weapon: the Journal of Discourses. Basically a large collection of General Conference talks from the late 19th century, during the polygamy era, containing a ton of fiery rhetoric and juicy doctrines downplayed or outright disowned by the modern church. Some might consider it dirty pool, but I was desperate to get some interesting content out of my bot. I Queneau-ified every Discourse in the Journal and added it to the corpus... to no avail. It was still dull! On the sentence fragment level, it's tough to even distinguish between the 'scandalous' stuff in the Journal and the dishwater they serve up at Conference nowadays.

And now behold, as it were, most of them in environments very different from their own.

— The Bot Of Mormon (@TheBotOfMormon) October 9, 2014

At this point I was so frustrated that I honestly started to question my unbelief. What are the odds that a corpus of text spanning hundreds of authors over nearly 200 years could be so uniformly dull? Was some divine hand at work, keeping things from getting too interesting? With shaking hands I ran my tests against a control sample: the Gutenberg text of a non-Mormon book of sermons. And it turns out nineteenth-century religious language is what's fractally boring. It's nothing to do with Mormonism in particular. The modern stuff is dull because it copies and recombines the nineteenth-century stuff.

And that, finally, was the key to what little success I've achieved with @TheBotOfMormon. When the bot is funny, the funny thing is not the rambling juxtaposition of sentence fragments per se. It's the juxtaposition of modern concepts with nineteenth-century language. To get the bot to work I would have to actually recreate that juxtaposition, not just hope for it.

Enter the Corpus of Historical American English. (Thanks, BYU! Seriously, what a great project.) This has word frequencies for every decade from the 1810s up to 2009. I picked out all the words that were 10x more common between 1930 and 1980 than they were between 1830 and 1880. I tagged all the sentence fragments that were distinctly twentieth-century. Now I can guarantee that every assemblage has an old-timey component and a more modern component, and the chances of humor go way up.
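Here's a minimal sketch of that filter, under a few assumptions the post doesn't spell out: counts are normalized by the total size of each period's sample, a word that never appears in the early period counts as modern, and a fragment is "distinctly twentieth-century" if it contains at least one modern word. The function names and the `min_late` cutoff (there to ignore one-off words) are hypothetical, and you'd have to supply the real counts yourself.

```python
import re


def modern_words(early_counts, late_counts, early_total, late_total,
                 ratio=10.0, min_late=5):
    """Return words at least `ratio` times more frequent in the late period.

    early_counts and late_counts map word -> raw count; the totals are the
    number of words in each period's sample, used to turn counts into rates.
    """
    modern = set()
    for word, late in late_counts.items():
        if late < min_late:
            continue  # too rare to trust
        late_rate = late / late_total
        early_rate = early_counts.get(word, 0) / early_total
        # Assumption: a word absent from the early period is modern.
        if early_rate == 0 or late_rate / early_rate >= ratio:
            modern.add(word)
    return modern


def is_modern_fragment(fragment, modern):
    """True if the fragment contains at least one modern word."""
    words = re.findall(r"[a-z']+", fragment.lower())
    return any(word in modern for word in words)
```

With fragments tagged this way, the assembler can draw one piece from the modern pile and the rest from the old-timey pile instead of hoping the mix happens by chance.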

The lesson I want to take from this is that every corpus is different. I thought I could handle the LDS corpus with the same tools I use on Gutenberg, because they're both full of archaic language, but I was totally wrong. Once I engaged with the text this became obvious, but I came into this holding the text at arm's length because it held a lot of bad childhood memories.

There's no generic bot kit that will work on anything. (Well, there is, but it uses Markov chains and I don't like it.) Even my really simple bots like I Like Big Bot and Boat Names required a lot of custom behind-the-scenes work to find the most interesting subset of the data.

Perhaps this can serve as my new rule. A new bot needs to present a different way of being a bot, not just a different corpus. And adding more text to a corpus I don't know how to handle just makes the problem worse.

[Comments] (2) On Scarne On Dice: At a book sale where the deal was "$5 for all the books you can fit in a bag" I picked up a book that barely fit in the bag, Scarne on Dice, originally published in 1943 and updated in 1974. The author, John Scarne, combines a ton of genuine gambling expertise with the demeanor of a megalomaniac crackpot. The jacket copy, written by some unknown soul *cough*, describes him as "the man who made the phrase 'According to Hoyle' obsolete and replaced it with 'According To Scarne'". He's invented his own kind of dice, Scarney Dice®, which are normal six-sided dice except that the two face and the five face have the word "DEAD" on them.

With Scarney Dice you can play a number of games such as Scarney 3000® ("the favorite dice game of the members of the John Scarne Game Club of my hometown of Fairview, New Jersey"), Scarney Put-and-Take Dice, Scarney Duplicate Jackpots, Scarney 21 Up and Down, Scarney Bingo Dice, and Scarney Black Jack. Many of these games feature dice combinations called "Big Scarney" and "Little Scarney", or require a player to call "Scarney" when exploiting a winning position.

There are also three chapters of the book devoted to card games Scarne has invented, games like Scarney® ("the first really new card game concept of this century"), Scarney Gin, and Scarney Baccarat. These games (stay with me here) are card games, they include no dice, and they have no place in a book called Scarne on Dice, especially since John Scarne also wrote a whole other book called Scarne on Cards. But since we're going down this route, how about the family portrait in the front of the book where John Scarne poses with his wife, his son, his books, and the board games he invented, most notably a checkers-like thing called Teeko? Did I mention that he named his son after his board game? Oh, and after himself, of course. John Teeko Scarne.

But unlike every other person like this I've ever encountered, John Scarne actually knows his stuff. He convincingly debunks parapsychology dice-rolling experiments by contrasting the way the experiment was run with the way casinos handle dice. He explains ludicrous systems for beating the casinos and then explains why they're mathematically impossible. His chapters on how to spot loaded dice, rigged games, steer joints, and general cheating are clearly a light rewrite of the lectures he went around giving on Army bases to stop GIs losing their paychecks to craps hustlers. He has a convincing description of what it would take to run an underground gambling operation, down to a detailed payroll.

What is going on here? My initial guess was that gambling is a field where being a Jeffrey Lebowski-esque blowhard is tolerated and even encouraged. That's still my primary guess, actually. But after reading the most interesting 200 pages of this massive tome and skimming the rest I wonder if something else is going on. This book is mostly about craps, a folk game with a relatively clear origin in Hazard but no real chain of custody between its origin and the modern day. Maybe Scarne just wants to make damn sure that his contributions to ludology are properly credited. Unfortunately, his habit of naming everything after himself just made it that much easier to ignore his innovations and play the same games people have been playing for hundreds of years.

But there was one game that John Scarne invented whose genius I appreciate, even though I'll never play it. It's a drinking game called Scarney Pie-Eyed Dice and it survives in a modified version called Twenty-One Aces. Scarne describes a couple variants but here's the simplest one: in Scarney Pie-Eyed Dice the players take turns rolling two dice until someone rolls nothing but twos and fives (these are the "DEAD" faces of official Scarney dice). The first person to accomplish this orders a drink. Scarne recommends "a double rye with celery tonic, vodka with chili sauce", or something equally weird. The second person to roll twos and fives drinks the drink, and the third person to roll twos and fives pays for the drink.

That's just great. It creates two types of tension at once (who's going to drink the drink and who's going to pay for it), and it uses creativity from an unrelated field as a game mechanic. Good job.
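How long does a round of the two-dice variant take? Each die shows a two or a five a third of the time, so a roll is all "DEAD" with probability (1/3)² = 1/9, and you need three such rolls, which works out to 27 rolls on average. Here's a minimal sketch that checks the arithmetic and plays a round. It assumes players roll in strict rotation and that the same player can land more than one of the three roles, neither of which comes from the book. The names are made up.

```python
from fractions import Fraction
import random

DEAD_FACES = {2, 5}  # the faces marked "DEAD" on Scarney dice


def p_all_dead(num_dice=2):
    """Exact probability that every die in one roll shows a DEAD face."""
    return Fraction(len(DEAD_FACES), 6) ** num_dice


def expected_rolls(num_dice=2, hits_needed=3):
    """Average number of rolls until the order, drink, and pay roles are set."""
    return hits_needed / p_all_dead(num_dice)


def play_round(players, rng, num_dice=2):
    """Roll in rotation until three all-DEAD rolls have happened.

    Returns ([orderer, drinker, payer], total_rolls).
    """
    roles = []
    rolls = 0
    while len(roles) < 3:
        player = players[rolls % len(players)]
        rolls += 1
        if all(rng.randint(1, 6) in DEAD_FACES for _ in range(num_dice)):
            roles.append(player)
    return roles, rolls
```

Try `play_round(["Ann", "Bob", "Cy"], random.Random())` a few times and you'll see why the person who orders has every reason to get creative: the odds are decent that someone else drinks it.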

[No comments] September Film Roundup: A whopping two films this month, at best. October's also not looking great. For movies, I mean. Everything else looks pretty good.

[Comments] (1) The Minecraft Geologic Survey: I've been waiting for all the pieces to go into place before writing about this on NYCB, and now the pieces are in place. The lightning strikes my castle laboratory and the Minecraft Geologic Survey rises! (See Fig. 1.)


Fig. 1

Back in May I announced that I'd downloaded 65,000 Minecraft maps from the official Minecraft forum, and used the data to make my @MinecraftSigns bot. Later I took over Allison Parrish's defunct @minecraftebooks and revitalized it with _ebooks-style quotes from the books found in Minecraft worlds. (Plus, as of a few days ago, command block outputs that incorporate the names of followers, Exosaurs-style.)

But all the while, in the background, I was downloading. Worlds, screenshots, mods, player skins, texture packs... everything with a URL. I ended up with about two terabytes of data, an amount that here in 2014 is not difficult for me to store but is very difficult to transfer or process.

To get the signs and the books for my bots, I had to load every Minecraft world into Python and go through every chunk looking for entities. I ended up with about 180,000 worlds, and iterating over them all was a very time-consuming process. Fortunately, I had two more projects that would amortize all that computer time.

Both projects required that I take "core samples" of each world, extracting individual chunks that were likely to be interesting and forming a new world (like the one pictured above) containing only those chunks. The resulting dataset is representative of the full more-than-a-terabyte package of original worlds, but because it's just a very tiny sample, the whole thing weighs in at a comparatively slim 12 gigabytes.

That's small enough to go on the Internet Archive, and small enough for you to download it and use it in your own project. I wrote a detailed guide to the data, which includes not only 170,000 synthetic Minecraft worlds but a big JSON file (also available on its own) containing all the metadata and sign text and other things you'd need to do a text-based project.

The other project is The Reef, a series of Minecraft maps that combine the chunks obtained from the survey into mashup maps that incorporate designs from many different authors. For instance, you've got The Reef #1, which sticks spawn chunks from 10,000 different maps together to form a (mostly) naturally-sprawling terrain. Or maybe you'd prefer the Skyburbs, a thousand Skyblock maps jammed next to each other.

I've got plenty more ideas for Reef maps, but now that the data is available I think this is a good point to put the project on pause for a while. I will be publishing the code I use to make my Twitter bots and the Reef maps, to encourage you to play with the data and do your own thing.

I'm concerned about the Minecraft servers that have been shutting down since Mojang changed their EULA to include strict rules on monetization. People have been giving a lot of attention to the Microsoft buyout, but the EULA change is what's affecting servers right now. I would really like to offer an archive service for Minecraft servers that are being shut down (plus just original worlds that people have lying around on their hard drives), but I don't see a good way to get the word out. It's not like the typical Archive Team project where you can go into a server that's shutting down and download everything. The server owner has to take the initiative. Also bandwidth and storage become a problem for me at this point. So this is more of an open question than something I know how to solve. It may not get solved.

August Film Roundup: Another month full of major progress on major projects, but I managed to squeeze in four features:

[Comments] (4) Month of Crowdfunding 2014!: After taking a break last year because I didn't have a steady paycheck, Month of Crowdfunding (né Month of Kickstarter) has returned! (2011) (2012) Here's how it works: every day in August I will pledge to some crowdfunding project or another. Yes, that's pretty much it.

Unlike previous years, I will not be doing writeups of each project I back, because I am in the middle of novel revisions. I will just edit this post every day with a brief update. I will also not be trawling the crowdfunding sites every day looking for quirky, offbeat projects. That worked in 2011 when Kickstarter was very small, and it worked in 2012 because I created special software tools for making it work. This year, I will rely heavily on a revolutionary new concept I call crowdnepotism.

Here's how it works. If your friend has a crowdfunding project or Patreon that you want me to support, or you've backed a project and you'd like me to back it as well, please let me know through a comment on this post, a message to @leonardr on Twitter, or an email to leonardr@segfault.org. Please do not tell me about your own project. Tell me about anyone's project but your own. The true meaning of Month of Crowdfunding is found in focusing on other people. That's the only limitation. If you say it's okay, I'll mention you as the person who suggested the project to me in the list below.

Speaking of which, the list below. The projects backed so far:

  1. The Asheville Blade - Supporting the journalism of a friend of Sumana's.
  2. "A History of Mobile Games: 1998-2008" - Just seems like a cool book.
  3. Dj CUTMAN, creator of a chiptune podcast that I listen to at work.
  4. "An Alphabet of Embers", an anthology edited by Shweta and suggested by Zack.
  5. Designers and Dragons, a "comprehensive, four-volume history of the roleplaying game industry." (Found via @CrowdBoardGames and unknowingly ratified by Jim Henley.)
  6. Ninja Pizza Girl, "a serious game about bullying, emotional resilience – and pizza delivering ninjas", suggested by Nathaniel.
  7. Andrea Phillips's writing
  8. Epanalepsis, a graphical adventure game.
  9. Ben Briggs' chiptunes.
  10. Jenny LeClue, another graphical adventure, suggested by Andy Baio.
  11. Mia S-N's illustrations, suggested by Sumana.
  12. Accessing the Future, an SF anthology.
  13. Stretching the notion of "crowdfunding", I sent some money to Saladin Ahmed, who just had his basement flood.
  14. The games of Anna Anthropy.
  15. The games of Avery Mcdaldno.
  16. Tree Climbing for Climate Change Research
  17. Why the long face? Functional morphology of a unique fossil porpoise
  18. #OperationHelpOrHush
  19. Legends of Beforia, a card game prototype by #botALLY Patrick Rodriguez.
  20. Kris's comics, yay. (Not suggested by Kris.)
  21. African Skies: Establishing an Observatory for Students in Ghana
  22. I think the name of this project is too corny to say. It's a butter knife that works like a cheese grater.
  23. [Yeah, having troubles keeping this up to date, sorry.]
  24. Noisebridge reboot
  25. Critical Distance
  26. Dawn of the Algorithm (suggested by Mike Mongo)
  27. MS treatment for Paul Jessup, suggested by Saladin Ahmed, paying it forward.

As with the previous two Months, my daily budget is $25 or whatever it takes to get a cool reward. That corresponds to a $2 monthly Patreon pledge. And don't forget, crowdnepotism is a registered trademark of... what, now there's paperwork for registering trademarks? Screw that.

Final update: As you can tell this was a bit of a disaster, consistency-wise. I would frequently leave MoC for days at a time and have to go back and backfill, and near the end I gave up. So I think I'm done with the MoC "tradition". Not because there's not cool stuff on crowdfunding sites (there's a ton of it) but because I'm busy with other stuff now, and "back a project every day for a month" is no longer the interesting experiment it was in 2011. Even going through the science crowdfunding sites and funding science experiments became a bit of a chore given all the other stuff I have going.

I've also discovered that backing a bunch of projects gets me stuff, and I've already got more stuff in my life than I'd like. So I'm going to keep on with my rest-of-the-year strategy of using crowdfunding sites like a normal person.

[Comments] (3) July Film Roundup: I saw most of these movies on airplanes, and I have no regrets. Not about that, anyway.

[Comments] (2) The Average Minecraft Skin: Currently my two spare-time hobbies are 1) Situation Normal revisions and 2) gathering Minecraft data. Yes, I'm still at it! There's a lot more data than I anticipated! I'm up to about 175,000 maps, and I've branched out into archiving mods and texture packs. There's even more I could do, but pretty soon I'm going to have to put away the data-gathering part of this project for six months or a year so I can get other stuff done.

My reach keeps expanding because whenever I decide that a certain dataset isn't interesting and I won't bother with it, I immediately come up with something really cool to do with the dataset. For instance, Minecraft skins, the little images that are bitmapped onto your character in the game to make you look like a penguin or Jean-Luc Picard. I never really cared much about skins, but in the process of deciding not to bother with them, I discovered that Planet Minecraft, one of the biggest repositories of skins, lets someone who uploads a skin specify a gender ("male", "female", "interchangeable", and "other"), as well as a category classification ("animal", "cartoon", "famous person", etc.). Now I was interested! Skins are data about how people present themselves in the virtual world, data that I could gather and graph.

Here's a simple graph showing the skins available on Planet Minecraft, broken down by category and gender:

Self-reported gender of Minecraft skins

In every category male skins are drastically overrepresented, but the discrepancy is smallest in "Other". Why? My guess is that "Other" is where you'd put a skin that you made to represent yourself.

Since there are only two different sizes for skin images, you can average a number of skins together to get a new skin. Here's a skin that is the average of 100 of the most popular "female" skins on Planet Minecraft:

And here's the average of 100 of the most popular "male" skins:

That's a pretty preliminary result, but I think it's interesting. The major sexual dimorphism among Minecraft skins—the shape of the eyes—comes through loud and clear. If you want to use one of these as your actual Minecraft skin, I recommend going in with an image editor and erasing the upper-right part of the image. Otherwise your character's head will be shrouded in a ghostly hat, and it won't look good.
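Averaging skins is just per-pixel, per-channel arithmetic. Here's a minimal sketch that works on already-decoded pixels so it needs no image library: each skin is a list of rows, each row a list of (red, green, blue, alpha) tuples. Treating alpha as just another channel to average is an assumption on my part (it would also explain the ghostly hat), and `average_skin` is a made-up name. To use it on real files you'd decode the PNGs with an image library first.

```python
def average_skin(skins):
    """Average same-sized RGBA images pixel by pixel.

    skins is a list of images; each image is a list of rows, and each row
    is a list of (r, g, b, a) tuples with values from 0 to 255.
    """
    if not skins:
        raise ValueError("need at least one skin")
    height, width = len(skins[0]), len(skins[0][0])
    for skin in skins:
        if len(skin) != height or any(len(row) != width for row in skin):
            raise ValueError("all skins must be the same size")
    count = len(skins)
    return [
        [
            # Integer division keeps every channel a whole number.
            tuple(
                sum(skin[y][x][channel] for skin in skins) // count
                for channel in range(4)
            )
            for x in range(width)
        ]
        for y in range(height)
    ]
```

Since there are two skin sizes, group the skins by size before averaging; the size check above will complain if you mix them.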

June Film Roundup: It doesn't get better than this. I liked every single movie I saw this month. Two, maybe three of them are in my top ten. I guess that's what happens when you only see time-honored classics and movies you've already seen and loved. I'm posting this a little early because I'm going on vacation next week. Have fun!

May Film Roundup: Ready for "Wacky Wednesdays" here at News You Can Bruise? Here's the deal. We got five movies in the May roundup, but only three of them I actually saw in May! One is from April and one I saw yesterday. Also, it's not Wednesday.

@MinecraftSigns, And Minecraft Maps: I finished a draft of Situation Normal and sent it in to writing group, so I've now got time to reveal the other non-NYPL project that's been taking up all of my time. Ta-da! It's a bot! @MinecraftSigns posts signs that I found in Minecraft maps using the pymclevel library I learned for the Historical Minecraft project.

For a long time, signs were the only form of textual self-expression possible in Minecraft. You get four lines of 15 characters each. In normal play they're generally used as labels or signposts. Custom mapmakers also use them for instructions to the player, dialogue, narration, and hidden messages. They are a medium of communication with more severe character restrictions than Twitter, which makes them a great subject for a Twitter bot. Signs posted so far range from the profound:

This one's
about dropping

To something I think I saw on one of those trendy t-shirts recently:

peanuts and
pickles and
potatoes and
Paul

To the crowd favorite so far:

Do not
Extinguish fire
You will lose.
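The four-by-fifteen limit is easy to express in code. Here's a minimal sketch using Python's standard `textwrap` module to check whether some text fits on a sign and to lay it out if it does. The function names are made up, and real signs may measure width differently than a plain character count.

```python
import textwrap

SIGN_LINES = 4   # lines per sign
SIGN_WIDTH = 15  # characters per line


def fits_on_sign(lines):
    """True if the lines respect the four-by-fifteen limit."""
    return len(lines) <= SIGN_LINES and all(
        len(line) <= SIGN_WIDTH for line in lines
    )


def wrap_for_sign(text):
    """Word-wrap text onto a sign, or return None if it won't fit."""
    lines = textwrap.wrap(text, width=SIGN_WIDTH)
    return lines if fits_on_sign(lines) else None
```

That makes the hard ceiling 60 characters, and in practice less, since word wrapping wastes space at the end of each line.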

Oh goodie, you say; another bot from Leonard! What will he come up with next? Yet another bot? The answer is yes. But, before you dismiss @MinecraftSigns as just another window into a beautiful realm of found poetry, ask yourself this: how did I get this data in the first place? Where did all these Minecraft signs come from? Oh, I don't know, maybe from the sixty-five thousand Minecraft maps I've got on my hard drive?

That's right. After the Historical Minecraft project I thought back to late 2011 when I was enjoying the world of custom Minecraft maps. I then thought forward to early 2012, when I was kind of done with custom Minecraft maps, but when I moved all the ZIP files I'd downloaded onto a backup drive rather than deleting them, because these things don't stay on the Internet forever and it would be nice to have a copy, say, twenty years from now. And then, in early 2014, two years into that twenty, I was thinking about that little act of preservation and it hit me: who's archiving the rest of those maps?

The answer was: apparently nobody. And then the answer quickly became: I am. From the middle of April to the middle of May I archived 65,000 maps linked to from the Minecraft maps forum. That's out of about 100,000 maps total. I verified that 25,000 maps are gone, and there are about 10,000 maps I didn't get because they're scattered across a million different file-sharing sites.

So, at least a quarter of the maps put up since 2010 are already gone. I was able to get screenshots for a lot of the missing maps, so it's not a total loss, but that's still really bad, and not only because it's generally bad when interesting things leave the Internet.

Minecraft is the medium used by a lot of accomplished designers and artists. The most obvious examples IMO are Vechs (Super Hostile) and three_two (Vinyl Fantasy). Those two are pretty legendary and their maps are in no danger of being lost, but there's a lot of really great stuff published in 2011-2012 that was lost in the flood. 2011-2012 was the silent-film era of Minecraft custom maps, when the genres were being defined and the first wild experiments were happening, but when the medium was not taken seriously enough to warrant systematic preservation. In the future we'll have tools for finding the overlooked gems, but first those maps have to make it to the future.

Speaking of the future, Minecraft is the training ground for the next generation of game designers, the way ZZT was the training ground for my generation. There's a ZZT archive; it's got about 2,000 ZZT games. How many are lost? Sure would have been nice to save more of them, but all we had back then was BBSes. We didn't have a big official "ZZT forum" with a special place for posting links to your games.

Finally, even a map that's made by a young child who grows up to be an actuary rather than a game designer is valuable. For one, it's valuable to the actuary. I didn't grow up to be a visual artist, but I value this awful, mysterious poster I drew when I was six. That poster would be long gone if someone (my mother) hadn't archived it for me. Second, these maps might be useful in the aggregate as a source of information about period slang or the way children visualize three-dimensional space. Third...

Well, I think one reason Minecraft is so popular with kids is it recreates an experience that American kids generally aren't allowed to have anymore: going outside and playing in a semi-natural environment, on your own or with friends, without parental supervision. There's this infamously bad Minecraft map from 2011 called Quest for Gallell, which turned out to be made by a six-year-old. Presumably this goofy swashbuckling playthrough was made before the players knew they were making fun of a six-year-old's map, but if you watch the video you'll notice that the players understand how to approach the map: like kids playing together in the woods. They're acting out kids acting out adults.

Quest for Gallell is the three-dimensional record of an imaginative play session, which you can play through yourself if you want. It sucks that kids can't play outside anymore, but at least we have some records of what they do instead. Those records are worth saving.

Crosspost: Apparently I have a new weblog! It's my NYPL staff weblog and I've put up a post about a project I worked on with Paul Beaudoin on like my second day at NYPL Labs. We turned a historical contour map into a Minecraft world. This is cool on its own, but it also means I now know how to programmatically generate Minecraft maps with Python scripts. The possibilities are endless, and you'll be seeing more of them later. Like, when I'm done with this novel.

If you must get all your Minecraft news in video form, you're surprisingly picky but you're also in luck. I took Nashville's own Joe Hills on a tour of 1860 Manhattan, and he recorded the whole thing. My only regret is that I didn't prime the buried TNT he discovers near the end of the video.

[Comments] (1) April Film Roundup: Running late this month because of work on Situation Normal. But I'm sick of writing that tonight, so let's crank out some great reviews of (mostly) great movies.

[Comments] (1) March Film Roundup: April Fools! As part of an elaborate prank spanning over a year I have slowly turned NYCB into mostly a film review blog! Hahahahaha... ah...

Anyway, I'm trying out a new strategy for spending less time writing these film roundups. Instead of trying to analyze each movie in detail I'm going to write only as much about a movie as I feel like writing in the moment. Sometimes this will still be a lot, but most of the time I think a paragraph's worth of text will suffice.

Read My Lips: Two New Bots: I've been trying to finish as much of Situation Normal as possible before my job at the library starts (uh... I think this is the first time I've mentioned my NYPL job on NYCB, but I'll be writing about it later). But I have created two new autonomous agents to engage and confound you.

The first is Euphemism Bot, inspired by the fact that most of the output of Adam's Egress Methods sounds like weird euphemisms for masturbation. Euphemism Bot elevates the tone by putting out weird euphemisms for all sorts of dirty, shameful things. You'll never be understood again! It's been up for about a month, and it's already subverted its programming.

From the naughty to the nautical, there's also Boat Names, which I "launched" today. It periodically sends out names that one, and only one, person decided to give their boat. The data comes from the Queneau-sounding ten thousand boat names, which I first learned of from the trivia podcast Good Job, Brain! (I'm linking to their Twitter page because their main webpage currently shows some base64-encoded text that isn't even a puzzle.) I had this idea kicking around in my head until yesterday's lunch with Andrea Phillips, when the topic turned to weird random datasets we'd collected. And now... a bot is born.

Boat Names also has an Egress Methods connection. I found the list of given names Adam uses for Egress Methods and used it to filter out boats that are named after people. This avoids the boredom of "Eleanor", which just proves that not many boat owners have wives named Eleanor.
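The filtering step can be sketched in a few lines of Python. This is only an illustration, not the bot's actual code: the function name and the sample data are made up, and it assumes a boat counts as "named after a person" when the whole name matches a given name, ignoring case.

```python
def filter_person_names(boat_names, given_names):
    """Drop boat names that are nothing but a known given name."""
    # Case-folded set for fast, case-insensitive membership tests.
    known = {name.casefold() for name in given_names}
    return [boat for boat in boat_names if boat.strip().casefold() not in known]


# Hypothetical sample data, not taken from the real boat-name list.
boats = ["Eleanor", "Knot Working", "mary", "Seas the Day"]
given = ["Eleanor", "Mary"]
print(filter_person_names(boats, given))
```

A stricter filter might also reject names like "Lady Eleanor" by checking each word, at the cost of throwing away more boats.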

February Film Roundup: Three films this month, none of them great, but all of them worth your time.

Mahna Mahna: My new bot, Mahna Mahna (@mahna____mahna), reenacts the Muppet Show's "Mahna Mahna" skit over the course of a day. It might be my saddest bot.

My secret is that I created this bot hoping that someone else would eventually create a Snowth bot to enact the other half of the skit. I quickly learned that there is already a Snowth bot, but it only talks to @mahna____mahna once a day. So... well, I already revealed one secret in this paragraph, I shouldn't reveal another.

Constellation Games Bonus Story Ebooks: Thanks to requests by Ron Hale-Evans and others at Foolscap, I've compiled the four Constellation Games bonus stories into a single ebook. You can get an EPUB that looks okay and a MOBI that's kinda ugly. If you want to do a better job of formatting, then a) be my guest, and b) let me know and I'll send you the original source files, which should save you some work over downloading everything and putting it together yourself.

Writing Aliens: I've put online the slides and prepared text of my Foolscap talk, "Writing Aliens", or, "Duchamp, Markov, Queneau: A Mostly Delightful Quilt". On one level it's a simple introduction to algorithmic creativity, but it's also about creativity in general, the anthropomorphization of software, and why the features that make Twitter so aggravating for humans make it such a great platform for bots. Bonuses include a recap of Brian Hayes's article on Markov and a telling of the @Horse_ebooks saga as a reverse alien invasion.

The ebooks installation

The two site-specific installations that I hinted at earlier were custom scripts displaying variants on Ebooks Brillhantes and Hapax Hegemon. The text corpus comes from a scrape of everything linked to from Free Speculative Fiction Online. The software is a heavily modified version of Bruce, modified a) to stream data from a flat text file and create the slides on the fly, instead of trying to load 20,000 slides into memory at once; and b) when restarted after a crash/shutdown, to skip the appropriate number of slides and pick up where it would be if it had been running continually.
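Here's a rough Python sketch of those two modifications, independent of Bruce itself. The function names, the one-slide-per-line file format, and the fixed number of seconds per slide are all assumptions made for the example, and it doesn't handle looping back to the start once the file runs out.

```python
import itertools


def slides_to_skip(started_at, now, seconds_per_slide):
    """Number of slides that would already have been shown by `now`."""
    elapsed = max(0.0, now - started_at)
    return int(elapsed // seconds_per_slide)


def stream_slides(path, skip=0):
    """Lazily yield one slide per non-blank line, skipping the first `skip`."""
    with open(path, encoding="utf-8") as handle:
        stripped = (line.strip() for line in handle)
        slides = (line for line in stripped if line)
        # islice consumes the skipped slides without keeping them in memory.
        yield from itertools.islice(slides, skip, None)
```

On restart, something like `stream_slides("slides.txt", slides_to_skip(start, time.time(), 10))` would resume roughly where an uninterrupted run would be, as long as the original start time was saved somewhere.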

Unfortunately I never got a picture of both displays running side-by-side; if you have such a picture, I'd really appreciate it if you could send it to me.

Just after I set up the ebooks display, I met Greg Bear, who was at Foolscap running a writing workshop. We walked over to the screen and I explained the project to him. He said "I'd better not be in there." AT THAT MOMENT the screen was showing the quote "We zoomed down eleven" from this free sample of Blood Music. It was pretty awkward.

[Comments] (1) January Film Roundup: The cycle begins anew... OR DOES IT? Check out all the films I saw in January!

Yeah, only one film! Because I was travelling all month. I couldn't even count Future Love Drug, a short film made by my fellow Foolscap GoH Brooks Peck, because I came in late and only saw the last minute of the film.

I don't know if the film roundups will continue in 2014. On the one hand, I'm going to try to see, or at least review, fewer films in 2014 so I can do more reading. On the other hand, I love taking fiction apart to see how it works, and reviewing books the way I've been reviewing movies is a good way to make professional enemies. Whereas nobody cares what I say about film. So who knows?

[Comments] (1) The Crummy.com Review of Things 2013: I've been travelling for most of the month, but I managed to scrape together a year-in-review post. Here's 2012. I'm a little disappointed right now, because I just woke up from a dream in which I'd savvily combined several middle-tier Kickstarter rewards into being able to go to the International Space Station whenever I wanted, so let's start with a self-aggrandizing montage of my waking accomplishments in 2013:

Now let's take a brief look at contributions from the not-me community:

Literature: The category that suffered the most from 2013's focus on film. I didn't read that much, and my writing is slowing down because of it. This is a strange alchemy that I can't explain but I'm pretty sure other writers recognize it. Anyway, I've got some new books I'm excited about so I'll get back on this in 2014.

For 2013 I'll give the nod to Marty Goldberg and Curt Vendel's Atari Inc.: Business is Fun, a book that... well... this review is pretty accurate, but the book has a lot of good technical and business information, plus many unverifiable anecdotes. It seems I read nothing in 2013 that I can wholeheartedly recommend without reservation... except Tina Fey's Bossypants, I guess... yes! In a late-paragraph update, Bossypants has taken the award! Wait, what's this? In a shocking upset, the ant has taken it from Bossypants! Yes, the ant is back, and out for blood!

Games: 2013 was the year I finally learned the mechanical skill of shuffling cards. Maybe this doesn't seem like a big deal to you, but I've been trying to figure this out for most of my life.

The crummy.com Board Game of the Year is "Snake Oil", a game about fulfilling user stories with lies and shoddy products. The Video Game of the Year? Man, I dunno. I'm playing computer games a little more than in 2013, but still not that many. "Starbound" is really cool, and is probably the closest I'll get to being able to play "Terraria" on Linux.

Audio: As I mentioned, I'm travelling, and away from the big XML file that contains my podcast subscriptions, so I'll fill this in later, but there's not a lot new here. But I can tell you the Crummy.com Podcast of the Year: Mike "History of Rome" Duncan's new podcast, Revolutions. The first season, covering the English Revolution, just wrapped up, so it's a good time to get into the podcast.

Hat tip to Jackie Kashian's The Dork Forest. Probably not going to have to update this one, actually.

Film: Ah, here's the big one. As I mentioned earlier, I saw 85 feature films in 2013. By amount of money I spent, the best film of the year was Gravity, which I dropped about $40 on. But by any other criteria, it wasn't even close! Well, it was close enough to get Gravity onto my top twelve, which I present now. I consider all of these absolute must-watches.

  1. The General (1926)
  2. Nashville (1975)
  3. Ishtar (1987)
  4. Ball of Fire (1941)
  5. Calculated Movements (1985)
  6. The World's End (2013)
  7. No No Nooky TV (1987)
  8. Gravity (2013)
  9. The Godfather (1972)
  10. Cotton Comes to Harlem (1970)
  11. Gentlemen Prefer Blondes (1953)
  12. No (2012)

As you can tell, only films I saw for the first time in 2013 are eligible; we call this the "The Big Lebowski rule".

There was no movie that really changed my aesthetic sense this year, the way Celine and Julie Go Boating did last year, but Nashville gave me insight into managing a large ensemble cast. Hat tip to Fahrenheit 451 for getting me to understand why I keep lining up for French New Wave films even though they keep pulling the football away from me.

I still don't feel like I know that much about film. I treat films like they're books. I'm not that interested in what people do with the cameras. I have no idea what the names of actors are. I find the prospect of making a film quite tedious. They're fun to watch though.

For the record, here's my must-see list from 2012, which I didn't spell out last time:

  1. Celine and Julie Go Boating (1974)
  2. Brazil (1985)
  3. A New Leaf (1971)
  4. All About Eve (1950)
  5. The Whole Town's Talking (1935)
  6. Shadow of a Doubt (1943)
  7. Paper Moon (1973)
  8. Marathon Man (1976)

Okay, I think that's enough. Nobody reads these things until the centennial anyway.

One week to Foolscap!: In a week I'm a guest of honor at the Foolscap convention in Redmond, WA. It's got a bit of an unconference feel, so apart from the basics--board game night, a talk by me that I have to prepare--we can form fluid overlays and schedule whatever we want.

Also featured at the con will be (I think I've mentioned this before) two continuous SF/F text installations I've created to astound you. This exhibit WILL NOT BE REPEATED, unless someone asks for it at another con. So if you're in the Seattle area, sign up or just show up the day of, and you'll get to hang out with me, and the other honored guest, museum curator/SyFy monster movie screenwriter Brooks Peck.

[Comments] (1) The Bots of 2014: I took an oath of non-bot-making for most of December, but now I'm back in the game. At the end of January I'm a guest of honor at Seattle's Foolscap convention, and I've got a couple site-specific installation projects that will hopefully entertain congoers to the exclusion of all other activities.

But for now, I have two new bots to entertain you, the general public. The Hapax Hegemon (@HapaxHegemon) posts words that occur only once in the Project Gutenberg corpus I've been getting so much mileage out of. So far it's emitted such gems as "zoy", "stupidlike", and "beer-swipers". And like so many of my recent bots, it won't stop until we're all dead.
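Finding words that occur only once is mostly a counting exercise. A minimal sketch, assuming a crude tokenizer (lowercase letters with optional internal hyphens) and a toy sample string rather than the real Project Gutenberg corpus:

```python
import re
from collections import Counter


def hapax_legomena(text):
    """Return the words that appear exactly once in `text`, sorted."""
    # Crude tokenizer: runs of letters, optionally joined by hyphens.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n == 1)


# Toy sample, not a real corpus.
sample = "The beer-swipers saw the zoy. The zoy saw nothing."
print(hapax_legomena(sample))
```

On a real corpus the tokenizer matters a lot: OCR errors and odd hyphenation will show up as one-off "words" unless they're filtered out somehow.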

My second new bot is the Serial Entrepreneur (@ItCantFail), which posts inventions. It's basically playing Snake Oil (spoiler: Crummy.com 2013 Board Game of the Year) with a much larger corpus, derived from the Corpus of Historical American English and the Scribblenauts word list.

So far my favorite @ItCantFail inventions are the delicious Fox Syrup, the liberal-friendly Left Drone, and the self-explanatory Riot College. Write in with your own wacky inventions! I won't use them, because that's not how this bot works, but it seems like a fun way to kill some time.

More bots are on the way! But not for a while, because I gotta do novel work and get the Foolscap-exclusive bots in shape.

December Film Roundup: Counting it all up, it looks like I saw 85 feature films in 2013, plus some beefy television and a ton of shorts. Unfortunately the retrospective of 1913 silent film (semi-promised at 2012's 1912 retrospective) did not materialize. Oh darn!

I'll tackle the "best of" topic in a general 2013 wrap-up later on. For now, here's a look at December's cinematic adventures:

I'm planning on seeing a lot of movies in 2014, but I don't know if I'm going to write these detailed reviews of each one. It takes a long time to get my thoughts in order and write it down, and, as you'll see when I write the year-end roundup, it really eats into the time I spend enjoying other media. So until next time, I'll see you at the movies! (If you are Sumana, Hal, or Babs.)

[Comments] (3) Markov vs. Queneau: Sentence Assembly Smackdown: I mentioned earlier that when assembling strings of words, Markov chains do a better job than Queneau assembly. In this post I'd like to a) give the devil his due by showing what I mean, and b) qualify what I mean by "better job".

Markov wins when the structure is complex

I got the original idea for this post when generating the fake ads for @pony_strategies. My corpus is the titles of about 50,000 spammy-sounding ebooks, and this was the first time I did a head-to-head Markov/Queneau comparison. Here are ten of Markov's entries, using the Markov chain implementation I ended up adding to olipy:

  1. At Gas Pump!
  2. The Guy's Guide To The Atkins Diet
  3. Home Internet Business In The World.
  4. 101 Ways to Sharpen Your Memory
  5. SEO Relationship Building for Beginners
  6. Gary Secrets - Project Management Made Easy!
  7. Weight Success
  8. How get HER - Even If It's Just Money, So Easy and Effective Treatment Options
  9. Sams Yourself
  10. Define, With, Defeat! How To Get Traffic To Your Health

The Markov entries can get a little wacky ("Define, With, Defeat!"), which is good. But about half could be real titles without seeming weird at all, which is also good.

By contrast, here are ten of Queneau's entries:

  1. Adsense I Collection Profits: The bottom Guide Income!
  2. Reliable Your Earning Estate Develop Home And to life Fly Using Don't Your Partnership to Death
  3. Help the Your Causes, Successfully Business Vegetarian
  4. Connect New New Cooking
  5. 1 Tips, Me Life Starting to Simple Ultimate On Wills How Years Online With Living
  6. How Practice Health Best w/ Beauty
  7. Amazing Future & Codes Astrology to Definitive Green Carbs, Children Methods JV Engine Dollars And Effective Beginning Minutes NEW!
  8. I and - Gems Secrets Making Life Today!
  9. Succeeding For Inspiring Life
  10. Fast Survival Baby (Health Loss) Really How other of Look Symptoms, Your Business Encouragement: drive Health to Get with Easy Guide

At their very best ("Succeeding For Inspiring Life", "How Practice Health Best w/ Beauty"), these read like the work of a non-native English speaker. But most of them are way out there. They make no sense at all or they sound like a space alien wrote them to deal with space alien concerns. Sometimes this is what you want in your generated text! But usually not.

A Queneau assembler assumes that every string in its corpus has different tokens that follow an identical grammar. This isn't really true for spammy ebook titles, and it certainly isn't true for English sentences in general. A sentence is made up of words, sure, but there's nothing special about the fourth word in a sentence, the way there is about the fourth line of a limerick.
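To make that assumption concrete, here's a stripped-down Queneau assembler in Python. It's a simplified sketch, not the implementation in olipy: it splits on whitespace, borrows its length from one random source title, and fills each position with a word that held the same position in some source title. The function name and the `rng` parameter are inventions for the example.

```python
import random


def queneau_assemble(titles, rng=random):
    """Build a title whose Nth word was the Nth word of some source title."""
    tokenized = [title.split() for title in titles if title.split()]
    # Borrow the overall length from one randomly chosen source title.
    length = len(rng.choice(tokenized))
    words = []
    for position in range(length):
        # Only titles long enough to have a word here can donate one.
        donors = [tokens for tokens in tokenized if len(tokens) > position]
        words.append(rng.choice(donors)[position])
    return " ".join(words)
```

Nothing in the loop looks at the previous word, which is exactly why the output falls apart when position alone doesn't determine a word's grammatical role.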

A Markov chain assumes nothing about higher-level grammar. Instead, it assumes that surprises are rare, that the last few tokens are a good predictor of the next token. This is true for English sentences, and it's especially true for spammy ebook titles.

Markov chains don't need to bother with the overall structure of a sentence. They focus on the transitions between words, which can be modelled probabilistically. (And the good ones do treat the first and last tokens specially.)
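For comparison, here's a small word-level Markov chain in Python. Again, this is a sketch under assumptions rather than the olipy implementation: it uses the last two words as the state by default, and the `<start>` and `<end>` sentinel tokens are made-up markers that give the first and last words their special treatment.

```python
import random
from collections import defaultdict

START, END = "<start>", "<end>"


def build_chain(titles, order=2):
    """Map each run of `order` tokens to the tokens seen following it."""
    chain = defaultdict(list)
    for title in titles:
        tokens = [START] * order + title.split() + [END]
        for i in range(len(tokens) - order):
            state = tuple(tokens[i:i + order])
            chain[state].append(tokens[i + order])
    return chain


def generate(chain, order=2, rng=random, max_words=30):
    """Walk the chain from the start state until END or max_words."""
    state = (START,) * order
    words = []
    while len(words) < max_words:
        next_word = rng.choice(chain[state])
        if next_word == END:
            break
        words.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(words)
```

Because repeated successors stay in the list, `rng.choice` picks each next word in proportion to how often it followed the current state in the corpus.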

Markov wins when the corpus is large, Queneau when the corpus is tiny

Consider what happens to the two algorithms as the corpus grows in size. Markov chains get more believable, because the second word in a title is almost always a word commonly associated with the first word in the title. Queneau assemblies get wackier, because the second word in a title can be anything that was the second word in any title.

I have a corpus of 50,000 spammy titles. What if I chose a random sample of ten titles, and used those ten titles to construct a new title via Queneau assembly? This would make it more likely that the title's structure would hint at the structure of one or two of the source titles.

This is what I did in Board Game Dadaist, one of my first Queneau experiments. I pick a small number of board games and generate everything from that limited subset, increasing the odds that the result will make some kind of twisted sense.
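The sample-first approach is a small change to plain Queneau assembly. A minimal sketch, assuming whitespace tokenization and a hypothetical function name; it is not the Board Game Dadaist code:

```python
import random


def assemble_from_sample(titles, sample_size=10, rng=random):
    """Queneau-assemble a title from a small random subset of the corpus."""
    # Restrict the donors to a handful of titles before assembling.
    sample = rng.sample(titles, min(sample_size, len(titles)))
    tokenized = [title.split() for title in sample]
    length = len(rng.choice(tokenized))
    return " ".join(
        rng.choice([tokens for tokens in tokenized if len(tokens) > i])[i]
        for i in range(length)
    )
```

With `sample_size=1` this just reproduces a source title, and raising it toward the full corpus size brings back the wackiness, so the sample size works as a dial between the two extremes.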

If you run a Markov chain on a very small corpus, you'll probably just reproduce one of your input strings. But Queneau assembly works fine on a tiny corpus. I ran Queneau assembly ten times on ten samples from the spammy ebook titles, and here are the results:

  1. Beekeeping by Keep Grants
  2. Lose to Audience Business to to Your Backlink Physicists Environment
  3. HOT of Recruit Internet Because Financial the Memories
  4. Senior Guide Way! Business Way!
  5. Discover Can Power Successful Life How Steps
  6. Metal Lazy, Advice
  7. Insiders Came Warts Weapons Revealed
  8. 101 Secrets & THE Joint Health Than of Using Marketing! Using Using More Imagine
  9. Top **How Own 101**
  10. Multiple Spiritual Dynamite to Body - To Days

These are still really wacky, but they're better than when Queneau was choosing from 50,000 titles each time. For the @pony_strategies project, I still prefer the Markov chains.

Queneau wins when the outputs are short

Let's put spammy ebook titles to the side and move on to board game titles, a field where I think Queneau assembly is the clear winner. My corpus here is about 65,000 board game titles, gathered from BoardGameGeek. The key to what you're about to see is that the median length of a board game title is three words, versus nine words for a spammy ebook title.
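Checking the median length of a corpus takes a couple of lines of Python. This sketch assumes "words" means whitespace-separated tokens, and the function name is made up:

```python
from statistics import median


def median_word_count(titles):
    """Median number of whitespace-separated words per title."""
    return median(len(title.split()) for title in titles)


print(median_word_count(["Risk", "Mini Game", "Amazing Trivia Game"]))
```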

Here are some of Markov's board game titles:

  1. Pointe Hoc
  2. Thieves the Pacific
  3. Illuminati Set 3
  4. Amazing Trivia Game
  5. Mini Game
  6. Meet Presidents
  7. Regatta: Game that the Government Played
  8. King the Rock
  9. Round 3-D Stand Up Game
  10. Cat Mice or Holes and Traps

A lot of these sound like real board games, but that's no longer a good thing. These are generic and boring. There are no surprises because the whole premise of Markov chains is that surprises are rare.

Here's Queneau:

  1. The Gravitas
  2. Risk: Tiles
  3. SESSION Pigs
  4. Yengo Edition Deadly Mat
  5. Ubongo: Fulda-Spiel
  6. Shantu Game Weltwunder Right
  7. Black Polsce Stars: Nostrum
  8. Peanut Basketball
  9. The Tactics: Reh
  10. Velvet Dos Centauri

Most of these are great! Board game names need to be catchy, so you want surprises. And short strings have highly ambiguous grammar anyway, so you don't get the "written by an alien" effect.

Conclusion

You know that I've been down on Markov chains for years, and you also know why: they rely on, and magnify, the predictability of their input. Markov chains turn creative prose into duckspeak. Whereas Queneau assembly simulates (or at least stimulates) creativity by manufacturing absurd juxtapositions.

The downside of Queneau is that if you can't model the underlying structure with code, the juxtapositions tend to be too absurd to use. And it's really difficult to model natural-language prose with code.

So here's my three-step meta-algorithm for deciding what to do with a corpus:

  1. If the items in your corpus follow a simple structure, code up that structure and go with Queneau.
  2. If the structure is too complex to be represented by a simple program (probably because it involves natural-language grammar), and you really need the output to be grammatical, go with Markov.
  3. Otherwise, write up a crude approximation of the complex structure, and go with Queneau.
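The three steps above boil down to two yes/no questions. As a toy Python function (the name, parameters, and return strings are purely illustrative):

```python
def choose_assembler(simple_structure, must_be_grammatical):
    """Pick a technique using the three-step meta-algorithm."""
    if simple_structure:
        # Step 1: the structure is easy to code up.
        return "queneau"
    if must_be_grammatical:
        # Step 2: complex structure, and grammar matters.
        return "markov"
    # Step 3: complex structure, but absurdity is acceptable.
    return "queneau with a crude approximation of the structure"


print(choose_assembler(simple_structure=False, must_be_grammatical=True))
```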


This document (source) is part of Crummy, the webspace of Leonard Richardson (contact information). It was last modified on Monday, September 09 2013, 18:05:52 Nowhere Standard Time and last built on Saturday, November 01 2014, 03:05:02 Nowhere Standard Time.

Crummy is © 1996-2014 Leonard Richardson. Unless otherwise noted, all text licensed under a Creative Commons License.
