A Simple Name Generation Method Based On Data

So, you want to generate names for some procgen project and don't want to spend too long coming up with or tuning a method. This post presents a simple method for turning data into a name generation script, two examples of it in use, some analysis, and some further options.

This method was used for the World Battle Royale described a couple of days ago, as well as a so-far still secret project. In the World Battle Royale, it generated the names of nations. In the secret project, it generated the names of creatures:

Slugra
Policool
Venusyu
Ampcutor
Omafable
Scidash
Tylax
Warown
Bulbbat
Snubhorn
Toriafish
Nadra
Rampgela
Jumpfree
Bulbpy
Weeloom
Donras
Sunfarig
Umbreuk
Magnite
Doaur
Skarleef
Rampvitar
Murkeon
Graveluck
Nineonaw
Weetoed

These names are pretty strange, huh? The name generation method relies on splitting up the names from a source list of data. In this case, the data was the names of the first two generations of pokémon. Removing a few outliers, like Mr. Mime or Mew, gives 247 names. The method can then create about 61,000 names (247*247 = 61,009) by combining the parts of these names with each other.

That's it, really. Each pokémon name has been split in two, a first part and a second part, and the script just puts a random first part together with a random second part.

Slugra = Slug(ma) + (Ab)ra
Policool = Poli(whirl) + (Tenta)cool
Venusyu = Venus(aur) + (Star)yu
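
As a minimal sketch of the combining step, assuming the splits are already available as pairs (the list name SPLITS and the function generate_name are made up for this illustration, not taken from the actual script):

```python
import random

# (first part, second part) pairs, split by hand as in the examples above.
# Only the six names shown in this post; the real list has 247 entries.
SPLITS = [
    ("Slug", "ma"),
    ("Ab", "ra"),
    ("Poli", "whirl"),
    ("Tenta", "cool"),
    ("Venus", "aur"),
    ("Star", "yu"),
]


def generate_name(splits, rng=random):
    """Join the first part of one random name with the second part of another."""
    first = rng.choice(splits)[0]
    second = rng.choice(splits)[1]
    return first + second


if __name__ == "__main__":
    for _ in range(5):
        print(generate_name(SPLITS))
```

Note that this sketch can also reproduce an original name, such as Slug + ma, so a real script would probably want to filter those out.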

How do you split the names, then? I have all the names in one column of an Excel document. In the next column, I have written the first part manually, and a small formula in the third column then finds the second part automatically.
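
The exact Excel formula is not shown here, but the idea is presumably just "the name minus the first part". A rough Python equivalent, with second_part as a made-up helper name, could look like this:

```python
def second_part(name, first):
    """Return what is left of `name` after the hand-written first part."""
    # Guard against typos in the manual column: the first part must be a prefix.
    if not name.lower().startswith(first.lower()):
        raise ValueError(f"{first!r} is not the start of {name!r}")
    return name[len(first):]


# Hand-written first parts, keyed by the full name (only examples from this post).
first_parts = {"Slugma": "Slug", "Poliwhirl": "Poli", "Venusaur": "Venus"}

second_parts = {name: second_part(name, first) for name, first in first_parts.items()}
```

The prefix check is an addition of this sketch. It is cheap, and it catches a mistyped first part before it ends up in generated names.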

This still leads to some questions. How do you split a name like Venusaur? Is it Venus-aur or Venu-saur? The most important thing is not actually how the original name was supposed to be understood, since the method is about generating new names. Instead, the important thing is to do this split consistently with all names, even if it does not make sense in one specific case. The first time, as seen above, I chose to go for a full FIRST syllable with both beginning and ending consonants. My second time around, I instead went for a full LAST syllable, with slightly better results.

The second time was, as mentioned, to generate country names for World Battle Royale. Again, same method. Find a list of all country names and just put them in an Excel document, and split them one by one. Thus you get:

Thaimas
Nerain
Botdesh
Lithudos
Nirus
Papaum
Thaiize
Armein
Liechtenuda
Caletan
Rwaivia
Sugovina
Pania
Zamswana
Egiazil
Chinei
Caisalem
Angogaria
Pakiina
Triniso
Nirundi
Ladia
Ivoroon
Seyda
Tude
Tokeman
Reca
Solod
Male

The amazing thing about this method is that it sidesteps a problem that usually makes it impossible to settle on a good one. A constant objection would be "it does not sound English enough" or "it sounds too English". Now, using an actual list of country names, there can be no doubt that it is just right. Like Liechtenuda. Just right.

However, splitting up the names of nations is not without issues. One problem is overspecificity. Liechten, for instance, points a bit too obviously back at Liechtenstein. Another problem is underspecificity. Nine of the 240 country names end in -land, then there are five with -nia and five with -bia, and then three each with -ria, -da, -go, -dan, -ya, -sia. Should we then choose -land as the second half of the word as often as any other ending, or more often, to be more representative?
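
The two options in that last question can be sketched side by side. The short list of endings below is a made-up sample, not the real column, and the function names are only for illustration:

```python
import random
from collections import Counter

# A small made-up sample of second parts, with repeats kept as they would
# appear in the spreadsheet column.
seconds = ["land", "land", "land", "nia", "nia", "bia", "stein"]

# How often each ending occurs in the source data.
counts = Counter(seconds)


def pick_representative(rng=random):
    """Pick from the raw list, so -land shows up in proportion to its frequency."""
    return rng.choice(seconds)


def pick_uniform(rng=random):
    """Pick from the distinct endings, so every ending is equally likely."""
    return rng.choice(sorted(counts))
```

Picking from the raw list keeps the output representative of real country names, while picking from the distinct endings gives rare endings like -stein a fairer chance, at the cost of making the overspecificity problem more visible.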

No matter what, this does mean that we cannot just say that 240 first parts and 240 second parts must give 240*240 = 57,600 combinations, since several are literally the same. Even worse, several are almost the same. Like, how different is Pania from Mania, Sania, Sunia, Ninia, etc.? Should these be counted as ten different possibilities, or just one?
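
Counting the literally identical results is easy enough to do in code. Here is a small sketch with made-up parts (count_distinct_names is a hypothetical helper). It only removes exact duplicates and says nothing about names that are merely almost the same:

```python
def count_distinct_names(firsts, seconds):
    """Count the different strings that first + second can actually produce."""
    return len({first + second for first in firsts for second in seconds})


# Made-up sample with repeated parts, as in the real columns.
firsts = ["Pa", "Ma", "Sa", "Pa"]
seconds = ["nia", "nia", "land"]

naive = len(firsts) * len(seconds)                # 4 * 3 = 12
distinct = count_distinct_names(firsts, seconds)  # 3 distinct firsts * 2 distinct seconds = 6
```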

Of course, if we want more diversity, it does not have to end there. To make the diversity in names even more obvious, I added some checks and additions. For instance, Thaimas could become:
The Thaimas
Thaimas Nerain (two names with a space in between)
Thaimas Republic (or kingdom, empire, federation, etc.)
Republic of Thaimas (as above)
Costa Thaimas (or mont, terra, holy, etc.)
Thaimas Verde (or azul, negro, etc.)
Or a combination of these, such as Republic of The Holy Thaimas Nerain.
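
A rough sketch of how such additions could be bolted on. The word lists come from the examples above, but the function name decorate and the equal chance for each variant are assumptions of this sketch, and it applies only one addition at a time rather than combining several:

```python
import random

GOVERNMENTS = ["Republic", "Kingdom", "Empire", "Federation"]
PREFIXES = ["Costa", "Mont", "Terra", "Holy"]
COLOURS = ["Verde", "Azul", "Negro"]


def decorate(name, other, rng=random):
    """Return `name` in one randomly chosen dressed-up form.

    `other` is a second generated name, used for the two-name variant.
    """
    government = rng.choice(GOVERNMENTS)
    variants = [
        name,                               # plain name, no addition
        f"The {name}",
        f"{name} {other}",
        f"{name} {government}",
        f"{government} of {name}",
        f"{rng.choice(PREFIXES)} {name}",
        f"{name} {rng.choice(COLOURS)}",
    ]
    return rng.choice(variants)


if __name__ == "__main__":
    for _ in range(5):
        print(decorate("Thaimas", "Nerain"))
```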

I find that combining neologisms (such as Thaimas or Nerain) together with well-known words that describe them creates a nice bridge between the procedural and the well-known.

Anyway, next time you're doing a procgen project and need a quick method for generating names, you should consider finding a list of a couple hundred actual names, then splitting them and putting them back together. This could probably be used in a lot of ways I would never think of in a hundred years.
