Generating Alien Words
Oct. 3rd, 2003 11:45 pm
However, I think the limitation of both Prolog and CLIPS (at least for a naive user) is that each could either tell me whether a given string fits the rules, or generate all possible legal morphemes. And neither of those gets me what I want. Still, I guess that if the system actually managed not to choke when generating all those random syllables, one could output them to a file, and then use another program to grab random ones. I may have to make do with that.
Now, a savvy person might wonder why I'm not using the language generator available from http://www.langmaker.com, to which the answer is twofold:
- it is written in Visual Basic, and thus won't run on Linux
- even if it did run on Linux, it doesn't quite give me what I want
The langmaker program is quite nifty in a number of ways; if you start out with a base language (and they provide data for a few) then it's good at mutating the language into another form, simulating sound-changes over years. Mind, I wrote myself a Perl script that did basically the same thing...
Unfortunately, if I recall correctly, the facility for generating random words from patterns doesn't allow quite the rule-based idea I would like to employ. One can do simple things like CVC (consonant, vowel, consonant) when one defines what C and V are... but I'd like to do something more complicated, based on more fine-grained categorization than simply consonants and vowels. And while I suppose one could painstakingly split out the consonants and vowels into the different categories, it would still be rather hard to say, for example, that one pattern for a morpheme is a high voiced consonant followed by a high vowel -- that is, rules based on testing multiple categories.
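To make the "multiple categories" idea concrete, here is a minimal sketch in Python of a pattern whose slots each demand several features at once. The feature table, the tag names and the functions `matching` and `generate` are all made up for illustration (they are not how langmaker works, and the feature assignments are only rough).

```python
import random

# Hypothetical feature table: each sound maps to a set of feature tags.
# The tags are illustrative, not a complete or authoritative phonology.
SOUNDS = {
    "g": {"consonant", "high", "back", "voiced"},
    "k": {"consonant", "high", "back", "unvoiced"},
    "b": {"consonant", "front", "voiced"},
    "i": {"vowel", "high", "front"},
    "u": {"vowel", "high", "back"},
    "a": {"vowel", "low", "centre"},
}


def matching(required):
    """Return, sorted, every sound that carries all of the required features."""
    return sorted(s for s, feats in SOUNDS.items() if required <= feats)


def generate(pattern, rng=random):
    """Pick one random sound per slot; each slot is a set of required features."""
    return "".join(rng.choice(matching(slot)) for slot in pattern)


# "A high voiced consonant followed by a high vowel."
HIGH_VOICED_C_HIGH_V = [{"consonant", "high", "voiced"}, {"vowel", "high"}]

print(generate(HIGH_VOICED_C_HIGH_V))
```

With this toy table the only high voiced consonant is "g", so the result is always "gi" or "gu". A slot that no sound satisfies makes `random.choice` raise an `IndexError`, which a real tool would want to report more kindly.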
So I'm going to try to do it myself. I hope I manage to get it set up so that it isn't that hard to do. The basic design is that one has one file full of linguistic info about sounds, such as consonant/vowel, position in mouth (high/low/mid/front/centre/back), voiced/unvoiced and so on. Then in a separate file for each language, one defines (a) which of the possible sounds are counted as phonemes in this language (for example, English doesn't have the kh sound that you get in the German "Achtung"), and (b) a set of rules which define legal combinations -- which could get as fussy as you like. For example, in English, one can end a word with "ng" but one can't start a word with it.
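As a rough sketch of that design, again in Python: the two "files" are stood in for by plain dictionaries, and every name here (`ALL_SOUNDS`, `ENGLISH_LIKE`, the `shape` and `rules` keys, `generate`) is an assumption for illustration rather than a finished format. Words are kept as lists of phonemes so that a two-letter sound like "ng" counts as one unit.

```python
import random

# Hypothetical shared data: features for every sound the tool knows about.
ALL_SOUNDS = {
    "k": {"consonant", "unvoiced"},
    "kh": {"consonant", "unvoiced"},
    "t": {"consonant", "unvoiced"},
    "ng": {"consonant", "voiced"},
    "a": {"vowel"},
    "i": {"vowel"},
}

# Hypothetical per-language data: which sounds are phonemes, one word shape,
# and rules written as functions that accept or reject a list of phonemes.
ENGLISH_LIKE = {
    "phonemes": ["k", "t", "ng", "a", "i"],  # no "kh" in this language
    "shape": ["consonant", "vowel", "consonant"],
    "rules": [lambda word: word[0] != "ng"],  # "ng" may end a word, not start one
}


def candidates(language, feature):
    """Phonemes of this language that carry the given feature."""
    return [p for p in language["phonemes"] if feature in ALL_SOUNDS[p]]


def is_legal(word, language):
    """True if the word passes every rule of the language."""
    return all(rule(word) for rule in language["rules"])


def generate(language, rng=random, max_tries=1000):
    """Fill the shape at random and retry until the rules accept the word."""
    for _ in range(max_tries):
        word = [rng.choice(candidates(language, f)) for f in language["shape"]]
        if is_legal(word, language):
            return word
    raise RuntimeError("no legal word found; the rules may be too strict")


print("".join(generate(ENGLISH_LIKE)))
```

Generating and then throwing away illegal words is the lazy way to apply the rules; it is fine for small inventories, but fussier rule sets would probably want the rules to steer the choice of each sound instead.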
Then one plugs it in and hopefully generates "words" that "sound right", even if you don't have any actual meanings to attach to them. That's enough to get you names, and often names are about the only thing one needs, just to get the "feel" of an alien/fantasy/extinct language (depending on what sort of fiction it is one is writing).
On the other hand, I could just get swamped by something urgent before I finish typing everything in. (sigh)
And so much for my plans to get to bed early (sigh)