This homework extends the text normalization tutorial from Week 7. If you don't have the tutorial and its Perl scripts, download them from here.
Start from the tutorial and build a text normalizer that can convert numbers of up to four digits into words. Then develop text normalizers for several different contexts, including:
These contexts should be mutually exclusive, so that by taking the union of all of them you can handle any of the above inputs. Submit one transducer that handles all of these inputs; you may also want to submit the subparts for each context so that the TA can debug them if needed. Be sure to submit your vocabulary file as well.
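As a reference for the four-digit task and the union of contexts described above, here is a minimal Python sketch. It is not the fsm-toolkit solution; it could serve as an oracle for checking your transducer's output on a range of inputs. Assumptions: output words are lowercase and space-separated with no hyphens or "and" (adjust this to match your vocabulary file), leading zeros are simply dropped, and `percent_to_words` is a hypothetical second context included only to illustrate the union, not necessarily one of the assigned contexts.

```python
import re
from typing import Callable, List, Optional

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def under_hundred(n: int) -> List[str]:
    """Words for 1..99, e.g. 45 -> ['forty', 'five']."""
    if n < 20:
        return [ONES[n]]
    tens, ones = divmod(n, 10)
    return [TENS[tens]] + ([ONES[ones]] if ones else [])


def number_to_words(text: str) -> Optional[str]:
    """Spell out a string of 1 to 4 ASCII digits; return None for anything else."""
    if not re.fullmatch(r"[0-9]{1,4}", text):
        return None
    n = int(text)  # assumption: leading zeros are dropped ("0045" -> 45)
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    words: List[str] = []
    if thousands:
        words += [ONES[thousands], "thousand"]
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if rest:
        words += under_hundred(rest)
    return " ".join(words)


def percent_to_words(text: str) -> Optional[str]:
    """HYPOTHETICAL second context: '<digits>%' -> '<number> percent'."""
    if not text.endswith("%"):
        return None
    spoken = number_to_words(text[:-1])
    return None if spoken is None else spoken + " percent"


CONTEXTS: List[Callable[[str], Optional[str]]] = [number_to_words, percent_to_words]


def normalize(text: str) -> Optional[str]:
    """Union of all contexts; mutual exclusivity means at most one accepts the input."""
    matches = [out for out in (ctx(text) for ctx in CONTEXTS) if out is not None]
    assert len(matches) <= 1, f"contexts overlap on {text!r}: {matches}"
    return matches[0] if matches else None
```

The assertion in `normalize` mirrors the mutual-exclusivity requirement: if two contexts ever accepted the same input, their union would produce two competing outputs for it.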
Extra credit I: write another transducer that converts the text output into the corresponding phonetic string, and compose it with your text normalizer to produce the phonetic output for an input string.
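Composition chains two relations: each output of the first transducer becomes an input to the second, and with tropical weights the path costs add. The Python sketch below models that semantics directly; `toy_normalizer`, the `NORMALIZED` table, the small ARPAbet-style `LEXICON`, and the cost of 1.0 on the variant pronunciation are all illustrative assumptions. Unlike this sketch, the fsm toolkit builds a composed automaton rather than enumerating outputs one input at a time.

```python
from typing import Callable, Dict, List, Tuple

# A weighted relation: maps an input string to a list of (output, cost) pairs.
# Costs are tropical weights: they add along a cascade, and lower is better.
Relation = Callable[[str], List[Tuple[str, float]]]

# Toy stand-in for your text normalizer (assumption: a lookup table).
NORMALIZED: Dict[str, str] = {"2": "two", "3": "three", "100": "one hundred"}

# Illustrative ARPAbet-style lexicon with stress marks omitted (assumption).
# The second pronunciation of "hundred" carries a made-up cost of 1.0.
LEXICON: Dict[str, List[Tuple[str, float]]] = {
    "one": [("W AH N", 0.0)],
    "two": [("T UW", 0.0)],
    "three": [("TH R IY", 0.0)],
    "hundred": [("HH AH N D R AH D", 0.0), ("HH AH N D ER D", 1.0)],
}


def toy_normalizer(text: str) -> List[Tuple[str, float]]:
    """Text -> words; an empty list means the input is rejected."""
    return [(NORMALIZED[text], 0.0)] if text in NORMALIZED else []


def words_to_phones(words: str) -> List[Tuple[str, float]]:
    """Words -> phones: every combination of per-word pronunciations."""
    paths: List[Tuple[str, float]] = [("", 0.0)]
    for word in words.split():
        prons = LEXICON.get(word)
        if prons is None:
            return []  # out-of-vocabulary word: no path through the transducer
        paths = [((prefix + " " + pron).strip(), cost + extra)
                 for prefix, cost in paths
                 for pron, extra in prons]
    return paths


def compose(first: Relation, second: Relation) -> Relation:
    """Feed every output of `first` into `second`, adding the costs."""
    def composed(text: str) -> List[Tuple[str, float]]:
        return [(out, cost1 + cost2)
                for middle, cost1 in first(text)
                for out, cost2 in second(middle)]
    return composed


pipeline = compose(toy_normalizer, words_to_phones)
```

Picking the lowest-cost pair, for example `min(pipeline("100"), key=lambda pair: pair[1])`, corresponds to taking the best path through the composed transducer.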
Extra credit II: develop a (fictitious) library of stored concatenative segments. Show how you can use the fsm toolkit to find the best set of segments for a particular phonetic string. In essence, you should be able to go all the way from input text to the selected units in one cascade of transducers.
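Finding the best set of segments is a shortest-path problem: cover the phone string with library segments so that the total cost is minimal. The Python sketch below solves it with dynamic programming over phone positions, which gives the same answer as the best path through the composition of the phone string with a segment-library transducer under tropical weights. The `LIBRARY` contents, unit ids, and costs are fictitious, and only per-unit costs are modeled; join costs between adjacent units are left out.

```python
import math
from typing import Dict, List, Optional, Tuple

# Fictitious segment library: phone sequence -> list of (unit id, cost).
LIBRARY: Dict[Tuple[str, ...], List[Tuple[str, float]]] = {
    ("W",): [("u01", 2.0)],
    ("AH", "N"): [("u02", 1.0)],
    ("W", "AH"): [("u03", 1.5)],
    ("N",): [("u04", 2.0)],
    ("W", "AH", "N"): [("u05", 3.5)],
}


def select_units(phones: List[str]) -> Optional[Tuple[float, List[str]]]:
    """Return (total cost, unit ids) of the cheapest covering, or None if none exists."""
    n = len(phones)
    best = [math.inf] * (n + 1)  # best[i]: cheapest cost to cover phones[:i]
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] == math.inf:
                continue  # phones[:start] cannot be covered
            for unit, cost in LIBRARY.get(tuple(phones[start:end]), []):
                if best[start] + cost < best[end]:
                    best[end] = best[start] + cost
                    back[end] = (start, unit)
    if best[n] == math.inf:
        return None
    # Walk the back-pointers from the end to recover the chosen units.
    units: List[str] = []
    pos = n
    while pos > 0:
        step = back[pos]
        assert step is not None
        pos, unit = step
        units.append(unit)
    units.reverse()
    return best[n], units
```

With these fictitious costs, `select_units("W AH N".split())` prefers `u01` followed by `u02` (cost 3.0) over the single unit `u05` or the pair `u03`, `u04` (both cost 3.5).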
Document each file that you submit with a brief explanation of what it is; also give the command you use to run the transducer, to make it easier for the TA to grade. Don't make her hunt for the answer! Failure to document appropriately will mean a reduced grade.
Write up all of your answers to the questions in a text editor so that they can be submitted electronically (txt files preferred). Put that file, along with your fsm files, in a directory called hw6, and use the submit command to send the files to the grader. The syntax of the submit command is:
submit c794aa lab6 hw6
Again, make sure that your writeup includes enough instructions for us to run your fsms easily; that is, tell us what each file is.
Have fun!