Researching possible text to speech plugin


utopium

A couple of days ago the thought occurred to me that a text-to-speech plugin for Fallout 4 might be feasible. The inspiration for this comes from the Fuz Ro D-oh SKSE plugin for Skyrim that will display subtitles for dialogue from mods that do not have associated voices for them. I started doing some research on the possibility of this and thought I would post here to get people's thoughts on the matter and see if I could find more information. For the record, I personally don't have any experience at all with modding Fallout or Skyrim, but I do have a lot of experience in software development and audio programming, so keep this in mind with what I describe below. This might be a bit lengthy, so bear with me.

The general idea would be similar to Fuz Ro D-oh in that any dialogue a mod author does not have an associated voice for would have a voice generated on the fly by the Windows Speech API. I have seen a few mods where the authors used voices generated by a TTS engine, but these were pregenerated and required a lot of storage space for the audio, and any change in dialogue would require tedious regeneration of these audio assets. With a plugin handling this on the fly, the only thing the mod author would have to package are text assets, which are very easy to edit. Now of course a TTS engine doesn't sound very natural compared to real voice actors, but in the world of Fallout this works just great for robot and synth characters. It can of course still be applied to human characters as well when mod authors can't provide real voices but still want something as a placeholder until they can get the real thing.

As an extension of this (assuming the TTS part works in the first place) this could eventually operate as a repository for common voice assets that multiple mods could access such as for followers and general background chatter. So voice actors could provide simple phrases such as "Hello" and "What do you need?" that could be recognized and used instead of being passed through the TTS engine. I figure this could lead to a plethora of follower mods such as what you see in Skyrim. Individual mods would not need to make copies of the same voice assets that the modding community could decide to share.
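To make the shared repository idea a bit more concrete, here is a minimal C++ sketch of the lookup step. Everything in it is an assumption for illustration: the names NormalizePhrase and SharedVoiceRepository, the file paths, and the idea of matching phrases by a normalized text key are all hypothetical, not anything F4SE or the game provides.

```cpp
#include <cctype>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical helper: reduce a line of dialogue to a lookup key by
// lower-casing it, dropping punctuation, and collapsing whitespace, so that
// "Hello!" and "  hello " resolve to the same shared asset.
std::string NormalizePhrase(const std::string& text) {
    std::string key;
    bool pendingSpace = false;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            if (pendingSpace && !key.empty()) key += ' ';
            pendingSpace = false;
            key += static_cast<char>(std::tolower(c));
        } else if (std::isspace(c)) {
            pendingSpace = true;
        }
        // Any other character (punctuation) is skipped.
    }
    return key;
}

// Hypothetical repository mapping common phrases to recorded audio files.
class SharedVoiceRepository {
public:
    void Register(const std::string& phrase, const std::string& audioPath) {
        assets_[NormalizePhrase(phrase)] = audioPath;
    }

    // Returns the recorded asset for this line, or std::nullopt when the
    // caller should fall back to the TTS engine.
    std::optional<std::string> Find(const std::string& line) const {
        auto it = assets_.find(NormalizePhrase(line));
        if (it == assets_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::string, std::string> assets_;
};
```

The plugin would call Find for each unvoiced line and only hand the text to the TTS engine when nothing comes back. A real version would probably also key on the voice type of the speaker, which is left out here.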

Now on to the technical side of things and how this could potentially be implemented. In terms of the actual text to speech engine this can be easily performed by the Windows Speech API. It is quite easy to use and I have made a quick demo program in just five minutes to see how it works. The documentation can be found at the following:

https://msdn.microsoft.com/en-us/library/ms720149(v=vs.85).aspx

This API provides control over things such as the gender of the voice, the pitch, speed, volume, etc., using a simple XML format:

https://msdn.microsoft.com/en-us/library/ms717077(v=vs.85).aspx
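As a rough illustration of that XML format, the sketch below builds a marked-up string from a few per-character settings. The voice, volume, rate and pitch tags are the ones described in the SAPI 5 XML documentation linked above; the VoiceSettings struct and the function names are hypothetical and only exist for this example.

```cpp
#include <algorithm>
#include <string>

// Escape characters that are special in XML so dialogue text cannot break
// the markup.
std::string EscapeXml(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c; break;
        }
    }
    return out;
}

// Hypothetical per-character settings; the ranges follow the SAPI XML docs.
struct VoiceSettings {
    bool female = true;
    int pitch = 0;     // -10 .. 10
    int rate = 0;      // -10 .. 10
    int volume = 100;  // 0 .. 100
};

// Wrap the dialogue text in SAPI XML tags, clamping values to valid ranges.
std::string BuildSapiXml(const std::string& text, const VoiceSettings& settings) {
    const int pitch = std::clamp(settings.pitch, -10, 10);
    const int rate = std::clamp(settings.rate, -10, 10);
    const int volume = std::clamp(settings.volume, 0, 100);

    std::string xml;
    xml += "<voice required=\"Gender=";
    xml += settings.female ? "Female" : "Male";
    xml += "\">";
    xml += "<volume level=\"" + std::to_string(volume) + "\">";
    xml += "<rate absspeed=\"" + std::to_string(rate) + "\">";
    xml += "<pitch absmiddle=\"" + std::to_string(pitch) + "\">";
    xml += EscapeXml(text);
    xml += "</pitch></rate></volume></voice>";
    return xml;
}
```

On Windows the result would still need converting to a wide string before being passed to ISpVoice::Speak with the SPF_IS_XML flag; that part is left out so the sketch stays portable.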

Now the downside I see is the limited set of voices that come with Windows. I have a Windows 7 desktop that only has a single female voice and a Windows 10 desktop that has a single male and a single female voice. There don't appear to be many free alternative voices, and it costs a bit of money to get anything beyond what Windows ships with. Something is better than nothing though, I guess.

In terms of playing the audio generated from the TTS engine there are a couple of options. The preferable option is to play the audio directly within Fallout. Based on my research of how Fallout works, it appears that the user interface portions use ActionScript 3. Now there isn't any need to actually display any user interface, but the Sound class for ActionScript 3 can be used to play from an audio stream:

http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/media/Sound.html

To pass the audio stream from the C++ plugin to ActionScript, one could use either the NativeProcess class, which can stream data from an external process through STDIN/STDOUT, or the Socket class, which can do this over a local network connection:

http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/desktop/NativeProcess.html
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/net/Socket.html

The gotcha in using any of these ActionScript classes is the following, from the Sound class documentation:

"In Flash Player 10 and later and AIR 1.5 and later, you can also use this class to work with sound that is generated dynamically."

Does anyone happen to know what the limitations of the ActionScript API are in Fallout? It definitely isn't Flash Player or AIR, so whether or not dynamically generated sound is supported is unknown to me. It doesn't look like audio can be injected into the game without it. The NativeProcess and Socket classes also have documentation that suggests they may not work everywhere.

If the ActionScript route isn't possible, audio can still be played directly through Windows. This is not the ideal solution, since it means in-game audio effects won't be applied to the TTS stream, but at least audio can be played. To make it still sound reasonable, the plugin would need to get the game's volume settings so that the TTS audio isn't too loud or too quiet. Now in SKSE there is a function named GetINISetting in GameSettings.cpp which can be used to grab the volume settings, but looking at the source code for F4SE it appears that this has not been implemented yet. Given that F4SE still has a bunch of other things to implement as well, any implementation of a TTS plugin will have to wait until there is more to work with there.
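For the volume matching, something along these lines might be enough once the settings can be read. This is only a sketch under assumptions: it supposes the game exposes a master volume and a voice volume as floats in the 0.0 to 1.0 range (I have not confirmed how Fallout 4 stores them), and ToSapiVolume is a made-up helper name. The 0 to 100 output range is what ISpVoice::SetVolume expects.

```cpp
#include <algorithm>
#include <cmath>

// Combine two game volume sliders (assumed to be 0.0 .. 1.0 each) into the
// 0 .. 100 range used by SAPI. Out-of-range inputs are clamped first.
unsigned short ToSapiVolume(float masterVolume, float voiceVolume) {
    const float master = std::clamp(masterVolume, 0.0f, 1.0f);
    const float voice = std::clamp(voiceVolume, 0.0f, 1.0f);
    return static_cast<unsigned short>(std::lround(master * voice * 100.0f));
}
```

The returned value could be passed to ISpVoice::SetVolume before each line is spoken. A plain product of the two sliders is the simplest choice; the game may well use a different curve internally.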

The part that I have no idea about, and which hopefully other people can chime in on and explain, is hooking into Fallout so that a callback function in the plugin gets executed for every line of dialogue to determine whether it has an actual voice recording or is text only. Being able to obtain other information, such as the gender of the actor that is speaking, would be a nice bonus. Fuz Ro D-oh was definitely capable of this in Skyrim, but I do realize that the dialogue system in Fallout is a little bit different and F4SE is still in its early stages, so I wouldn't be surprised if this is not possible, at least not at the moment. Without this mechanism, though, everything else I described above would not matter.

Anyway, those are all the thoughts I have. Sorry if this post seems a little bit long but I thought I would just throw out everything I have come up with so far to get the best feedback possible. Before I get too deep into trying to implement something I figured it would be good to know what people with more modding experience think. Truth be told I may not even find time to do this, but as long as this is possible and the idea is out there then maybe someone else could do it if I can't.

Possible? Not possible? The worst idea ever?


I have a feeling this may be possible, but I'm not a professional programmer, just a very stubborn modder. There are a few possible avenues you might be able to take, none of which I know everything about.

 

F4SE Plugins

I know the least about this subject, but going by previous titles you should be able to round-trip data to and from Papyrus to be processed by your custom F4SE plugin. I don't know about any kind of security restrictions the game may have on interacting with Windows APIs like TTS, but it sounds promising. From my point of view, it seems that your F4SE plugin can do at least whatever the game already can, depending on what is already decoded or whatever you're willing to decode.

 

ActionScript 3

We already have the ability to communicate data to and from Papyrus and AS3 using Holotape programs. Note that the game uses a custom implementation of AS3 using Scaleform. That means the AS3 API is not available in its entirety and some parts may function differently than stated in the AS3 Reference. But in practice most things really do work the same. If you would like to poke around some of the Fallout 4 AS3 classes, you can browse my secret testing branch on GitHub: https://github.com/Scrivener07/FO4_Interface/tree/FO4_1.6.3.0_Beta_1208831_EN/Data/Interface/Source/scripts (it is not complete, most notably missing the UI programs and their dependencies).

Some of the API classes are located in namespaces like Shared, fl, scaleform, and others.

 

Python

I know ZERO about this one, but I also noticed you can execute a Python program instead of a SWF file from a Holotape. I'm sure some neat shenanigans can be made of that :)


Regarding the Python option mentioned above: it is not actually in the game engine at all. I checked a few months ago and they ripped out the Python stuff from the Creation Engine.


Tangentially related, but I've always envisioned taking voices from the Japanese version of the game and using them, with English subtitles, for new lines. Text-to-speech-generated voices would be a complete immersion-breaker for me, especially if everybody sounded the same. All lines (and actors) in the English version have corresponding Japanese lines (and actors), so you'd have a huge variety of lines and emotions to pick from already. Of course it'll sound completely wrong to people who actually understand Japanese, but for everybody else it'll just be like someone watching anime with English subs. Silly? Perhaps. Just a thought that I've had for a while! :tongue: (companions speaking Japanese)

Anyway, as scrivener07 has noted, Fallout 4 uses Scaleform for UI. A list of the AS3 API available to you in Scaleform applications is (or rather, was) published by Autodesk. (Unfortunately, they've since hidden the document behind a paywall, but the good news is that it has been archived by the Internet Archive.)

sf_4.3_flash_support.pdf
