| User | Post |
|
9:47 am April 25, 2010
| dardack
| | |
| Member | posts 18 |
|
|
So I have TTS working for when a user joins the channel. It would be easy to add the others. Question: when the person joins, it plays the TTS, but in Mangler it doesn't show him joining until the speech is finished, whereas Ventrilo shows the person in the channel before the speech starts (I'm using Mangler next to Vent in Wine to test). What I did was include my .h file in mangler.cpp and, where the notification for channel enter was played, add a call to the function that speaks "user has entered the channel".
Is there a better place to put this call?
Also, I've never made or submitted a patch to a project before, so I have no clue how to do this. I had to add 2 lines to the makefile in src, plus my .h file. The espeak lib would also be required.
|
|
|
2:03 pm April 25, 2010
| dardack
| | |
| Member | posts 18 |
|
|
Ok, I can add leave/join for channel and server, but Mangler always waits for espeak to finish speaking before it finishes another task. For example, if I press my PTT button, someone leaves/joins the channel, and I let go, my light stays lit until espeak is done speaking, and I'm not sure if I'm still transmitting.
I know TTS is a low priority. Just thought I would try something out. It's been like 10 years since I've programmed anything, but it's there, just not perfect.
|
|
|
2:06 pm April 25, 2010
| econnell
| | |
| Admin
| posts 319 |
|
|
Your espeak call is blocking, so nothing else can happen while it's playing audio. To do this properly, you'd need to get the PCM that espeak generates and play that through the existing sound functions in mangler such as playNotification. That will play the audio in its own thread.
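To illustrate the blocking problem in general terms, here is a minimal C++ sketch of moving a synchronous call onto its own thread so the caller returns immediately. `speakBlocking` and `speakInBackground` are hypothetical stand-ins (the "playback" is simulated with a sleep); this is not espeak's API and not how mangler's playNotification is implemented.

```cpp
#include <atomic>
#include <chrono>
#include <string>
#include <thread>
#include <utility>

// Counts finished utterances so a caller can observe completion.
std::atomic<int> g_utterancesSpoken{0};

// Hypothetical stand-in for a synchronous TTS call: it does not return
// until "playback" has finished (simulated here with a short sleep).
void speakBlocking(const std::string &text) {
    (void)text;
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    ++g_utterancesSpoken;
}

// Starts the blocking call on its own thread and returns right away, so the
// caller (for example a GUI event handler) can carry on with other work.
// The caller must join() or detach() the returned thread before destroying it.
std::thread speakInBackground(std::string text) {
    return std::thread([t = std::move(text)] { speakBlocking(t); });
}
```

A caller would keep the returned `std::thread` around and `join()` it at shutdown; a real client would more likely feed audio to one long-lived playback thread than spawn a thread per utterance.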
|
|
|
6:47 pm April 25, 2010
| dardack
| | |
| Member | posts 18 |
|
|
Yeah, unfortunately it seems like espeak returns a short *wav, while the q/PCM/audio functions from mangler all take uint8_t. I haven't figured out how to convert between the two.
|
|
|
6:54 pm April 25, 2010
| econnell
| | |
| Admin
| posts 319 |
|
|
Post edited 10:57 pm – April 25, 2010 by ekilfoil
In the case of espeak and others, the wav file is probably fairly simple and not going to change. It probably doesn't do wave chunking or anything complicated like that, so wavefile_ptr + 40 should be raw PCM (which is what those functions are looking for in those uint8_t pointers)… but you'll need to figure out the sample rate, which is stored somewhere in that 40 byte header. You should be able to save it to a file, load that wav file up in Audacity or something, and it'll show you all the PCM params.
This may help: http://www.ringthis.com/dev/wa…..format.htm
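Rather than guessing the offset, the header fields can be read directly. Below is a minimal C++ sketch that assumes the canonical PCM RIFF/WAVE layout with no extra chunks, where the header is 44 bytes and the sample data follows it. `WavInfo` and `parseCanonicalWavHeader` are hypothetical names made up for this example, and files with additional chunks would need a real chunk walker.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fields pulled out of a canonical 44-byte PCM WAV header.
struct WavInfo {
    uint16_t channels;
    uint32_t sampleRate;
    uint16_t bitsPerSample;
    uint32_t dataBytes;   // size of the PCM payload in bytes
    const uint8_t *pcm;   // points at the first PCM byte (buf + 44)
};

// WAV header fields are little-endian; assemble them byte by byte so the
// code works regardless of host endianness.
static uint16_t readLE16(const uint8_t *p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}
static uint32_t readLE32(const uint8_t *p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Returns false unless the buffer starts with RIFF/WAVE/"fmt "/"data"
// at the canonical offsets.
bool parseCanonicalWavHeader(const uint8_t *buf, size_t len, WavInfo &out) {
    if (len < 44) return false;
    if (std::memcmp(buf, "RIFF", 4) != 0 || std::memcmp(buf + 8, "WAVE", 4) != 0 ||
        std::memcmp(buf + 12, "fmt ", 4) != 0 || std::memcmp(buf + 36, "data", 4) != 0)
        return false;
    out.channels = readLE16(buf + 22);
    out.sampleRate = readLE32(buf + 24);
    out.bitsPerSample = readLE16(buf + 34);
    out.dataBytes = readLE32(buf + 40);
    // Clamp in case the declared size runs past the end of the buffer.
    if (out.dataBytes > len - 44) out.dataBytes = static_cast<uint32_t>(len - 44);
    out.pcm = buf + 44;
    return true;
}
```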
|
|
|
7:40 pm April 25, 2010
| dardack
| | |
| Member | posts 18 |
|
|
Ok, so the short *wav is a pointer to the data. So I could pass wav offset by 40? (I'm going to have to break out the C books about pointers again.) In Audacity the wav file has a rate of 22050, but I'm unsure how to convert this from an int to a uint8_t.
|
|
|
7:51 pm April 25, 2010
| econnell
| | |
| Admin
| posts 319 |
|
|
I'm doing this from memory, so I'm not sure if it's 40 or not. It may be 44? Check out that link above for a description of the header.
You need to know the following: byte order, sample rate, sample size, signed/unsigned, and the number of channels.
So you've got 22kHz, little endian (which would need to be converted to native endian, but don't worry about this: if your patch is good enough, we'll handle that)…
You need to figure out bits per sample (we use 16 bit internally) and whether or not it's mono or stereo. I'd assume mono and I think that's a pretty safe assumption. Lastly, you need to know whether or not it's signed.
I would guess the following: 22kHz signed 16bit little-endian mono
Take a look at libventrilo3/codec-test/ for some programs that will play raw PCM data
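As an illustration of what those parameters mean in practice, here is a minimal C++ sketch that turns a byte buffer of signed 16-bit little-endian mono PCM into native `int16_t` samples. The format is the guess stated above, not something confirmed at this point in the thread, and `decodeS16LE` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes signed 16-bit little-endian PCM into native-endian samples.
// A trailing odd byte, if any, is ignored.
std::vector<int16_t> decodeS16LE(const uint8_t *bytes, size_t byteCount) {
    std::vector<int16_t> samples;
    samples.reserve(byteCount / 2);
    for (size_t i = 0; i + 1 < byteCount; i += 2) {
        // Low byte first, then high byte: that is what "little endian" means.
        uint16_t raw = static_cast<uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        // Reinterpret as signed; assumes two's complement, which holds on
        // every mainstream platform.
        samples.push_back(static_cast<int16_t>(raw));
    }
    return samples;
}
```

With these parameters, one second of audio at 22050 Hz mono is 22050 samples, or 44100 bytes.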
|
|
|
8:19 pm April 25, 2010
| dardack
| | |
| Member | posts 18 |
|
|
I know it's mono, audacity also says the sample format is 32bit float.
That link shows 44 before the data. OK so we have sample rate, sample size, mono. According to link can also find:
| bytes/sec | 4 bytes – DWORD | Bytes/Second |
| Block alignment | 2 bytes – WORD | Block alignment |
| Bits/sample | 2 bytes – WORD | Bits/Sample |
I'm not sure which of these is needed.
Basically, espeak gives a callback if you're not using it with the espeak synthesizer (which I was using, and which caused it not to release back to mangler, so there was always a delay).
int synthCallback(short *wav, int numsamples, espeak_EVENT *events)
So wav is a pointer to the data, with the 44 byte header, then the raw audio.
What I was trying first was:
ManglerPCM *testspeak_MPCM = new ManglerPCM(?a, ?b);
ManglerAudio *testSpeaknotify = new ManglerAudio(AUDIO_NOTIFY, 22050, 1, 0, 0, false);
testSpeaknotify->queue(?c, ?d);
testSpeaknotify->finish();
Trying to figure out what to put in ?a/b/c/d is where I'm stuck at. Unless I shouldn't be using mangler's notify system.
|
|
|
8:30 pm April 25, 2010
| econnell
| | |
| Admin
| posts 319 |
|
|
Post edited 12:37 am – April 26, 2010 by ekilfoil
We discussed on IRC and decided that Audacity is reporting the 32bit float wrong (it's an odd format for normal output).
You may want to just join on irc.freenode.net #mangler
Edit: oh… bits/sample is what you need
|
|
|
10:04 am April 27, 2010
| dardack
| | |
| Member | posts 18 |
|
|
Ok I was wrong, espeak gives back raw audio data, no header, 16 bit mono, 22050 Hz. So I get a short pointer to this data, and numsamples: is the number of entries in wav. This number may vary, may be less than the value implied by the buflength parameter given in espeak_Initialize, and may sometimes be zero (which does NOT indicate end of synthesis).
So I still need a length to pass to ManglerPCM, correct?
So basically, in mangler.cpp when a user joins a channel I call ttspeak_UserJoinChan(name, phonetic);
In that function, after espeak is initialized and the text is synthesized, it generates a callback (instead of playing), which calls the callback function:
static int callback (short *wav, int numsamples, espeak_EVENT *events)
{
}
events: items which indicate word and sentence events, and also the occurrence of <mark> and <audio> elements within the text. The list of events is terminated by an event of type = 0. I don't care about them right now, because basically they indicate word/sentence/char, and I know I'm passing sentences at this time.
But ManglerPCM takes uint32_t length, uint8_t *sample. So I'm not sure how to send this audio sample to Mangler.
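The chunked delivery described above (variable `numsamples`, sometimes zero) can be sketched as a callback that just appends whatever arrives to a buffer. This is a self-contained C++ sketch, not the real integration: `espeak_EVENT` is only forward-declared here, `g_speech` and `g_done` are hypothetical globals, and the conventions that a NULL `wav` marks the end of synthesis and that returning 0 means "continue" are taken from my reading of espeak's header, so check them against your espeak version.

```cpp
#include <cstdint>
#include <vector>

struct espeak_EVENT;  // opaque in this sketch; the real type comes from espeak's header

static std::vector<int16_t> g_speech;  // accumulated 16-bit mono samples
static bool g_done = false;            // set once synthesis reports completion

// Same shape as the callback in the post above.
static int synthCallback(short *wav, int numsamples, espeak_EVENT *events) {
    (void)events;  // word/sentence events are not needed for plain playback
    if (wav == nullptr) {
        // Assumed convention: a NULL buffer means synthesis has finished.
        g_done = true;
        return 0;
    }
    if (numsamples > 0) {
        // numsamples counts samples (shorts), not bytes; zero-length chunks
        // are legal and simply skipped.
        g_speech.insert(g_speech.end(), wav, wav + numsamples);
    }
    return 0;  // assumed convention: 0 = keep synthesizing
}
```

Buffering the whole utterance keeps the callback trivial; passing each chunk straight to the audio queue would also work and would start playback sooner.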
|
|
|
10:06 pm April 27, 2010
| Haxar
| | |
| Moderator
| posts 58 |
|
|
A primitive implementation of eSpeak is now in trunk (r781). Usage examples can be found in 'mangler.cpp' under "case V3_EVENT_TEXT_TO_SPEECH_MESSAGE" and "case V3_EVENT_USER_PAGE".
|
|
|
12:24 am April 28, 2010
| dardack
| | |
| Member | posts 18 |
|
|
Ahh nice, this is the part that was messing me up:
numsamples * sizeof(short), (uint8_t *)wav, for some reason i thought the audio data had header info. Was informed this was wrong.
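Spelled out, that conversion is just a byte count plus a pointer cast, with no copying or per-sample conversion. A minimal C++ sketch follows; `PcmBytes` and `asPcmBytes` are hypothetical names that only mirror the (length, sample) pair mentioned earlier in the thread, not mangler's actual classes.

```cpp
#include <cstddef>
#include <cstdint>

// A (length, sample) pair in the same order the thread describes.
struct PcmBytes {
    uint32_t length;        // size in bytes, not in samples
    const uint8_t *sample;  // the same memory, viewed as raw bytes
};

// Views a buffer of 16-bit samples as raw bytes. No data is copied, so the
// result is only valid while the original buffer is alive, and the bytes
// stay in the host's native byte order.
PcmBytes asPcmBytes(const short *wav, int numsamples) {
    PcmBytes out;
    out.length = numsamples > 0
                     ? static_cast<uint32_t>(numsamples) * static_cast<uint32_t>(sizeof(short))
                     : 0u;
    out.sample = reinterpret_cast<const uint8_t *>(wav);
    return out;
}
```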
Oh well, otherwise the espeak part looks close to mine, except I kept mine separate and didn't put it into mangleraudio.cpp. Very nice.
Doesn't seem too primitive, I added for join/leave channel:
// they're joining our channel
#ifdef HAVE_ESPEAK
    if (strlen(u->phonetic) == 0) {
        audioControl->playText(c_to_ustring(u->name) + " has joined the channel.");
    } else {
        audioControl->playText(c_to_ustring(u->phonetic) + " has joined the channel.");
    }
#else
    audioControl->playNotification("channelenter");
#endif
Same for leave. Works OK. It would just need a GUI option to turn TTS notifications on or off.
|
|