Understanding Basic Speech Algorithms in Embedded Systems

Source: https://www.embedded.com/print/ (Embedded.com, MindTree Consulting, February 6, 2006)

Speech processing mainly involves compression/decompression, recognition, conditioning, and enhancement algorithms. Signal-processing algorithms depend on system resources such as available memory and clock speed. Because these resources relate directly to system cost, they're often tightly constrained.

Measuring an algorithm's complexity is the first step in analyzing it. This includes counting the clock cycles required and determining the algorithm's processing load, which can vary with the processor employed. The memory requirements, however, don't change with the processor.

Most DSP algorithms work on collections of samples, better known as frames (Fig. 1). This introduces an inevitable delay due to frame collection that’s in addition to the processing delay. Note that the International Telecommunication Union (ITU) standardizes the acceptable delay for an algorithm.

[Fig. 1. Looking at the audio spectrum, basic telephone-quality speech occurs up to 4 kHz. High-quality speech reaches 7 kHz, followed by CD-quality audio.]

An algorithm's processing load is typically represented in millions of clocks per second (MCPS), the number of clock cycles per second the algorithm needs from the core. Assume an algorithm that processes frames of 64 samples at 8 kHz and requires 300,000 clocks to process each frame. The time required to collect one frame is 64/8000, or 8 ms; in 1 second, 125 frames can be collected. To process all the frames, the algorithm consumes 300,000 × 125 = 37,500,000 clocks/s, represented as 37.5 MCPS. Simplified, the MCPS equation is:

MCPS = (clocks required to execute one frame × sampling frequency / frame size) / 1 million
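The calculation above can be sketched directly; the function name and structure are illustrative, not from the original article:

```python
def mcps(clocks_per_frame, sampling_hz, frame_size):
    # MCPS = (clocks per frame * frames per second) / 1 million
    frames_per_second = sampling_hz / frame_size
    return clocks_per_frame * frames_per_second / 1e6

# The article's example: 64-sample frames at 8 kHz, 300,000 clocks per frame
print(mcps(300_000, 8000, 64))  # 37.5
```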

Note that there's another common term used for measuring an algorithm's processing load: MIPS (million instructions/s). Calculating MIPS for an algorithm can be tricky. If the processor effectively executes one instruction per cycle, the MCPS and MIPS ratings for that processor are the same. Analog Devices' Blackfin is one such processor. Otherwise, if the processor takes more than one cycle to execute an instruction, a ratio exists between the MCPS and MIPS ratings. For example, the ARM7TDMI processor effectively requires 1.9 cycles per instruction.
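The MCPS-to-MIPS ratio described above can be expressed as a one-line conversion (a sketch; the cycles-per-instruction figures are the ones quoted in the text):

```python
def mips_from_mcps(mcps_value, cycles_per_instruction):
    # Instructions per second = clocks per second / average cycles per instruction
    return mcps_value / cycles_per_instruction

print(mips_from_mcps(37.5, 1.0))   # 37.5: a one-cycle-per-instruction core (e.g. Blackfin)
print(mips_from_mcps(37.5, 1.9))   # roughly 19.7 on a core averaging 1.9 cycles/instruction
```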

The memory considerations for any algorithm are typically separated between code (read-only) and data (read-write) memory. The proper memory amount can be found by compiling the source code. Note that algorithms perform at their best when using the fastest memory, and this is usually memory that’s internal to the core.

Integrating an algorithm into an existing system is somewhat easier. If the system is in the development phase, it's recommended to test the audio front end thoroughly before integrating or evaluating any algorithms. Within the system, you must verify that no interrupts conflict with each other. If such an issue exists, debugging can be a painful experience.

In a system that will incorporate audio/speech algorithms, robust audio firmware is a must. It must leave the maximum time, and deliver accurate data, for the algorithms to perform efficiently. One common mistake is to interrupt the core upon each sample's arrival. If the algorithm operates only on frames of a fixed number of samples, the per-sample interrupts are redundant. DMA controllers and internal FIFOs can collect samples and interrupt the core only after a full frame has been collected.
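The saving is easy to quantify. Assuming the article's 8-kHz, 64-sample-frame example, per-frame interrupts cut the interrupt rate by the frame size:

```python
# Per-sample vs. per-frame (DMA/FIFO) interrupt rates, assuming
# 8-kHz sampling and 64-sample frames as in the MCPS example
SAMPLING_HZ = 8000
FRAME_SIZE = 64

per_sample_irq_rate = SAMPLING_HZ                # ISR fires 8000 times/s
per_frame_irq_rate = SAMPLING_HZ // FRAME_SIZE   # DMA fires only 125 times/s
print(per_sample_irq_rate, per_frame_irq_rate)   # 8000 125
```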

Checking signal levels, adjusting the hardware codec gains, synchronizing the near- and far-end interrupts, verifying the DMA function, or any other experiment can be accomplished using this basic telephony standard. During this process, don't be surprised to find that the received compressed data arrives in bit-reversed order; a simple bit-reversing routine will restore it to the expected state. Any wideband speech codec could serve as an example of a speech algorithm that's heavy in terms of memory and clock consumption. One example is sub-band ADPCM (adaptive differential PCM), or G.722, which operates on data sampled at 16 kHz and thus covers the entire speech spectrum. It retains the unvoiced frequency components between 4 and 7 kHz that provide high-quality, natural speech.
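The "bit-reversing" fix mentioned above can be sketched for a 16-bit word (a minimal illustration; real firmware would typically use a lookup table or a hardware bit-reverse instruction):

```python
def bit_reverse16(word):
    # Reverse the bit order of a 16-bit word: bit 0 -> bit 15, bit 1 -> bit 14, ...
    result = 0
    for _ in range(16):
        result = (result << 1) | (word & 1)
        word >>= 1
    return result

print(hex(bit_reverse16(0x0001)))  # 0x8000
```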

Before any codec is integrated into a system, I recommend that the designer do careful testing. While G.711 encoding and decoding can be tested on a sample-by-sample basis, codecs that involve filters and other frequency-domain algorithms are tested differently, using a stream of at least a few thousand samples. Codec verification engages the engineer in unit testing with ITU vectors, signal-level testing, and interoperability testing with other available codecs. Interoperability issues related to arranging the encoded data into 16-bit words before transmission, and to mismatched signal levels, aren't new to system-integration engineers.

Algorithms that require lots of memory and clock cycles have a far bigger impact on the system than those discussed so far. The more compute-intensive algorithms include echo cancellers, noise suppressors, and Viterbi algorithms. Evaluating their performance is not as easy as it is for the speech codecs.

Generally, any telecom system that involves a hands-free or speaker mode employs an acoustic echo canceller. This prevents the far-end party from hearing their own voice as an echo. If operated in a noisy environment, a noise-control algorithm may also be needed. The echo canceller-noise reducer (EC-NR) demands lots of memory and clocks from the system. Both time- and frequency-domain techniques can solve the acoustic echo problem, with frequency-domain techniques proven more efficient at lower computational cost (Table 1).

A frequency-domain technique uses an adaptive FIR filter that updates its coefficients only when the residual echo error exceeds a threshold. Subtracting the estimated echo from the input (near-end) signal gives the error. The far-end signal serves as the reference from which these algorithms estimate the echo; providing a proper reference is essential for good echo estimation and cancellation.
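The estimate-subtract-update loop can be sketched as follows. Note this is a time-domain NLMS update, shown for clarity rather than the frequency-domain variant the text describes, and every name and parameter here is illustrative:

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, filter_len=64, mu=0.5, eps=1e-8):
    """Time-domain NLMS adaptive FIR echo canceller (illustrative sketch)."""
    w = np.zeros(filter_len)                 # adaptive filter coefficients
    residual = np.zeros(len(mic))
    for n in range(filter_len, len(mic)):
        x = far_end[n - filter_len:n][::-1]  # most recent far-end (reference) samples
        echo_estimate = w @ x                # filter models the echo path
        e = mic[n] - echo_estimate           # error = near-end speech + residual echo
        w += mu * e * x / (x @ x + eps)      # normalized coefficient update
        residual[n] = e
    return residual
```

With a clean far-end reference and a linear echo path, the residual echo drops rapidly as the filter converges; a poor reference (the point made above) leaves the echo largely uncancelled.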

Another factor, echo tail length, is the echo reverberation time measured in milliseconds. Simply put, it’s the time spent in echo formation. The filter length is found by multiplying the echo tail length by the sampling frequency (Table 2).

[Table 2]
One of the basic requisites for an echo canceller (EC) implementation is to support data sampled at up to at least 16 kHz, ensuring that wideband speech is covered. Integrating an EC with wideband speech codecs requires some attention. Because the filter length for a given echo tail depends on the sampling frequency, an EC sized to cancel echo up to 72 ms at 8 kHz effectively covers only half that span when applied to 16-kHz sampled data. And compared to 8 kHz, collecting a frame of the same size takes only half the time. Hence, engineers find integrating a half-effective EC with wideband codecs doubly challenging. Designers often raise the core frequency to manage the EC efficiently on a system with a 16-kHz sampling rate.
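The tail-length arithmetic behind Table 2 and the halving effect described above can be checked in a few lines (the function name is illustrative):

```python
def filter_taps(tail_ms, sampling_hz):
    # Filter length = echo tail length (in seconds) * sampling frequency
    return tail_ms * sampling_hz // 1000

print(filter_taps(72, 8000))     # 576 taps cancel a 72-ms tail at 8 kHz
print(576 / 16000 * 1000)        # the same 576 taps span only 36 ms at 16 kHz
```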

Noise-reduction techniques have been used for many years. Depending on the application, an approach is chosen, implemented, and applied. For example, a technique could treat noise as more stationary than human speech: the algorithm models the noise and then subtracts it from the input signal. A reduction of 10 to 30 dB is significant for some applications. A common application for combined EC and noise reduction is a handset placed in speaker mode in a noisy environment, or hands-free mode enabled in a car (Fig. 2).
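The model-and-subtract idea can be sketched as single-frame magnitude spectral subtraction, one classic way to exploit noise stationarity. This is a minimal illustration, not the article's specific algorithm, and the function and its parameters are assumptions:

```python
import numpy as np

def spectral_subtract(frame, noise_mag, floor=0.05):
    # Subtract an estimated stationary-noise magnitude spectrum from one frame,
    # keeping the noisy phase and applying a spectral floor to limit musical noise
    spec = np.fft.rfft(frame)
    mag = np.abs(spec)
    cleaned = np.maximum(mag - noise_mag, floor * mag)
    return np.fft.irfft(cleaned * np.exp(1j * np.angle(spec)), n=len(frame))
```

In practice `noise_mag` would be estimated by averaging magnitude spectra over frames detected as noise-only, and the floor trades residual noise against speech distortion, echoing the quality trade-off discussed below.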

The EC tail length requirement for the hands-free application is about 50 ms and the NR level required can vary from 12 to 25 dB, depending on the noise attributes and expected voice quality. Generally, the higher the noise reduction, the more the speech quality is put at risk. Hence, a level can be selected dynamically to give a reasonable reduction while still maintaining the proper voice quality.

The EC noise reduction can require up to 15 or 20 Kbytes of system memory. Processing each 64-sample frame can consume from 1.5 to 3.0 Mclocks, depending on the processor. Evaluating the performance of this combination can be tricky. The steps include tuning the hardware codec gains; finding the correct microphone and speaker placement; synchronizing the far- and near-end speech and interrupts; ensuring the audio hardware has linear attributes; and testing various EC tail lengths and noise-reduction levels to achieve the best echo cancellation and noise reduction.

It’s important to consider worst cases when evaluating the complexity of any algorithm. An algorithm’s execution time can vary for different frames. This data dependency is due to the fact that a processor might take more time to multiply two samples of higher amplitude than multiplying samples of lesser amplitude.

Adaptive algorithms can mislead you if you observe the cycles consumed over only a few frames in which the filter coefficients happen not to be updated. Adapting the filter coefficients can take several thousand extra cycles, which must be accounted for. A word of caution: don't rely on a single measurement. Experimenting with a variety of vectors will help increase the accuracy of MCPS and performance measurements.
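Putting the worst-case advice into practice means sizing the clock budget from the most expensive frame observed, not the average. A sketch with hypothetical per-frame cycle counts:

```python
def worst_case_mcps(per_frame_cycles, sampling_hz, frame_size):
    # Budget for the worst frame observed, not the average one
    return max(per_frame_cycles) * (sampling_hz / frame_size) / 1e6

# Hypothetical measurements: frames where the filter adapts cost far more
measured = [210_000, 215_000, 480_000, 212_000]
print(worst_case_mcps(measured, 8000, 64))   # 60.0, sized from the 480k-cycle frame
```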
