librosa | A Systematic Hands-On Guide (Parts 5 to 17)

Table of contents

[(Parts 1 to 4) Click here for the earlier librosa notes](https://blog.csdn.net/_/article/details/)
5. Spectral representations
- (1) Short-time Fourier transform (STFT)
- (2) Inverse short-time Fourier transform (ISTFT)
- (3) Instantaneous frequency: ifgram()
- (4) The CQT (constant-Q transform), widely used in music analysis
- (5) icqt()
- (6) hybrid_cqt()
- (7) pseudo_cqt()
- (8) Fast Mellin transform (fmt)
- (8) Harmonic energy of a time-frequency representation: interp_harmonics()
- (9) Harmonic salience: salience()
- (10) Phase vocoder: phase_vocoder()
- (11) Magnitude and phase: magphase()
- (12) Time-frequency representation using IIR filters: iirt()
6. Magnitude scaling
- (1) amplitude_to_db()
- (2) db_to_amplitude()
- (3) power_to_db()
- (4) db_to_power(S_db, ref=1.0)
- (5) perceptual_weighting()
- (6) A_weighting()
- (7) pcen()
7. Time and frequency conversion
- (1) frames_to_samples()
- (2) frames_to_time()
- (3) samples_to_frames()
- (4) samples_to_time()
- (5) time_to_frames()
- (6) time_to_samples()
- (7) hz_to_note()
- (8) hz_to_midi()
- (9) midi_to_hz()
- (10) midi_to_note()
- (11) note_to_hz()
- (12) note_to_midi()
- (13) hz_to_mel()
- (14) hz_to_octs()
- (15) mel_to_hz()
- (16) octs_to_hz()
- (17) fft_frequencies()
- (18) cqt_frequencies()
- (19) mel_frequencies()
- (20) tempo_frequencies()
- (21) samples_like()
- (22) times_like()
8. librosa.effects
- (1) librosa.effects.split
- (2) librosa.effects.hpss(y)
9. librosa.filters
- Mel filter bank
10. librosa.onset
11. librosa.segment
12. librosa.sequence
13. librosa.util
- (1) librosa.util.frame()
- (2) librosa.util.pad_center()
- (3) librosa.util.fix_length()
- (4) librosa.util.fix_frames()
- (5) librosa.util.index_to_slice()
- (6) librosa.util.softmask()
- (7) librosa.util.sync()
- (8) librosa.util.axis_sort()
- (9) librosa.util.normalize()
- (9) librosa.util.roll_sparse()
- (10) librosa.util.sparsify_rows()
- (11) librosa.util.buf_to_float()
- (12) librosa.util.tiny()
- (9) Dynamic range compression (DRC)
14. Deprecated (moved)
- (1) dtw(): dynamic time warping
- (2) fill_off_diagonal()
15. Rhythm features
- (1) tempogram()
16. Feature manipulation
- (1) delta()
- (2) stack_memory()
17. Spectrogram decomposition
- (1) librosa.decompose.decompose(): decompose a feature matrix
- (2) librosa.decompose.hpss()
- (3) librosa.decompose.nn_filter(): spectral decomposition
Matching
- match_intervals(): match one set of time intervals to another.
- match_events(): match one set of events to another.
Miscellaneous
- localmax(): find local maxima in an array x.
- peak_pick(): pick peaks in a signal using a flexible heuristic.
Input Validation
- valid_audio(): check whether a variable contains valid mono audio data.
- valid_int(): ensure that an input value is integer-typed.
- valid_intervals(): ensure that an array is a valid representation of time intervals.
File operations
- example_audio_file(): get the path to an audio example file included with librosa.
- find_files(): get a sorted list of (audio) files in a directory or directory subtree.
magphase

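(Parts 1 to 4) Click here for the earlier librosa notes

5. Spectral representations

(1) Short-time Fourier transform (STFT)

Comparison of torch.stft() and librosa.stft()

librosa.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, pad_mode='reflect')

y: the audio time series.
n_fft: the FFT window size, i.e. n_fft = hop_length + overlap.
hop_length: the frame shift (number of samples between successive frames).
With spectrum = np.abs(librosa.stft(frame, n_fft=self.nfft)), hop_length is not specified and defaults to win_length // 4.
With spectrum = np.abs(librosa.stft(frame, n_fft=self.nfft, hop_length=len(frame))), where the frame is shorter than the FFT size, librosa.stft returns len(frame) // hop_length + 1 = 2 frames.
With spectrum = np.abs(librosa.stft(frame, n_fft=self.nfft, hop_length=self.nfft)), librosa.stft returns a single frame, whether win_length is set to the frame length or to nfft.
The conclusion: with center=True, the number of output frames of librosa.stft is speech_length // hop_length + 1.
win_length: each frame of audio is windowed by window(). The window has length win_length and is then zero-padded to match n_fft.
The default is win_length = n_fft.
window: string, tuple, number, function, or np.ndarray of shape (n_fft,). It can be:
a window specification (string, tuple, or number);
a window function, such as scipy.signal.windows.hann;
a vector or array of length n_fft.
center: bool
If True, the signal y is padded so that frame D[:, t] is centered at y[t * hop_length].
If False, D[:, t] begins at y[t * hop_length].
dtype: the complex numeric type of D. The default is 64-bit complex (np.complex64).
pad_mode: if center=True, the padding mode used at the edges of the signal. By default, STFT uses reflection padding.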
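The frame-count rule above can be checked with a little arithmetic. The following is a minimal sketch, not librosa's code: `stft_frame_count` is a hypothetical helper, and it assumes an even n_fft and that centering pads n_fft // 2 samples on each side.

```python
def stft_frame_count(n_samples, n_fft=2048, hop_length=None, center=True):
    """Predict the number of STFT frames (columns) for n_samples of audio."""
    if hop_length is None:
        # Same default as win_length // 4 when win_length == n_fft
        hop_length = n_fft // 4
    if center:
        # Centering pads n_fft // 2 samples on each side of the signal
        n_samples = n_samples + 2 * (n_fft // 2)
    return 1 + (n_samples - n_fft) // hop_length


one_second = 22050  # one second of audio at librosa's default sampling rate
print(stft_frame_count(one_second))                # centered frames
print(stft_frame_count(one_second, center=False))  # left-aligned frames
```

With centering and an even n_fft, the result reduces to n_samples // hop_length + 1, which matches the conclusion stated above.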

Code:

    # Convert an audio signal into a time-frequency representation with the STFT
    import librosa
    import librosa.display
    import matplotlib.pyplot as plt
    import numpy as np

    # Path to the audio file
    audio_path = 'D:/My life/music/some music/sweeter.mp3'
    # Load the audio; sr=None keeps the native sampling rate
    # (if sr is not given, the default of 22.05 kHz is used)
    x, sr = librosa.load(audio_path, sr=None, offset=0)
    # Apply the STFT to obtain the time-frequency representation
    X = librosa.stft(x)
    # Save the real part as an .npy file for use in the ISTFT later
    np.save("D:/My life/music/some music/real.npy", np.real(X))
    # Save the imaginary part as an .npy file for use in the ISTFT later
    np.save("D:/My life/music/some music/imag.npy", np.imag(X))
    # Convert amplitude to dB
    Xdb = librosa.amplitude_to_db(np.abs(X))
    print(Xdb.shape)
    # Save the dB array for use in training
    np.save("D:/My life/music/some music/test.npy", Xdb)
    # Plot the spectrogram
    librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
    # Add a colorbar
    plt.colorbar()
    # Restrict the y-axis range
    plt.ylim(19800, 20200)
    # Show the figure
    plt.show()

Reading the STFT code in the librosa library (with comments):

    def stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann',
             center=True, dtype=np.complex64, pad_mode='reflect'):
        """
        Use left-aligned frames, instead of centered frames

        >>> D_left = np.abs(librosa.stft(y, center=False))

        Use a shorter hop length

        >>> D_short = np.abs(librosa.stft(y, hop_length=64))

        Display a spectrogram

        >>> import matplotlib.pyplot as plt
        >>> librosa.display.specshow(librosa.amplitude_to_db(D,
        ...                                                  ref=np.max),
        ...                          y_axis='log', x_axis='time')
        >>> plt.title('Power spectrogram')
        >>> plt.colorbar(format='%+2.0f dB')
        >>> plt.tight_layout()
        """
        # By default, use the entire frame
        if win_length is None:
            win_length = n_fft

        # Set the default hop, if it's not already specified
        if hop_length is None:
            hop_length = int(win_length // 4)

        fft_window = get_window(window, win_length, fftbins=True)

        # Pad the window out to n_fft size
        fft_window = util.pad_center(fft_window, n_fft)

        # Reshape so that the window can be broadcast
        fft_window = fft_window.reshape((-1, 1))

        # Check audio is valid
        util.valid_audio(y)

        # Pad the time series so that frames are centered
        if center:
            # Reflect-pad n_fft // 2 samples on each side of y,
            # e.g. [3, 2, 1, 2, 3, 4, 5, 4, 3] (padding two samples per side)
            y = np.pad(y, int(n_fft // 2), mode=pad_mode)

        # Window the time series (split the signal into frames)
        y_frames = util.frame(y, frame_length=n_fft, hop_length=hop_length)

        # Pre-allocate the STFT matrix; this should help speed up the computation
        stft_matrix = np.empty((int(1 + n_fft // 2), y_frames.shape[1]),
                               dtype=dtype, order='F')

        # How many columns can we fit within MAX_MEM_BLOCK? (set to 256 KB in librosa)
        # This is the maximum number of FFT frames (columns) that fit in the memory limit
        n_columns = int(util.MAX_MEM_BLOCK / (stft_matrix.shape[0] *
                                              stft_matrix.itemsize))

        for bl_s in range(0, stft_matrix.shape[1], n_columns):
            bl_t = min(bl_s + n_columns, stft_matrix.shape[1])
            # If n_columns exceeds the number of frames, the loop runs once with bl_s = 0;
            # otherwise n_columns frames are processed per iteration and written block by block
            stft_matrix[:, bl_s:bl_t] = fft.fft(fft_window *
                                                y_frames[:, bl_s:bl_t],
                                                axis=0)[:stft_matrix.shape[0]]

        return stft_matrix


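(2) Inverse short-time Fourier transform (ISTFT)

librosa.istft(stft_matrix, hop_length=None, win_length=None, window='hann', center=True, length=None)

stft_matrix: the matrix produced by the STFT
hop_length: the frame shift; defaults to win_length // 4
win_length: the window length; defaults to n_fft
window: string, tuple, number, function, or np.ndarray of shape (n_fft,). It can be:
a window specification (string, tuple, or number);
a window function, such as scipy.signal.windows.hann;
a vector or array of length n_fft.
center: bool
If True, D is assumed to have centered frames.
If False, D is assumed to have left-aligned frames.
length: if provided, the output y is zero-padded or clipped to exactly this many samples.

Returns y: the reconstructed time-domain signal.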
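To make the STFT/ISTFT relationship concrete, here is a minimal NumPy-only sketch of a centered STFT and its overlap-add inverse. It is not librosa's implementation: `simple_stft` and `simple_istft` are hypothetical helpers, and the sketch assumes a periodic Hann window with win_length equal to n_fft.

```python
import numpy as np


def simple_stft(y, n_fft=512, hop=128):
    """Centered STFT with a periodic Hann window (win_length == n_fft)."""
    window = np.hanning(n_fft + 1)[:-1]
    y_pad = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y_pad) - n_fft) // hop
    frames = np.stack(
        [y_pad[t * hop: t * hop + n_fft] for t in range(n_frames)], axis=1
    )
    return np.fft.rfft(window[:, None] * frames, axis=0)


def simple_istft(D, hop=128, length=None):
    """Overlap-add inverse, normalized by the summed squared window."""
    n_fft = 2 * (D.shape[0] - 1)
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = D.shape[1]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(D, n=n_fft, axis=0)
    for t in range(n_frames):
        out[t * hop: t * hop + n_fft] += window * frames[:, t]
        norm[t * hop: t * hop + n_fft] += window ** 2
    nonzero = norm > 1e-10
    out[nonzero] /= norm[nonzero]
    out = out[n_fft // 2:]  # drop the padding added by centering
    if length is not None:
        out = out[:length]  # same role as the length argument described above
    return out


rng = np.random.default_rng(0)
y = rng.standard_normal(4096)
D = simple_stft(y)
y_rec = simple_istft(D, length=len(y))
print(D.shape, np.max(np.abs(y - y_rec)))
```

The length argument plays the role described above: centering adds samples at both ends, so the output is trimmed back to the original length.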

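Application:

    import librosa
    import librosa.display
    import matplotlib.pyplot as plt
    import numpy as np
    import soundfile

    # Load the .npy files
    Xreal = np.load('D:/My life/music/some music/real.npy')
    Ximag = np.load('D:/My life/music/some music/imag.npy')
    result = 1j * Ximag
    result += Xreal
    # Convert back to an audio signal with the ISTFT
    Y = librosa.istft(result)
    # Plot the waveform of the original audio
    y, sr = librosa.load("D:/My life/music/some music/EG DT.wav", sr=None)
    # waveshow replaces waveplot, which older librosa releases provided
    librosa.display.waveshow(y, sr=sr)
    plt.title("audio_raw")
    plt.show()
    # Plot the waveform after the ISTFT
    librosa.display.waveshow(Y, sr=sr)
    plt.title("audio after ISTFT")
    plt.show()
    # Write the audio signal to a wav file (the keyword is samplerate, not sr;
    # this assumes the STFT was computed from the same audio at its native rate)
    soundfile.write('D:/My life/music/some music/EG DT_istft.wav', Y, samplerate=sr)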

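(3) Instantaneous frequency: ifgram()

Computes the instantaneous frequency (as a proportion of the sampling rate), obtained as the time derivative of the phase of the complex spectrum. To process an audio signal, librosa.ifgram can be used to obtain the STFT matrix, which is then modified or shifted and passed through the inverse transform (istft) to obtain the processed audio. Note that ifgram exists only in older librosa releases; it was removed in librosa 0.8.

Parameters:

norm: whether to normalize the STFT
ref_power: minimum relative energy threshold for estimating the instantaneous frequency

Returns:

if_gram: the instantaneous frequencies
D: the short-time Fourier transform

Application:

    y, sr = librosa.load(path)
    frequencies, D = librosa.ifgram(y, sr=sr)
    y = librosa.istft(D)

D is the STFT matrix: the x-axis is the time axis, the y-axis is the frequency axis whose coordinates correspond to frequencies, and the values are the amplitudes. Because D is a numpy.ndarray, it can be manipulated directly with NumPy.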

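(4) The CQT (constant-Q transform), widely used in music analysis

Computes the constant-Q transform of an audio signal. Like the short-time Fourier transform, the constant-Q transform is an important time-frequency analysis tool, and it is particularly well suited to music signals. Its most distinctive feature is that the frequency axis of the resulting spectrum is on a logarithmic scale rather than a linear scale, and the window length varies with frequency.

librosa.cqt(y, sr=22050, fmin=None, n_bins=84, bins_per_octave=12, tuning=0.0, ...)

Parameters:

fmin: the minimum frequency
n_bins: the number of frequency bins, starting from the minimum frequency
bins_per_octave: the number of bins per octave
tuning: tuning offset, in fractions of a bin
…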
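The log-spaced frequency axis and the frequency-dependent window length can be illustrated with plain NumPy. This is a sketch under stated assumptions: the helper names are hypothetical, fmin is taken as roughly C1 (32.70 Hz), and the window length is approximated as Q * sr / f without any of librosa's filter-design details.

```python
import numpy as np


def cqt_center_frequencies(n_bins=84, fmin=32.70, bins_per_octave=12):
    """Geometrically spaced center frequencies: f_k = fmin * 2**(k / B)."""
    k = np.arange(n_bins)
    return fmin * 2.0 ** (k / bins_per_octave)


def constant_q(bins_per_octave=12):
    """Ratio of center frequency to bandwidth; identical for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


sr = 22050
freqs = cqt_center_frequencies()
Q = constant_q()
# Approximate window length (in samples) needed to resolve each bin
window_lengths = Q * sr / freqs

print(freqs[:3], freqs[12] / freqs[0])
print(window_lengths[0], window_lengths[-1])
```

Low bins need long windows and high bins need short ones, which is why the window length changes along the frequency axis.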

    # cqt returns complex values, so take the magnitude before converting to dB
    CQT = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=16000)), ref=np.max)
    plt.subplot(4, 2, 3)
    librosa.display.specshow(CQT, y_axis='cqt_note')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Constant-Q power spectrogram (note)')

(5) icqt()

The inverse constant-Q transform.

(6) hybrid_cqt()

The hybrid CQT.

(7) pseudo_cqt()

Computes the pseudo constant-Q transform of an audio signal.

(8) Fast Mellin transform (fmt)

fmt(y, t_min=0.5, n_fmt=None, kind='cubic', beta=0.5, over_sample=1, axis=-1)

Parameters:

y: np.ndarray, real-valued. The input signal; it may be multi-dimensional. The target axis must contain at least 3 samples.
t_min: float > 0
The minimum time spacing (in samples).
This value should generally be less than 1 to preserve as much information as
possible.
n_fmt: int > 2 or None
The number of scale transform bins to use.
If None, then n_bins = over_sample * ceil(n * log((n-1)/t_min)) is taken, where n = y.shape[axis]
kind: str
The type of interpolation to use when re-sampling the input.
See scipy.interpolate.interp1d for possible values.
Note that the default is to use high-precision (cubic) interpolation.
This can be slow in practice; if speed is preferred over accuracy,
then consider using kind='linear'.
beta: float
The Mellin parameter. beta=0.5 provides the scale transform.
over_sample: float >= 1
Over-sampling factor for exponential resampling.
axis: int
The axis along which to transform y

Raises ParameterError:

if n_fmt < 2 or t_min <= 0
or if y is not finite
or if y.shape[axis] < 3.

Notes:

This function caches at level 30.

Application:

    # Generate a signal and time-stretch it (with energy normalization)
    import numpy as np
    import librosa
    import matplotlib.pyplot as plt

    scale = 1.25
    freq = 3.0
    x1 = np.linspace(0, 1, num=1024, endpoint=False)
    # num must be an integer
    x2 = np.linspace(0, 1, num=int(scale * len(x1)), endpoint=False)
    y1 = np.sin(2 * np.pi * freq * x1)
    y2 = np.sin(2 * np.pi * freq * x2) / np.sqrt(scale)

    # Verify that the two signals have the same energy
    print(np.sum(np.abs(y1)**2), np.sum(np.abs(y2)**2))
    # both sums are approximately 512.0

    scale1 = librosa.fmt(y1, n_fmt=512)
    scale2 = librosa.fmt(y2, n_fmt=512)

    # And plot the results
    plt.figure(figsize=(8, 4))
    plt.subplot(1, 2, 1)
    plt.plot(y1, label='Original')
    plt.plot(y2, linestyle='--', label='Stretched')
    plt.xlabel('time (samples)')
    plt.title('Input signals')
    plt.legend(frameon=True)
    plt.axis('tight')
    plt.subplot(1, 2, 2)
    plt.semilogy(np.abs(scale1), label='Original')
    plt.semilogy(np.abs(scale2), linestyle='--', label='Stretched')
    plt.xlabel('scale coefficients')
    plt.title('Scale transform magnitude')
    plt.legend(frameon=True)
    plt.axis('tight')
    plt.tight_layout()

    # Plot the scale transform of an onset strength autocorrelation
    # (example_audio_file() is the example loader of older librosa releases)
    y, sr = librosa.load(librosa.util.example_audio_file(),
                         offset=10.0, duration=30.0)
    odf = librosa.onset.onset_strength(y=y, sr=sr)
    # Auto-correlate with up to 10 seconds lag
    odf_ac = librosa.autocorrelate(odf, max_size=10 * sr // 512)
    # Normalize
    odf_ac = librosa.util.normalize(odf_ac, norm=np.inf)
    # Compute the scale transform
    odf_ac_scale = librosa.fmt(librosa.util.normalize(odf_ac), n_fmt=512)

    # Plot the results
    plt.figure()
    plt.subplot(3, 1, 1)
    plt.plot(odf, label='Onset strength')
    plt.axis('tight')
    plt.xlabel('Time (frames)')
    plt.xticks([])
    plt.legend(frameon=True)
    plt.subplot(3, 1, 2)
    plt.plot(odf_ac, label='Onset autocorrelation')
    plt.axis('tight')
    plt.xlabel('Lag (frames)')
    plt.xticks([])
    plt.legend(frameon=True)
    plt.subplot(3, 1, 3)
    plt.semilogy(np.abs(odf_ac_scale), label='Scale transform magnitude')
    plt.axis('tight')
    plt.xlabel('scale coefficients')
    plt.legend(frameon=True)
    plt.tight_layout()

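(8) Harmonic energy of a time-frequency representation: interp_harmonics()

(9) Harmonic salience: salience()

(10) Phase vocoder: phase_vocoder()

Given an STFT matrix D, speeds it up by a given factor.

(11) Magnitude and phase: magphase()

Computes the magnitude and phase of a complex spectrogram.

(12) Time-frequency representation using IIR filters: iirt()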

6. Magnitude scaling

(1) amplitude_to_db()

librosa.amplitude_to_db(S, ref=1.0, amin=1e-5, top_db=80.0)

S: the input amplitude
ref: reference value; the amplitude abs(S) is scaled relative to ref: 20 * log10(S / ref)

amin: float > 0 [scalar]
minimum threshold for S and ref
top_db: float >= 0 [scalar]
threshold the output at top_db below the peak:
max(20 * log10(S)) - top_db

Returns S in dB units.
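The roles of ref, amin, and top_db are easier to see in a small NumPy re-implementation of the formula above. This is a sketch that assumes a scalar ref; `amplitude_to_db_sketch` is a hypothetical name, not the library function.

```python
import numpy as np


def amplitude_to_db_sketch(S, ref=1.0, amin=1e-5, top_db=80.0):
    """20 * log10(|S| / ref), with a floor at amin and at (peak - top_db)."""
    magnitude = np.abs(np.asarray(S))
    log_spec = 20.0 * np.log10(np.maximum(amin, magnitude))
    log_spec -= 20.0 * np.log10(np.maximum(amin, ref))
    if top_db is not None:
        # Nothing is allowed to fall more than top_db below the loudest value
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec


S = np.array([1.0, 10.0, 0.1, 1e-6])
db = amplitude_to_db_sketch(S)
print(db)
```

The last value is first raised to amin (giving -100 dB) and then clipped to 80 dB below the 20 dB peak.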

(2) db_to_amplitude()

Converts a dB-scaled spectrogram back to an amplitude spectrogram (db_to_amplitude(S_db) ~= ref * 10.0**(S_db / 20)):

librosa.db_to_amplitude(S_db, ref=1.0)

Parameters:

S_db: np.ndarray. dB-scaled spectrogram
ref: number > 0. Optional reference value; the output is scaled by it.

(3) power_to_db()

librosa.core.power_to_db(S, ref=1.0, amin=1e-10, top_db=80.0, ref_power=Deprecated())

S: the input power
ref: reference value; the power abs(S) is scaled relative to ref: 10 * log10(S / ref)
amin: float > 0 [scalar]. Minimum threshold for abs(S) and ref
top_db: float >= 0 [scalar].
Threshold the output at top_db below the peak: max(10 * log10(S)) - top_db
ref_power: scalar or callable. Deprecated; use ref instead.

Returns S_dB: the power spectrogram (squared magnitude) converted to decibel (dB) units.

Application:

    import librosa
    import librosa.display
    import numpy as np
    import matplotlib.pyplot as plt

    y, sr = librosa.load('D:/My life/music/some music/sweeter.mp3')
    S = np.abs(librosa.stft(y))
    print(librosa.power_to_db(S ** 2))

    plt.figure()
    plt.subplot(2, 1, 1)
    # Power spectrogram computed from the waveform
    librosa.display.specshow(S ** 2, sr=sr, y_axis='log')
    plt.colorbar()
    plt.title('Power spectrogram')
    plt.subplot(2, 1, 2)
    # dB computed relative to the peak power, so all other values are negative
    # (note the colormap set below)
    librosa.display.specshow(librosa.power_to_db(S ** 2, ref=np.max),
                             sr=sr, y_axis='log', x_axis='time')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Log-Power spectrogram')
    plt.set_cmap("autumn")
    plt.tight_layout()
    plt.show()

(4) db_to_power(S_db, ref=1.0)

Parameters:

S_db: np.ndarray. dB-scaled spectrogram
ref: number > 0. Reference power: output will be scaled by this value

Returns:

S: np.ndarray
Power spectrogram, computed as
ref * np.power(10.0, 0.1 * S_db)
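As a quick check that the power and dB conversions are inverses of each other, here is a NumPy sketch of both directions. The function names are hypothetical, and top_db clipping is deliberately left out so that the round trip is exact.

```python
import numpy as np


def power_to_db_sketch(S, ref=1.0, amin=1e-10):
    """10 * log10(S / ref) with a floor at amin (no top_db clipping)."""
    S = np.asarray(S, dtype=float)
    return 10.0 * np.log10(np.maximum(amin, S)) - 10.0 * np.log10(max(amin, ref))


def db_to_power_sketch(S_db, ref=1.0):
    """Inverse mapping: ref * 10**(S_db / 10)."""
    return ref * np.power(10.0, 0.1 * np.asarray(S_db, dtype=float))


power = np.array([1e-3, 1.0, 100.0])
S_db = power_to_db_sketch(power)
restored = db_to_power_sketch(S_db)
print(S_db, restored)
```

The amplitude versions differ only in the factor: 20 instead of 10 in the forward direction, and 0.05 instead of 0.1 in the exponent.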

(5) perceptual_weighting()

Perceptual weighting of a power spectrogram (S_p[f] = A_weighting(f) + 10 * log(S[f] / ref)):

librosa.perceptual_weighting(S, frequencies, **kwargs)

Parameters:

S: np.ndarray [shape=(d, t)]. Power spectrogram
frequencies: np.ndarray [shape=(d,)]. Center frequency of each row of S
kwargs: additional keyword arguments

Returns:

S_p: np.ndarray [shape=(d, t)]. The perceptually weighted version of S
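The weighting formula can be sketched in NumPy using the standard analog A-weighting curve. This is an illustration only: the helper names are hypothetical, the constants are the commonly quoted rounded values, and librosa's own constants and clipping options may differ slightly.

```python
import numpy as np


def a_weighting_sketch(frequencies):
    """A-weighting in dB for frequencies in Hz (all frequencies must be > 0)."""
    f_sq = np.asarray(frequencies, dtype=float) ** 2
    c1, c2, c3, c4 = 12194.0 ** 2, 20.6 ** 2, 107.7 ** 2, 737.9 ** 2
    gain = (c1 * f_sq ** 2) / (
        (f_sq + c2) * np.sqrt((f_sq + c3) * (f_sq + c4)) * (f_sq + c1)
    )
    # The +2.0 dB offset normalizes the curve to about 0 dB at 1 kHz
    return 20.0 * np.log10(gain) + 2.0


def perceptual_weighting_sketch(S, frequencies, ref=1.0, amin=1e-10):
    """S_p[f] = A_weighting(f) + 10 * log10(S[f] / ref), applied row by row."""
    offset = a_weighting_sketch(frequencies)[:, np.newaxis]
    S_db = 10.0 * np.log10(np.maximum(amin, np.asarray(S, dtype=float)) / ref)
    return offset + S_db


freqs = np.array([100.0, 1000.0, 10000.0])
S = np.ones((3, 4))  # flat power spectrogram: 0 dB everywhere
S_p = perceptual_weighting_sketch(S, freqs)
print(S_p[:, 0])
```

For a flat 0 dB input the output is just the A-weighting curve itself, so low frequencies are attenuated strongly while 1 kHz is left almost untouched.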

Application:

    # Re-weight a CQT power spectrum, using peak power as reference
    import numpy as np
    import librosa
    import librosa.display
    import matplotlib.pyplot as plt

    y, sr = librosa.load('D:/My life/music/some music/sweeter.mp3')
    # cqt returns complex values; keep the magnitude
    CQT = np.abs(librosa.cqt(y, sr=sr, fmin=librosa.note_to_hz('A1')))
    freqs = librosa.cqt_frequencies(CQT.shape[0], fmin=librosa.note_to_hz('A1'))
    perceptual_CQT = librosa.perceptual_weighting(CQT**2, freqs, ref=np.max)
    perceptual_CQT

    plt.figure()
    plt.subplot(2, 1, 1)
    librosa.display.specshow(librosa.amplitude_to_db(CQT, ref=np.max),
                             fmin=librosa.note_to_hz('A1'), y_axis='cqt_hz')
    plt.title('Log CQT power')
    plt.colorbar(format='%+2.0f dB')
    plt.subplot(2, 1, 2)
    librosa.display.specshow(perceptual_CQT, y_axis='cqt_hz',
                             fmin=librosa.note_to_hz('A1'), x_axis='time')
    plt.title('Perceptually weighted log CQT')
    plt.colorbar(format='%+2.0f dB')
    plt.tight_layout()

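(6) A_weighting()

Computes the A-weighting of a set of frequencies.

(7) pcen()

Normalizes a time-frequency representation S by automatic gain control, followed by nonlinear compression.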

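7. Time and frequency conversion

(1) frames_to_samples()

Converts frame indices to audio sample indices.

(2) frames_to_time()

Converts frame counts to time (seconds).

(3) samples_to_frames()

Converts sample indices to STFT frames.

(4) samples_to_time()

Converts sample indices to time (seconds).

(5) time_to_frames()

Converts timestamps to STFT frames.

(6) time_to_samples()

Converts timestamps (in seconds) to sample indices.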
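All six conversions reduce to multiplying or dividing by hop_length and sr. The sketch below spells this out in NumPy; the `_sketch` names are hypothetical, and the optional n_fft offset that librosa supports is ignored.

```python
import numpy as np


def frames_to_samples_sketch(frames, hop_length=512):
    return np.asarray(frames) * hop_length


def samples_to_time_sketch(samples, sr=22050):
    return np.asarray(samples) / float(sr)


def frames_to_time_sketch(frames, sr=22050, hop_length=512):
    return samples_to_time_sketch(frames_to_samples_sketch(frames, hop_length), sr)


def time_to_samples_sketch(times, sr=22050):
    return (np.asarray(times) * sr).astype(int)


def samples_to_frames_sketch(samples, hop_length=512):
    return np.asarray(samples) // hop_length


def time_to_frames_sketch(times, sr=22050, hop_length=512):
    return samples_to_frames_sketch(time_to_samples_sketch(times, sr), hop_length)


print(frames_to_samples_sketch([0, 1, 10]))
print(frames_to_time_sketch(43))
print(time_to_frames_sketch([0.5, 1.0]))
```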

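(7) hz_to_note()

Converts one or more frequencies (in Hz) to the nearest note names.

(8) hz_to_midi()

Gets the MIDI note number for a given frequency.

(9) midi_to_hz()

Gets the frequency (Hz) of a MIDI note.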
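The Hz/MIDI mapping follows the usual equal-temperament convention, with A4 = 440 Hz as MIDI note 69 and 12 notes per octave. A minimal NumPy sketch, with hypothetical `_sketch` names:

```python
import numpy as np


def hz_to_midi_sketch(frequencies):
    """MIDI number = 12 * log2(f / 440) + 69."""
    f = np.asarray(frequencies, dtype=float)
    return 12.0 * (np.log2(f) - np.log2(440.0)) + 69.0


def midi_to_hz_sketch(notes):
    """Frequency = 440 * 2**((m - 69) / 12)."""
    m = np.asarray(notes, dtype=float)
    return 440.0 * 2.0 ** ((m - 69.0) / 12.0)


print(hz_to_midi_sketch([220.0, 440.0, 880.0]))
print(midi_to_hz_sketch(60))  # middle C
```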

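(10) midi_to_note()

Converts one or more MIDI numbers to note strings.

(11) note_to_hz()

Converts one or more note names to frequencies (Hz).

(12) note_to_midi()

Converts one or more spelled notes to MIDI numbers.

(13) hz_to_mel()

Converts Hz to mels.

(14) hz_to_octs()

Converts frequencies (Hz) to (fractional) octave numbers.

(15) mel_to_hz()

Converts mel values to frequencies (Hz).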
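For reference, here is a NumPy sketch of the HTK-style mel formula. Note the assumption: librosa's default (htk=False) uses the Slaney variant of the mel scale, so these numbers correspond to what librosa computes only when htk=True. The helper names are hypothetical.

```python
import numpy as np


def hz_to_mel_htk(frequencies):
    """HTK mel scale: mel = 2595 * log10(1 + f / 700)."""
    f = np.asarray(frequencies, dtype=float)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz_htk(mels):
    """Inverse of the HTK mel scale."""
    m = np.asarray(mels, dtype=float)
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


freqs = np.array([0.0, 700.0, 1000.0, 8000.0])
mels = hz_to_mel_htk(freqs)
print(mels)
print(mel_to_hz_htk(mels))
```

The scale is roughly linear at low frequencies and logarithmic at high frequencies; 1000 Hz maps to approximately 1000 mel.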

(16) octs_to_hz()

Converts octave numbers to frequencies.

(17) fft_frequencies()

An alternative implementation of np.fft.fftfreq.
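The relationship to NumPy's own frequency helper can be shown directly. This sketch assumes an even n_fft and returns only the non-negative frequencies, one per STFT row; `fft_frequencies_sketch` is a hypothetical name.

```python
import numpy as np


def fft_frequencies_sketch(sr=22050, n_fft=2048):
    """Center frequency of each of the 1 + n_fft // 2 STFT rows (even n_fft)."""
    return np.linspace(0, sr / 2.0, 1 + n_fft // 2)


freqs = fft_frequencies_sketch()
reference = np.fft.rfftfreq(2048, d=1.0 / 22050)
print(freqs[:4], freqs[-1])
```

The bins are spaced sr / n_fft apart and end at the Nyquist frequency sr / 2.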

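(18) cqt_frequencies()

Computes the center frequencies of the constant-Q bins.

(19) mel_frequencies()

Computes an array of acoustic frequencies tuned to the mel scale.

(20) tempo_frequencies()

Computes the frequencies (in beats per minute) corresponding to an onset autocorrelation or tempogram matrix.

(21) samples_like()

Returns an array of sample indices matching the time axis of a feature matrix.

(22) times_like()

Returns an array of time values matching the time axis of a feature matrix.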

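8. librosa.effects

Time-domain audio processing, such as pitch shifting and time stretching. This submodule also provides time-domain wrappers for the decompose submodule.

(1) librosa.effects.split

librosa.effects.split(y, top_db=60, ref=np.max, frame_length=2048, hop_length=512) splits an audio signal into non-silent intervals.

Parameters:

y: np.ndarray, shape=(n,) or (2, n). The audio signal
top_db: number > 0. The threshold (in decibels) below the reference that is treated as silence
ref: the reference power. By default, np.max is used, so levels are compared to the peak power in the signal.
frame_length: int > 0. The number of samples per frame
hop_length: int > 0. The number of samples between frames

Returns:

intervals: np.ndarray, shape=(m, 2). intervals[i] == (start_i, end_i) gives the start and end sample indices of non-silent interval i.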

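(2) librosa.effects.hpss(y)

Separates the harmonic and percussive components of a signal (loosely, the "vocal/melodic" part and the "rhythm" part).

    import librosa
    import IPython.display as ipd

    # Path to the audio file (forward slashes avoid invalid escape sequences)
    audio_path = 'D:/My life/music/some music/sweeter.mp3'
    y, sr = librosa.load(audio_path, sr=44100)
    # Play the original audio
    ipd.Audio(audio_path)
    # Extract the harmonic and percussive parts separately
    y_harmonic, y_percussive = librosa.effects.hpss(y)
    ipd.Audio(data=y_harmonic, rate=sr)    # harmonic part (called "vocals" in the original)
    ipd.Audio(data=y_percussive, rate=sr)  # percussive part (rhythm)

Spectrograms of the two separated parts:

    # Spectrogram of the harmonic part
    import librosa.display
    import matplotlib.pyplot as plt

    A = librosa.stft(y_harmonic)
    Adb = librosa.amplitude_to_db(abs(A))
    # Create the figure before drawing into it
    plt.figure(figsize=(14, 5))
    librosa.display.specshow(Adb, sr=sr, x_axis='time', y_axis='log')

    # Spectrogram of the percussive part
    A = librosa.stft(y_percussive)
    Adb = librosa.amplitude_to_db(abs(A))
    plt.figure(figsize=(14, 5))
    librosa.display.specshow(Adb, sr=sr, x_axis='time', y_axis='log')

The two spectrograms differ clearly: the spectrogram of the percussive audio consists of vertical stripes.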

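9. librosa.filters

Filter bank generation (chroma, pseudo-CQT, CQT, etc.). These are mainly internal functions used by other parts of librosa.

Mel filter bank

librosa.filters.mel(sr, n_fft, n_mels=128, fmin=0.0, fmax=None, htk=False, norm=1)

sr: the sampling rate of the input signal
n_fft: the number of FFT components
n_mels: the number of mel bands to generate
fmin: the lowest frequency (in Hz)
fmax: the highest frequency (in Hz). If None, fmax = sr / 2.0 is used
norm: {None, 1, np.inf} [scalar]
If 1, the triangular mel weights are divided by the width of the mel band (area normalization). Otherwise, all triangles keep a peak value of 1.0

Returns the mel transform matrix.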

Application:

    import librosa
    import librosa.display
    import matplotlib.pyplot as plt

    # Keyword arguments work across librosa versions
    melfb = librosa.filters.mel(sr=22050, n_fft=2048)
    plt.figure()
    librosa.display.specshow(melfb, x_axis='linear')
    plt.ylabel('Mel filter')
    plt.title('Mel filter bank')
    plt.colorbar()
    plt.tight_layout()
    plt.show()

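10. librosa.onset

Onset detection and onset strength computation.

11. librosa.segment

Functions for structural segmentation, such as recurrence matrix construction, time-lag representations, and sequentially constrained clustering.

12. librosa.sequence

Functions for sequential modeling: various forms of Viterbi decoding, and helper functions for constructing transition matrices.

13. librosa.util

Helper utilities (normalization, padding, centering, etc.)

(1) librosa.util.frame()

Slices a time series into overlapping frames.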
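Framing can be sketched with NumPy's sliding-window view. This assumes NumPy 1.20 or newer; `frame_sketch` is a hypothetical helper, and it follows the (frame_length, n_frames) layout used in the STFT code earlier in this article.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def frame_sketch(y, frame_length=2048, hop_length=512):
    """Slice a 1-D series into overlapping frames, one frame per column."""
    windows = sliding_window_view(np.asarray(y), frame_length)
    # Keep every hop_length-th window, then put frames in columns
    return windows[::hop_length].T


frames = frame_sketch(np.arange(10), frame_length=4, hop_length=2)
print(frames)
```

Consecutive columns overlap by frame_length - hop_length samples, here 2.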

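(2) librosa.util.pad_center()

Pads an array to a target length so that the original data is centered.

(3) librosa.util.fix_length()

Fixes the length of an array to exactly the given size.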
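Both helpers amount to a call to np.pad along the last axis. A minimal sketch, assuming zero padding and hypothetical `_sketch` names:

```python
import numpy as np


def pad_center_sketch(data, size):
    """Zero-pad the last axis to length size, keeping the data centered."""
    data = np.asarray(data)
    n = data.shape[-1]
    if size < n:
        raise ValueError("target size must be at least the input length")
    left = (size - n) // 2
    widths = [(0, 0)] * (data.ndim - 1) + [(left, size - n - left)]
    return np.pad(data, widths, mode="constant")


def fix_length_sketch(data, size):
    """Trim or zero-pad the end of the last axis to exactly size entries."""
    data = np.asarray(data)
    n = data.shape[-1]
    if n > size:
        return data[..., :size]
    widths = [(0, 0)] * (data.ndim - 1) + [(0, size - n)]
    return np.pad(data, widths, mode="constant")


print(pad_center_sketch(np.ones(5), 10))
print(fix_length_sketch(np.arange(5), 3), fix_length_sketch(np.arange(3), 5))
```

pad_center is what the STFT code shown earlier uses to extend a window of length win_length to n_fft.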

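(4) librosa.util.fix_frames()

Fixes a list of frame indices so that they lie within a given minimum and maximum.

(5) librosa.util.index_to_slice()

Generates an array of slices from an index array. To understand this function, it helps to review how slicing works in NumPy.

(6) librosa.util.softmask()

Robustly computes a soft-mask operation.

(7) librosa.util.sync()

Synchronous aggregation of a multi-dimensional array between boundaries.

(8) librosa.util.axis_sort()

Sorts the rows or columns of an array.

(9) librosa.util.normalize()

Normalizes an array along a chosen axis.

(9) librosa.util.roll_sparse()

Rolls (shifts) a sparse matrix.

(10) librosa.util.sparsify_rows()

Returns a row-sparse matrix approximating the input x.

(11) librosa.util.buf_to_float()

Converts an integer buffer to floating-point values.

(12) librosa.util.tiny()

Computes the tiny value corresponding to the data type of the input, i.e. the smallest usable positive number for that floating-point type. For integer-typed input such as int8, the tiny value of np.float32 is used instead.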

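(9) Dynamic range compression (DRC)

Put simply, compression is an audio signal processing operation that lowers the volume of loud sounds and amplifies quiet sounds, thereby reducing (compressing) the dynamic range of the audio signal. Compression is commonly used in sound recording and reproduction, broadcasting, live sound reinforcement, and some instrument amplifiers.

    import torch

    def dynamic_range_compression(x, C=1, clip_val=1e-5):
        # Clamp small values to clip_val, scale by C, then take the logarithm
        return torch.log(torch.clamp(x, min=clip_val) * C)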
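The same log compression can be written in NumPy together with its inverse, which makes the effect of clip_val visible. This is a sketch: the `_np` function names are hypothetical, and the decompression step is an assumption that simply inverts the formula above.

```python
import numpy as np


def dynamic_range_compression_np(x, C=1.0, clip_val=1e-5):
    """log(max(x, clip_val) * C), the NumPy counterpart of the function above."""
    return np.log(np.clip(x, clip_val, None) * C)


def dynamic_range_decompression_np(x, C=1.0):
    """Inverse mapping: exp(x) / C."""
    return np.exp(x) / C


magnitudes = np.array([1e-7, 1e-3, 1.0, 100.0])
compressed = dynamic_range_compression_np(magnitudes)
restored = dynamic_range_decompression_np(compressed)
print(compressed)
print(restored)
```

Values below clip_val cannot be recovered: the first entry comes back as clip_val rather than its original value.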

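14. Deprecated (moved)

(1) dtw(): dynamic time warping

    from dtw import dtw
    from numpy.linalg import norm

    x = ...
    y = ...
    dist, cost, acc, path = dtw(x, y, dist=lambda x, y: norm(x - y, ord=1))

(2) fill_off_diagonal()

Sets all cells of a matrix to a given value if they lie outside a constraint region.

15. Rhythm features

(1) tempogram()

Computes the tempogram: the local autocorrelation of the onset strength envelope.

16. Feature manipulation

(1) delta()

Computes delta features: a local estimate of the derivative of the input data along the selected axis. The delta features are computed using Savitzky-Golay filtering.

(2) stack_memory()

Short-term history embedding: vertically concatenates a data vector or matrix with delayed copies of itself.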
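The history embedding can be sketched in a few lines of NumPy. This is an illustration with assumptions: `stack_memory_sketch` is a hypothetical helper, only positive delays are handled, and the delayed copies are padded with zeros.

```python
import numpy as np


def stack_memory_sketch(data, n_steps=2, delay=1):
    """Vertically stack data with zero-padded copies delayed by 0, delay, 2*delay, ..."""
    data = np.atleast_2d(data)
    n_frames = data.shape[1]
    history = []
    for step in range(n_steps):
        shift = step * delay
        delayed = np.zeros_like(data)
        if shift < n_frames:
            # Column t of the delayed copy holds column t - shift of the input
            delayed[:, shift:] = data[:, :n_frames - shift]
        history.append(delayed)
    return np.vstack(history)


stacked = stack_memory_sketch(np.arange(5), n_steps=3, delay=1)
print(stacked)
```

Each column of the result now contains the current frame plus the previous n_steps - 1 frames, which gives frame-wise models some temporal context.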

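17. Spectrogram decomposition

(1) librosa.decompose.decompose(): decompose a feature matrix

(2) librosa.decompose.hpss()

Median-filtering harmonic percussive source separation

(3) librosa.decompose.nn_filter(): spectral decomposition

Filtering by nearest-neighbors

The decompose module provides decompose, hpss, and nn_filter. The signature of decompose is:

comps, acts = librosa.decompose.decompose(S, n_components=None, transformer=None, sort=False, fit=True, **kwargs)

S: np.ndarray [shape=(n_features, n_samples), dtype=float]. The input feature matrix (e.g., a magnitude spectrogram)
n_components: int > 0 [scalar] or None. The number of components to decompose into; if None, it defaults to n_features
transformer: None or object. The type of transformation; if None, sklearn.decomposition.NMF is used. Otherwise, any object with an interface similar to NMF will do. The transformer must follow the scikit-learn convention, i.e. input data must be (n_samples, n_features).
transformer.fit_transform() is applied to the transpose S.T, and its return value is stored (transposed) as activations.
The components are returned as transformer.components_.T.
S ~= np.dot(activations, transformer.components_).T
or
S ~= np.dot(transformer.components_.T, activations.T)
sort: bool
If True, the components are sorted by ascending peak frequency.
If used together with a transformer, sorting is applied to a copy of the decomposition parameters, not to the transformer's internal parameters.
fit: bool
If True, the components are estimated from the input S.
If False, the components are assumed to be pre-computed and stored in the transformer, and are not changed.
kwargs: Additional keyword arguments to the default transformer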
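To show what a components/activations factorization looks like without depending on scikit-learn, here is a small NumPy sketch of NMF with multiplicative updates. This is a swapped-in technique for illustration, not what librosa or sklearn.decomposition.NMF runs internally, and `nmf_decompose_sketch` is a hypothetical helper.

```python
import numpy as np


def nmf_decompose_sketch(S, n_components, n_iter=200, seed=0, eps=1e-10):
    """Factor a non-negative S (n_features, n_samples) as components @ activations."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = S.shape
    components = rng.random((n_features, n_components)) + eps
    activations = rng.random((n_components, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative
        activations *= (components.T @ S) / (
            components.T @ components @ activations + eps
        )
        components *= (S @ activations.T) / (
            components @ activations @ activations.T + eps
        )
    return components, activations


rng = np.random.default_rng(1)
S = rng.random((20, 3)) @ rng.random((3, 50))  # synthetic non-negative matrix
comps, acts = nmf_decompose_sketch(S, n_components=3)
rel_error = np.linalg.norm(S - comps @ acts) / np.linalg.norm(S)
print(comps.shape, acts.shape, rel_error)
```

The shapes follow the convention described above: components are (n_features, n_components), activations are (n_components, n_samples), and S is approximated by their product.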

Example:

    import numpy as np
    import librosa

    y, sr = librosa.load(librosa.ex('choice'), duration=5)
    S = np.abs(librosa.stft(y))
    comps, acts = librosa.decompose.decompose(S, n_components=8)

Matching

match_intervals(): match one set of time intervals to another.

match_events(): match one set of events to another.

Miscellaneous

localmax(): find local maxima in an array x.
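A local maximum test can be sketched with edge padding in NumPy. The rule assumed here is that x[i] counts as a local maximum when it is strictly greater than its left neighbor and greater than or equal to its right neighbor; `localmax_sketch` is a hypothetical 1-D helper.

```python
import numpy as np


def localmax_sketch(x):
    """Boolean mask of local maxima: x[i] > x[i-1] and x[i] >= x[i+1]."""
    x = np.asarray(x)
    # Edge padding means the first element can never be a local maximum
    padded = np.pad(x, 1, mode="edge")
    return (x > padded[:-2]) & (x >= padded[2:])


x = np.array([1, 0, 1, 2, -1, 0, -2, 1])
mask = localmax_sketch(x)
print(mask)
print(np.flatnonzero(mask))
```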

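peak_pick(): pick peaks in a signal using a flexible heuristic.

Input Validation

valid_audio(): check whether a variable contains valid mono audio data.

valid_int(): ensure that an input value is integer-typed.

valid_intervals(): ensure that an array is a valid representation of time intervals.

File operations

example_audio_file(): get the path to an audio example file included with librosa.

find_files(): get a sorted list of (audio) files in a directory or directory subtree.

magphase

librosa.magphase(D, power=1)

D: the complex-valued matrix produced by the STFT
power: the exponent for the magnitude spectrogram, e.g. 1 for energy (magnitude), 2 for power, etc.

Returns:

D_mag: the magnitude of D (raised to power)
D_phase: the phase P, where phase = exp(1.j * phi) and phi is the phase angle of the complex matrix, np.angle(D)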
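The decomposition into magnitude and phase can be verified in NumPy: with power=1, multiplying the two parts gives back D. A minimal sketch with a hypothetical `magphase_sketch` helper:

```python
import numpy as np


def magphase_sketch(D, power=1):
    """Split a complex matrix into magnitude**power and a unit-modulus phase."""
    magnitude = np.abs(D)
    phase = np.exp(1j * np.angle(D))
    return magnitude ** power, phase


D = np.array([[3 + 4j, -1j], [0, 2]], dtype=complex)
magnitude, phase = magphase_sketch(D)
power_spec, _ = magphase_sketch(D, power=2)
print(magnitude)
print(np.allclose(magnitude * phase, D))
```

This is the usual starting point for modifying only the magnitude of a spectrogram and then recombining it with the original phase before the ISTFT.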

Application:

    import librosa

    y, sr = librosa.load('D:/My life/music/some music/sweeter.mp3')
    D = librosa.stft(y)
    magnitude, phase = librosa.magphase(D)

Published by: 全栈程序员-站长. Please credit the source when reposting: https://javaforall.net/230033.html Original link: https://javaforall.net
