DisCoder is a neural vocoder that leverages a generative adversarial encoder-decoder architecture informed by a neural audio codec to reconstruct high-fidelity 44.1 kHz audio from mel spectrograms.
The fetch-decode-execute cycle is followed by a processor to process an instruction. The cycle consists of several stages. Depending on the type of instruction, additional steps may be taken: If the ...
Abstract: The application of machine learning in underwater acoustics is often limited by the lack of high-quality data. One method to avoid this data issue is to use modeled data to train a machine ...
Abstract: We propose a deep neural network with spectrogram matching and mutual attention (SMMA-Net) for audio clue-based target speaker extraction (TSE). To effectively use the auxiliary speech, we ...
All the datasets must be located in the datasets folder. This folder should contain the following subfolders after downloading the datasets: GTZAN Speech_Music: Contains the GTZAN Speech Music dataset ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results