Check out my blog on my progress and process throughout GSoC 2019!
Given an audio file of some recognizable song, autosynch will try to align its lyrics to their temporal ___location in the song. The song lyrics must be available on Genius.
This project is still in its early stages and is inaccurate in many cases. Optimization is a work in progress, but feel free to try it out, modify it, or contribute!
Developed during Google Summer of Code 2019 with CCExtractor.
To install, do the following:
git clone
cd autosynch
pip install -r requirements.txt
Note: autosynch is supported only on Python 3.6+.
Using autosynch requires a trained model for vocal isolation as well as PortAudio. For mp3 support, SoX is required. On MacOS/Linux, get everything by executing:
chmod a+x setup.sh
./setup.sh
If you would like to download the weights manually or get a different version, check here:
Weights must be placed into autosynch/mad_twinnet/outputs/states
.
On Mac:
brew install portaudio
On Linux:
sudo apt-get update
sudo apt-get install portaudio19-dev
Note: Installing SoX is optional and only required for processing mp3 files.
On Mac:
brew install ffmpeg
On Linux:
sudo apt install ffmpeg
To play a song with its lyrics displayed at its calculated position:
python autosynch/playback.py [audio_file.wav] [artist] [song_title]
It will take a few minutes to perform the alignment process. To save the
alignment data to eliminate processing time in future plays of the same audio,
add the flag -s SAVE_DIR
, where SAVE_DIR
is the directory you want to save
the alignment data.
If you have already generated and saved an alignment data file:
python autosynch/playback.py [audio_file.wav] -f [align_file.yml]
If you would like to process an mp3 file, see this section. Running with an mp3 will automatically generate a wav file in the same directory.
Note: If you did not use setup.sh
, first make sure you set your Python
environment correctly with export PYTHONPATH=$PYTHONPATH:./
from the outer
autosynch
directory.
Bruno Mars - Finesse
(https://www.youtube.com/watch?v=csBDM14ssts)
The last chorus lags behind a bit, but for the most part sections and lines are nicely aligned.
Fun. - We Are Young
(https://www.youtube.com/watch?v=Z-yTGKd3ji8)
The instrumental at the beginning throws off the first verse, but everything catches up in by line 4.
- de Jong, N. and T. Wempe. "Praat script to detect syllable nuclei and measure speech rate automatically." Behavior Research Methods 41(2), 2009, pp. 385–390.
- Dedina, M. J. and H. C. Nusbaum. "PRONOUNCE: a program for pronunciation by analogy." Computer Speech & Language 5(1), 1991, pp. 55-64.
- Drossos, K., S. I. Mimilakis, D. Serdyuk, G. Schuller, T. Virtanen, Y. Bengio. "MaD TwinNet: Masker-denoiser architecture with twin networks for monaural sound source separation." IJCNN 2018.
- Lee, K. and M. Cremer. "Segmentation-based lyrics-audio alignment using dynamic programming." ISMIR 2008.
- Marchand, Y. and R. I. Damper. "A multistrategy approach to improving pronunciation by analogy." Computational Linguistics 26(2), 2000, pp. 196-219.
- Marchand, Y. and R. I. Damper. "Can syllabification improve pronunciation by analogy of English?" Natural Language Engineering 13(1), 2007, pp. 1-24.
- Nieto, O. and J. P. Bello. "Systematic exploration of computational music structure research." ISMIR 2016.
- Sejnowski, T. J. and C. R. Rosenberg. "Parallel networks that learn to pronounce English text." Complex Systems 1(1), 1987, pp. 145–168.