Popularity

6.4

Growing

Activity

0.0

Stable

Stars 2,379

Watchers 75

Forks 212

Last Commit over 1 year ago

Description

aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.

Code Quality Rank: L3

Programming language: Python

License: GNU Affero General Public License v3.0

Tags: Text Processing Audio Speech Data HTML Scientific Engineering Utilities Markup Linguistic Mathematics Multimedia Printing XML Sound Analysis Education Forced Alignment Speech

Latest version: v1.7.3

aeneas alternatives and similar packages

Based on the "Speech Data" category.
Alternatively, view aeneas alternatives based on common mentions on social networks and blogs.

SpeechRecognition

9.0 7.5 L3 aeneas VS SpeechRecognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.
Watson Developer Cloud Python SDK

6.6 7.1 L5 aeneas VS Watson Developer Cloud Python SDK

:snake: Client library to use the IBM Watson services in Python and available in pip as watson-developer-cloud

speechpy

4.5 0.0 aeneas VS speechpy

:speech_balloon: SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/
Prosodylab-Aligner

3.3 0.0 L4 aeneas VS Prosodylab-Aligner

Python interface for forced audio alignment using HTK and SoX
speech-to-text-websockets-python

2.8 0.0 L5 aeneas VS speech-to-text-websockets-python

DISCONTINUED. Python client that interacts with the IBM Watson Speech To Text service through its WebSockets interface
praatIO

2.8 4.8 L3 aeneas VS praatIO

A python library for working with praat, textgrids, time aligned audio transcripts, and audio files. It is primarily used for extracting features from and making manipulations on audio files given hierarchical time-aligned transcriptions (utterance > word > syllable > phone, etc).
ProMo

1.8 2.7 L4 aeneas VS ProMo

Prosody Morph: A python library for manipulating pitch and duration in an algorithmic way, for resynthesizing speech.
pyAcoustics

1.7 3.9 L4 aeneas VS pyAcoustics

A collection of python scripts for extracting and analyzing acoustics from audio files.
pysle

1.2 4.6 L4 aeneas VS pysle

Python interface to ISLEX, an English IPA pronunciation dictionary with syllable and stress marking.

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).

* **Version**: 1.7.3
* **Date**: 2017-03-15
* **Developed by**: ReadBeyond
* **Lead Developer**: Alberto Pettarin
* **License**: the GNU Affero General Public License Version 3 (AGPL v3)
* **Contact**: [email protected]
* **Quick Links**: Home - GitHub - PyPI - Docs - Tutorial - Benchmark - Mailing List - Web App

## Goal

aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.

For example, given this text file and this audio file, aeneas determines, for each fragment, the corresponding time interval in the audio file:

1                                                     => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase,            => [00:00:02.640, 00:00:05.880]
That thereby beauty's rose might never die,           => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease,              => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory:                => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes,         => [00:00:15.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
Making a famine where abundance lies,                 => [00:00:22.760, 00:00:25.680]
Thy self thy foe, to thy sweet self too cruel:        => [00:00:25.680, 00:00:31.240]
Thou that art now the world's fresh ornament,         => [00:00:31.240, 00:00:34.400]
And only herald to the gaudy spring,                  => [00:00:34.400, 00:00:36.920]
Within thine own bud buriest thy content,             => [00:00:36.920, 00:00:40.640]
And tender churl mak'st waste in niggarding:          => [00:00:40.640, 00:00:43.640]
Pity the world, or else this glutton be,              => [00:00:43.640, 00:00:48.080]
To eat the world's due, by the grave and thee.        => [00:00:48.080, 00:00:53.240]

[Waveform with aligned labels, detail](wiki/align.png)
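To make the listing above concrete, here is a minimal sketch (not part of aeneas) that parses the `HH:MM:SS.mmm` timestamps shown in the example and checks that each fragment begins exactly where the previous one ends. The helper names `to_seconds` and `parse_line` are hypothetical and exist only for this illustration.

```python
from decimal import Decimal

def to_seconds(stamp):
    """Convert an 'HH:MM:SS.mmm' timestamp to seconds as a Decimal."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + Decimal(seconds)

def parse_line(line):
    """Split 'text => [begin, end]' into (text, begin_seconds, end_seconds)."""
    # Split on the last '=>' so that punctuation in the text is left alone.
    text, interval = line.rsplit("=>", 1)
    begin, end = interval.strip().strip("[]").split(",")
    return text.strip(), to_seconds(begin.strip()), to_seconds(end.strip())

lines = [
    "1                                           => [00:00:00.000, 00:00:02.640]",
    "From fairest creatures we desire increase,  => [00:00:02.640, 00:00:05.880]",
    "That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]",
]
fragments = [parse_line(line) for line in lines]

# Each fragment should start exactly where the previous one ends.
contiguous = all(prev[2] == cur[1] for prev, cur in zip(fragments, fragments[1:]))
print("contiguous:", contiguous)
for text, begin, end in fragments:
    print(end - begin, "s", text)
```

Using `Decimal` instead of `float` keeps the millisecond values exact, so the contiguity check can rely on plain equality.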

This synchronization map can be output to file in several formats, depending on its application:

* research: Audacity (AUD), ELAN (EAF), TextGrid;
* digital publishing: SMIL for EPUB 3;
* closed captioning: SubRip (SRT), SubViewer (SBV/SUB), TTML, WebVTT (VTT);
* Web: JSON;
* further processing: CSV, SSV, TSV, TXT, XML.
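aeneas writes these formats itself, so the following is only a sketch of how the same intervals map onto a caption format. It assumes a JSON sync map shaped as a `fragments` list whose items carry `begin` and `end` (seconds, as strings) and `lines`; that layout is an assumption here and should be checked against an actual output file. The function names are hypothetical.

```python
import json

def srt_timestamp(seconds):
    """Format a number of seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(float(seconds) * 1000))
    hours, millis = divmod(millis, 3600000)
    minutes, millis = divmod(millis, 60000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def sync_map_to_srt(sync_map):
    """Render a sync map dictionary as SRT text, one cue per fragment."""
    cues = []
    for index, fragment in enumerate(sync_map["fragments"], start=1):
        begin = srt_timestamp(fragment["begin"])
        end = srt_timestamp(fragment["end"])
        text = "\n".join(fragment["lines"])
        cues.append(f"{index}\n{begin} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)

# Assumed layout of a JSON sync map; the times come from the example above.
sample = json.loads("""
{"fragments": [
  {"id": "f000001", "begin": "0.000", "end": "2.640", "lines": ["1"]},
  {"id": "f000002", "begin": "2.640", "end": "5.880",
   "lines": ["From fairest creatures we desire increase,"]}
]}
""")
print(sync_map_to_srt(sample))
```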

## System Requirements, Supported Platforms and Installation

### System Requirements

* A reasonably recent machine (recommended 4 GB RAM, 2 GHz 64bit CPU)
* Python 2.7 (Linux, OS X, Windows) or 3.5 or later (Linux, OS X)
* FFmpeg
* eSpeak
* Python packages BeautifulSoup4, lxml, and numpy
* Python headers to compile the Python C/C++ extensions (optional but strongly recommended)
* A shell supporting UTF-8 (optional but strongly recommended)

### Supported Platforms

aeneas has been developed and tested on Debian 64bit, with Python 2.7 and Python 3.5, which are the only supported platforms at the moment. Nevertheless, aeneas has been confirmed to work on other Linux distributions, Mac OS X, and Windows. See the PLATFORMS file for details.

If installing aeneas natively on your OS proves difficult, you are strongly encouraged to use aeneas-vagrant, which provides aeneas inside a virtualized Debian image running under VirtualBox and Vagrant, both of which can be installed on any modern OS (Linux, Mac OS X, Windows).

### Installation

All-in-one installers are available for Mac OS X and Windows, and a Bash script for deb-based Linux distributions (Debian, Ubuntu) is provided in this repository. It is also possible to download a VirtualBox+Vagrant virtual machine. Please see the INSTALL file for detailed, step-by-step installation procedures for different operating systems.

The generic OS-independent procedure is simple:

1. Install Python (2.7.x preferred), FFmpeg, and eSpeak

2. Make sure the following executables can be called from your shell: `espeak`, `ffmpeg`, `ffprobe`, `pip`, and `python`

3. First install `numpy` with `pip` and then `aeneas` (this order is important):

    ```bash
    pip install numpy
    pip install aeneas
    ```

4. To check whether you installed aeneas correctly, run:

    ```bash
    python -m aeneas.diagnostics
    ```
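Before installing, it can help to confirm that the executables listed above are reachable from your shell. The following sketch uses only the Python standard library and does not replace `python -m aeneas.diagnostics`; the function name `missing_executables` is hypothetical.

```python
import shutil

# Executables that the installation steps expect to be callable.
REQUIRED = ["espeak", "ffmpeg", "ffprobe", "pip", "python"]

def missing_executables(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]

missing = missing_executables(REQUIRED)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required executables are callable.")
```

On some systems the interpreter is only available as `python3`, so a missing `python` entry does not necessarily mean Python is absent.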


## Usage

1. Run without arguments to get the **usage message**:

    ```bash
    python -m aeneas.tools.execute_task
    python -m aeneas.tools.execute_job
    ```

    You can also get a list of **live examples**
    that you can immediately run on your machine
    thanks to the included files:

    ```bash
    python -m aeneas.tools.execute_task --examples
    python -m aeneas.tools.execute_task --examples-all
    ```

2. To **compute a synchronization map** `map.json` for a pair
   (`audio.mp3`, `text.txt` in
   [plain](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN)
   text format), you can run:

    ```bash
    python -m aeneas.tools.execute_task \
        audio.mp3 \
        text.txt \
        "task_language=eng|os_task_file_format=json|is_text_type=plain" \
        map.json
    ```

   (The command has been split into lines with `\` for visual clarity;
   in production you can have the entire command on a single line
   and/or you can use shell variables.)

   To **compute a synchronization map** `map.smil` for a pair
   (`audio.mp3`,
   [page.xhtml](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED)
   containing fragments marked by `id` attributes like `f001`),
   you can run:

    ```bash
    python -m aeneas.tools.execute_task \
        audio.mp3 \
        page.xhtml \
        "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
        map.smil
    ```

   As you can see, the third argument (the _configuration string_)
   specifies the parameters controlling the I/O formats
   and the processing options for the task.
   Consult the
   [documentation](http://www.readbeyond.it/aeneas/docs/)
   for details.

3. If you have several tasks to process,
   you can create a **job container**
   to batch process them:

    ```bash
    python -m aeneas.tools.execute_job job.zip output_directory
    ```

   File `job.zip` should contain a `config.txt` or `config.xml`
   configuration file, providing **aeneas**
   with all the information needed to parse the input assets
   and format the output sync map files.
   Consult the
   [documentation](http://www.readbeyond.it/aeneas/docs/)
   for details.

The
[documentation](http://www.readbeyond.it/aeneas/docs/)
contains a highly suggested
[tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html)
which explains how to use the built-in command line tools.
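When calling the command line tools from a script, the configuration string can be assembled from a dictionary instead of being typed by hand. The sketch below builds the same `key=value|key=value` string used in the examples above and the matching argument list. The helper names `build_config_string` and `build_command` are hypothetical, and the command is only printed, not executed.

```python
import shlex
import sys

def build_config_string(parameters):
    """Join key/value pairs into a 'key=value|key=value' configuration string."""
    return "|".join(f"{key}={value}" for key, value in parameters.items())

def build_command(audio_path, text_path, parameters, output_path):
    """Return the argument list for running aeneas.tools.execute_task."""
    return [
        sys.executable, "-m", "aeneas.tools.execute_task",
        audio_path, text_path, build_config_string(parameters), output_path,
    ]

parameters = {
    "task_language": "eng",
    "os_task_file_format": "json",
    "is_text_type": "plain",
}
command = build_command("audio.mp3", "text.txt", parameters, "map.json")

# Show the command as it would be typed in a shell (quoting added as needed).
print(shlex.join(command))
# With aeneas installed, the command could be run with:
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the `|` separators inside the configuration string.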


## Documentation and Support

* Documentation:
  [http://www.readbeyond.it/aeneas/docs/](http://www.readbeyond.it/aeneas/docs/)
* Command line tools tutorial:
  [http://www.readbeyond.it/aeneas/docs/clitutorial.html](http://www.readbeyond.it/aeneas/docs/clitutorial.html)
* Library tutorial:
  [http://www.readbeyond.it/aeneas/docs/libtutorial.html](http://www.readbeyond.it/aeneas/docs/libtutorial.html)
* Old, verbose tutorial:
  [A Practical Introduction To The aeneas Package](http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html)
* Mailing list:
  [https://groups.google.com/d/forum/aeneas-forced-alignment](https://groups.google.com/d/forum/aeneas-forced-alignment)
* Changelog:
  [http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.readbeyond.it/aeneas/docs/changelog.html)
* High level description of how aeneas works:
  [HOWITWORKS](https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md)
* Development history:
  [HISTORY](https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md)
* Testing:
  [TESTING](https://github.com/readbeyond/aeneas/blob/master/wiki/TESTING.md)
* Benchmark suite:
  [https://readbeyond.github.io/aeneas-benchmark/](https://readbeyond.github.io/aeneas-benchmark/)


## Supported Features

* Input text files in `parsed`, `plain`, `subtitles`, or `unparsed` (XML) format
* Multilevel input text files in `mplain` and `munparsed` (XML) format
* Text extraction from XML (e.g., XHTML) files using `id` and `class` attributes
* Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)
* Input audio file formats: all those readable by `ffmpeg`
* Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TEXTGRID, TSV, TTML, TXT, VTT, XML
* Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
* MFCC and DTW computed via Python C extensions to reduce the processing time
* Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak (default), eSpeak-ng, Festival, MacOS (via say), Nuance TTS API
* Default TTS (eSpeak) called via a Python C extension for fast audio synthesis
* Possibility of running a custom, user-provided TTS engine Python wrapper (e.g., included example for speect)
* Batch processing of multiple audio/text pairs
* Download audio from a YouTube video
* In multilevel mode, recursive alignment from paragraph to sentence to word level
* In multilevel mode, MFCC resolution, MFCC masking, DTW margin, and TTS engine can be specified for each level independently
* Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes
* Adjustable splitting times, including a max character/second constraint for CC applications
* Automated detection of audio head/tail
* Output an HTML file for fine tuning the sync map manually (`finetuneas` project)
* Execution parameters tunable at runtime
* Code suitable for Web app deployment (e.g., on-demand cloud computing instances)
* Extensive test suite including 1,200+ unit/integration/performance tests, that run and must pass before each release
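For readers unfamiliar with DTW (dynamic time warping), mentioned in the list above, here is a textbook sketch in pure Python on one-dimensional sequences. It is not the aeneas implementation, which according to the list above computes MFCC and DTW through C extensions; the function name `dtw_path` and the toy sequences are made up for illustration.

```python
def dtw_path(a, b):
    """Return (cost, path) of the classic DTW alignment of sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path

reference = [0, 1, 2, 3, 2, 1, 0]
stretched = [0, 0, 1, 2, 2, 3, 2, 1, 0]  # same shape, with two values held longer
total, path = dtw_path(reference, stretched)
print(total)
print(path)
```

Each pair in `path` links an index of the first sequence to an index of the second, which is the same kind of correspondence a forced aligner needs between synthesized and recorded audio frames.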


## Limitations and Missing Features

* Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
* Audio is assumed to be spoken: not suitable for song captioning, YMMV for CC applications
* No protection against memory swapping: be sure your amount of RAM is adequate for the maximum duration of a single audio file (e.g., 4 GB RAM => max 2h audio; 16 GB RAM => max 10h audio)
* [Open issues](https://github.com/readbeyond/aeneas/issues)

### A Note on Word-Level Alignment

A significant number of users run **aeneas** to align audio and text
at word-level (i.e., each fragment is a word).
Although **aeneas** was not designed with word-level alignment in mind
and the results might be inferior to
[ASR-based forced aligners](https://github.com/pettarin/forced-alignment-tools)
for languages with good ASR models,
**aeneas** offers some options to improve
the quality of the alignment at word-level:

* multilevel text (since v1.5.1),
* MFCC nonspeech masking (since v1.7.0, disabled by default),
* use better TTS engines, like Festival or AWS/Nuance TTS API (since v1.5.0).

If you use the ``aeneas.tools.execute_task`` command line tool,
you can add the ``--presets-word`` switch to enable MFCC nonspeech masking, for example:

```bash
python -m aeneas.tools.execute_task --example-words --presets-word
python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
```

If you use **aeneas** as a library, just set the appropriate `RuntimeConfiguration` parameters. Please see the command line tutorial for details.

## License

aeneas is released under the terms of the GNU Affero General Public License Version 3. See the LICENSE file for details.

Licenses for third party code and files included in aeneas can be found in the licenses directory.

No copy rights were harmed in the making of this project.

## Supporting and Contributing

### Supporting

Would you like to support the development of aeneas?

I accept sponsorships to

* fix bugs,
* add new features,
* improve the quality and the performance of the code,
* port the code to other languages/platforms, and
* improve the documentation.

Feel free to get in touch.

### Contributing

If you think you found a bug or you have a feature request, please use the GitHub issue tracker to submit it.

If you want to ask a question about using aeneas, your best option is to send an email to the mailing list.

Finally, code contributions are welcome! Please refer to the Code Contribution Guide for details about the branch policies and the code style to follow.

## Acknowledgments

Many thanks to Nicola Montecchio, who suggested using MFCCs and DTW, and co-developed the first experimental code for aligning audio and text.

Paolo Bertasi, who developed the APIs and Web application for ReadBeyond Sync, helped shape the structure of this package for its asynchronous usage.

Chris Hubbard prepared the files for packaging aeneas as a Debian/Ubuntu .deb.

Daniel Bair prepared the brew formula for installing aeneas and its dependencies on Mac OS X.

Daniel Bair, Chris Hubbard, and Richard Margetts packaged the installers for Mac OS X and Windows.

Firat Ozdemir contributed the finetuneas HTML/JS code for fine tuning sync maps in the browser.

Willem van der Walt contributed the code snippet to output a sync map in TextGrid format.

Chris Vaughn contributed the MacOS TTS wrapper.

All the mighty GitHub contributors, and the members of the Google Group.

*Note that all licence references and agreements mentioned in the aeneas README section above are relevant to that project's source code only.
