Handling audio files.
Python owes much of its popularity to its approachable syntax and its large ecosystem of libraries, Pydub among them. Pydub is a simple, high-level audio library that relies on ffmpeg and whose API was influenced by jQuery.
Updated: 2024-12-01 by Andrey BRATUS, Senior Data Analyst.
Basic audio file manipulations:
Mixing, applying low pass filter and mono audio effects:
Speech recognition script:
Mood detection from audio file:
Pydub is used for processing audio: adding effects, writing ID3 tags, slicing and concatenating audio tracks. It has supported Python since version 2.6 and is widely used with Python 3.
In other words, Pydub is a simple, well-designed Python module for audio manipulation. As its creators put it:
“Pydub lets you do stuff to audio in a way that isn’t stupid.”
The following simple script opens an audio file, reverses it and exports the reversed copy, raises the volume of that copy by 15 dB, slices the first two seconds of the original (original[0:2000]), and builds a merged file from the original played twice, a one-second silent fragment, and the reversed part.
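from pydub import AudioSegment

# Load the source file; len() and slice indices are in milliseconds
original = AudioSegment.from_wav('original.wav')
print(type(original))
print(original)
print(len(original))
# print(dir(original))  # uncomment to list the available methods

# Reverse the audio and save it; export() defaults to MP3, so name the format
reversed_audio = original.reverse()
reversed_audio.export('reversed.wav', format='wav')

# Raise the volume of the reversed copy by 15 dB
reversed_audio = reversed_audio + 15

# Keep only the first two seconds
first_two = original[0:2000]
first_two.export('first_two.wav', format='wav')

# Original twice, one second of silence, then the louder reversed copy
merged = original * 2 + AudioSegment.silent(duration=1000) + reversed_audio
merged.export('merged_audio.wav', format='wav')
print('Merged file is created !!!')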
OUT: Merged file is created !!!
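Two units are easy to miss in the script above: slice indices such as original[0:2000] are in milliseconds, and adding a number to a segment applies gain in decibels. The sketch below uses only the standard library to show the arithmetic behind both. The function names are hypothetical and are not part of Pydub.

```python
def db_to_amplitude_ratio(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


def ms_to_frame(position_ms, frame_rate):
    """Convert a position in milliseconds to a frame (sample) index."""
    return int(position_ms * frame_rate / 1000)


if __name__ == "__main__":
    # +15 dB multiplies the amplitude by roughly 5.6
    print(round(db_to_amplitude_ratio(15), 2))
    # At 44.1 kHz, the first 2000 ms cover 88200 frames
    print(ms_to_frame(2000, 44100))
```

A gain of +15 dB is large, so a source that is already loud may clip after it is applied.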
Mixing and overlaying audio files:
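from pydub import AudioSegment

beat = AudioSegment.from_wav('beat.wav')
sax = AudioSegment.from_wav('sax.wav')
print(len(beat), len(sax))  # durations in milliseconds

# Repeat the beat twice
beat2 = beat * 2
beat2.export('beat2.wav', format='wav')

# Play the sax on top of the doubled beat; the result is as long as beat2
mixed = beat2.overlay(sax)
mixed.export('mixed.wav', format='wav')

# Concatenate the pieces into the final arrangement
final = beat2 + mixed * 2 + sax + beat2 + sax
final.export('finalmix.wav', format='wav')
print('Final mix is created !!!')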
OUT: Final mix is created !!!
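Conceptually, overlaying sums the two signals sample by sample, and Pydub's overlay() keeps the length of the segment it is called on, so a longer overlaid sound is cut off at the end. A minimal sketch of that idea on plain lists of 16-bit samples follows. overlay_samples is a hypothetical helper for illustration only, not Pydub's implementation.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def overlay_samples(base, other, position=0):
    """Mix `other` into `base` starting at index `position`.

    The result always has the length of `base`; samples of `other`
    that would fall past the end are dropped, and sums are clipped
    to the signed 16-bit range.
    """
    mixed = list(base)
    for offset, sample in enumerate(other):
        index = position + offset
        if index >= len(mixed):
            break  # the overlaid sound is longer than the base
        total = mixed[index] + sample
        mixed[index] = max(INT16_MIN, min(INT16_MAX, total))
    return mixed


if __name__ == "__main__":
    # The longer second list is truncated to the base length
    print(overlay_samples([100, 200, 300], [10, 20, 30, 40, 50]))
    # Sums outside the 16-bit range are clipped
    print(overlay_samples([30000, 0], [10000]))
```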
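Applying a low-pass filter and panning:
from pydub import AudioSegment

beat = AudioSegment.from_wav('mybeat.wav')

# Attenuate frequencies above roughly 2000 Hz
beat_low = beat.low_pass_filter(2000)
beat_low.export('beat_low.wav', format='wav')

# pan(-1) sends the sound fully left, pan(1) fully right
beat_left = beat_low.pan(-1)
beat_right = beat_low.pan(1)

# Play the left version, then the right one, then the unpanned one
beat_final = beat_left + beat_right + beat_low
beat_final.export('beat_final.wav', format='wav')
print('Final mix is created !!!')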
OUT: Final mix is created !!!
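To give a feel for what a low-pass filter with a 2000 Hz cutoff does, here is a minimal sketch of a one-pole (RC-style) filter on a list of float samples. It is an illustration under the assumption of a simple first-order filter; Pydub's own low_pass_filter may differ in detail, and low_pass is a hypothetical name.

```python
import math


def low_pass(samples, cutoff_hz, frame_rate):
    """Smooth `samples` with a first-order low-pass filter."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / frame_rate
    alpha = dt / (rc + dt)  # smoothing factor between 0 and 1

    filtered = []
    previous = 0.0
    for sample in samples:
        # Move a fraction `alpha` of the way toward the new sample
        previous = previous + alpha * (sample - previous)
        filtered.append(previous)
    return filtered


if __name__ == "__main__":
    # A constant (zero-frequency) input passes through almost unchanged
    steady = low_pass([1.0] * 2000, 2000, 44100)
    print(round(steady[-1], 3))

    # An input that flips sign on every sample (the highest possible
    # frequency) comes out with a much smaller amplitude
    fast = low_pass([1.0, -1.0] * 1000, 2000, 44100)
    print(max(abs(value) for value in fast[100:]) < 0.2)
```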
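Speech recognition script:
from speech_recognition import Recognizer, AudioFile

recognizer = Recognizer()

# Read the whole WAV file into memory
with AudioFile('bratus_net_demo.wav') as audio_file:
    audio = recognizer.record(audio_file)

# Send the audio to the Google Web Speech API (needs an internet connection)
text = recognizer.recognize_google(audio)
print(text)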
OUT: bratus net is a best jokes puns and memes site ever
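Before sending a recording for recognition, it can help to check its basic properties. The sketch below uses only the standard library's wave module to report the channel count, sample width, frame rate, and duration of a PCM WAV file. describe_wav is a hypothetical helper and is not part of the speech_recognition package.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file as a dictionary."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "frame_rate": rate,
            "duration_s": frames / rate,
        }
```

Calling describe_wav('bratus_net_demo.wav') on the file from the script above would return such a dictionary, which makes it easy to spot an unexpectedly long or empty recording.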
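Mood detection from audio file:
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from speech_recognition import Recognizer, AudioFile

# Transcribe the recording first
recognizer = Recognizer()
with AudioFile('chile.wav') as audio_file:
    audio = recognizer.record(audio_file)
text = recognizer.recognize_google(audio)

# VADER needs its lexicon, which is downloaded on first use
nltk.download('vader_lexicon')
analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores(text)
# print(scores)

# 'compound' ranges from -1 (most negative) to +1 (most positive);
# a score of exactly 0 falls into the negative branch here
if scores['compound'] > 0:
    print('Mood detected as Positive')
else:
    print('Mood detected as Negative')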
OUT: Mood detected as Positive
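The script above splits on zero, so neutral text is reported as negative. A commonly used convention for VADER's compound score treats values of 0.05 and above as positive, -0.05 and below as negative, and everything in between as neutral. The sketch below applies that convention. label_mood and its threshold parameter are hypothetical names chosen for illustration.

```python
def label_mood(compound, threshold=0.05):
    """Map a VADER compound score (-1 to +1) to a three-way mood label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


if __name__ == "__main__":
    for score in (0.6, 0.0, -0.4):
        print(score, label_mood(score))
```

In the mood detection script, the final if/else could then be replaced by printing label_mood(scores['compound']).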