Clips AI

Open Source | Repurposing & Clips

Overview

Clips AI is an open-source Python library, released under the MIT license, that automatically converts longform video into short clips. Built by a three-person team (Benjamin Smidt, Armel Talla, and Johann Cai), the project has accumulated 471 GitHub stars and 92 forks since its initial release in late 2023. The library is published to PyPI as clipsai and installed via pip, so any Python developer can use it without a paid account or API key.

Clips AI is built for audio-centric, narrative content: podcasts, interviews, speeches, and sermons. Its clipping algorithm is based on the TextTiling method, originally developed by researcher Marti A. Hearst in the 1990s, and extended with BERT embeddings to detect topic shifts in a video transcript. The algorithm segments content at the sentence level by analyzing word distribution patterns rather than making arbitrary time cuts, producing clips that correspond to coherent topic transitions in the original video.

The resizing module uses Pyannote speaker diarization to identify who is speaking at any given moment and dynamically reframes the video so the current speaker stays in frame. Converting a 16:9 interview recording to 9:16 vertical format for social media requires a free Hugging Face access token for Pyannote; the clipping feature requires no token at all. Both modules run fully locally on the developer's machine, with no cloud API calls to Clips AI servers, no usage caps, and no subscription. A live UI demo at demo.clipsai.com lets developers test clip generation on sample video before integrating the library into their own pipeline.
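
The TextTiling idea behind the clipping algorithm can be illustrated with a short sketch. This is not Clips AI's actual implementation: it is a pure-Python rendering of Hearst's depth-score heuristic, where boundaries fall at gaps whose similarity dips far below the surrounding peaks. In Clips AI, the similarity values would come from comparing BERT embeddings of adjacent transcript windows rather than the hard-coded numbers used here.

```python
def depth_scores(similarities):
    """Depth score for each gap between sentence blocks: how far the
    similarity dips below the nearest peak on each side (Hearst's
    TextTiling heuristic)."""
    scores = []
    for i, s in enumerate(similarities):
        # climb leftward while similarity keeps rising toward a peak
        left = s
        for j in range(i, -1, -1):
            if similarities[j] >= left:
                left = similarities[j]
            else:
                break
        # climb rightward the same way
        right = s
        for j in range(i, len(similarities)):
            if similarities[j] >= right:
                right = similarities[j]
            else:
                break
        scores.append((left - s) + (right - s))
    return scores

def boundaries(similarities, threshold=0.5):
    """Gap indices whose depth score exceeds the threshold,
    i.e. likely topic-shift points."""
    return [i for i, d in enumerate(depth_scores(similarities)) if d > threshold]

# Cosine similarities between adjacent blocks of a hypothetical transcript:
sims = [0.9, 0.85, 0.3, 0.88, 0.9, 0.87, 0.25, 0.8]
print(boundaries(sims))  # → [2, 6]: the two deep dips mark topic shifts
```

Clips AI segments at sentence level using this style of analysis, so the resulting clips align with narrative transitions instead of fixed-length windows.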

Features

  • Transcript-based clip detection: Uses the TextTiling algorithm extended with BERT embeddings to find topic-shift boundaries in audio transcripts, producing clips aligned to natural narrative breaks rather than arbitrary time windows
  • WhisperX transcription integration: Transcribes video or audio files via WhisperX, an open-source Whisper wrapper that provides word-level start and stop timestamps required for accurate clip boundary mapping
  • Speaker-aware video resizing: Uses Pyannote speaker diarization to dynamically reframe 16:9 video to 9:16 (or any target aspect ratio) by tracking who is speaking at each moment in the recording
  • ClipFinder class with find_clips method: Returns a list of Clip objects, each containing start_time, end_time, start_char, and end_char properties, giving developers full control over how clips are trimmed and exported
  • MediaEditor trimming for audio and video files: Supports both AudioFile and AudioVideoFile inputs, letting developers trim either audio-only or full video files to the clip boundaries returned by ClipFinder
  • MIT license with full source code on GitHub: All source code is in the ClipsAI/clipsai GitHub repository with 68 commits, 4 branches, and contributions from 3 developers
  • PyPI package installable with pip: Install via pip install clipsai in any Python virtual environment, with no Clips AI account, API key, or cloud dependency required
  • Live UI demo at demo.clipsai.com: Demonstrates clip generation on sample videos so developers can evaluate output quality before integrating the library into a production pipeline
  • Designed for narrative-content formats: Optimized for podcasts, interviews, speeches, and sermons where the transcript is the primary structural signal for where meaningful clips begin and end
  • Resize reference documentation with crop segment output: The resize function returns a structured crops object with per-segment reframing coordinates that developers apply during video encoding
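
The crop-segment output from the last bullet is applied at encode time. The sketch below shows one way to turn a segment's coordinates into an ffmpeg filter expression; the segment field names (x, y, start_time, end_time) are illustrative assumptions here, so consult the Resize reference documentation for the actual attribute names on crops.segments.

```python
def crop_filter(segment, out_w, out_h):
    """Build an ffmpeg -vf expression that crops and trims one reframed
    segment. `segment` is a hypothetical dict mirroring one entry of the
    crops.segments output described in the Resize reference docs."""
    return (
        f"crop={out_w}:{out_h}:{segment['x']}:{segment['y']},"
        f"trim=start={segment['start_time']}:end={segment['end_time']},"
        "setpts=PTS-STARTPTS"  # reset timestamps after trimming
    )

# A 1920x1080 source cropped to a 608x1080 (9:16) window around the speaker:
seg = {"x": 420, "y": 0, "start_time": 12.5, "end_time": 31.0}
print(crop_filter(seg, 608, 1080))
# → crop=608:1080:420:0,trim=start=12.5:end=31.0,setpts=PTS-STARTPTS
```

Because the library returns coordinates rather than re-encoding the video itself, developers are free to apply the crops with ffmpeg directly or with any other video processing tool in their pipeline.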

Best For

  • Python developers building podcast or interview repurposing pipelines who need programmatic clip detection without a paid API dependency
  • Developer teams at media companies who want to run longform video segmentation entirely on their own infrastructure with no per-minute usage costs
  • Content engineers integrating transcript-based clip extraction into a video CMS or publishing workflow for sermon, speech, or interview archives
  • Researchers and engineers who want to experiment with or modify the TextTiling-plus-BERT-embedding clip detection algorithm using the MIT-licensed source code
  • Indie developers building repurposing tools who need a free open-source foundation for converting 16:9 recordings to 9:16 vertical clips for social distribution

How It Works

Getting started requires installing three dependencies: the clipsai package itself, WhisperX (installed directly from its GitHub repository via pip install whisperx@git+https://github.com/m-bain/whisperx.git), and ffmpeg for media trimming. WhisperX is an open-source wrapper around OpenAI Whisper that adds word-level timestamps, which the ClipFinder algorithm needs to map transcript segments back to precise positions in the video file.

To generate clips, developers instantiate a Transcriber object and call transcriber.transcribe(audio_file_path="/abs/path/to/video.mp4"). This produces a Transcription object that is passed to a ClipFinder instance via clipfinder.find_clips(transcription=transcription). The method returns a list of Clip objects, each containing start_time and end_time in seconds, plus start_char and end_char indices into the original transcript. A MediaEditor instance then trims the original file to each clip's boundaries, producing a new video file per clip.

To resize a video to a different aspect ratio, developers call the top-level resize() function, passing the video file path, a Hugging Face pyannote_auth_token, and the target aspect_ratio as a tuple such as (9, 16) for vertical video. The function returns a crops object whose .segments attribute contains per-segment cropping coordinates keyed to whichever speaker is active at that time. Developers apply these crop coordinates during video export using ffmpeg or a compatible video processing library. The entire workflow runs on local hardware.

The Clips AI documentation covers three reference sections: Clip (the ClipFinder and Clip classes), Resize (the resize function and its crop output), and Transcribe (the Transcriber class and Transcription object), each with code samples and links to the relevant source files on GitHub.
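
The steps above can be condensed into one function. The class and function names follow the Clips AI reference documentation, but treat this as a sketch rather than a drop-in program: it assumes clipsai, WhisperX, and ffmpeg are installed, the paths and token are placeholders, and the lazy import keeps the heavy ML dependencies out of module load time.

```python
def make_vertical_clips(video_path: str, hf_token: str, out_dir: str) -> None:
    """End-to-end sketch: transcribe -> find clips -> trim -> compute crops."""
    # Imported lazily: clipsai pulls in heavy ML dependencies (WhisperX, Pyannote).
    from clipsai import AudioVideoFile, ClipFinder, MediaEditor, Transcriber, resize

    # 1. Transcribe with word-level timestamps (WhisperX under the hood).
    transcription = Transcriber().transcribe(audio_file_path=video_path)

    # 2. Detect topic-shift boundaries in the transcript.
    clips = ClipFinder().find_clips(transcription=transcription)

    # 3. Trim the source file to each clip's boundaries.
    editor = MediaEditor()
    media = AudioVideoFile(video_path)
    for i, clip in enumerate(clips):
        editor.trim(
            media_file=media,
            start_time=clip.start_time,
            end_time=clip.end_time,
            trimmed_media_file_path=f"{out_dir}/clip_{i}.mp4",
        )

    # 4. Compute speaker-aware crop coordinates for a 9:16 vertical version;
    #    these are applied later during encoding (e.g. with ffmpeg).
    crops = resize(
        video_file_path=video_path,
        pyannote_auth_token=hf_token,  # free Hugging Face access token
        aspect_ratio=(9, 16),
    )
    print(crops.segments)
```

A call such as make_vertical_clips("/abs/path/to/video.mp4", hf_token, "/abs/path/to/out") would run the whole pipeline locally; only step 4 needs the Hugging Face token, matching the token requirements described above.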

Visit Clips AI