Documentation of our Program
How to run the Software
The README file lists the basic setup commands needed to run the program.
You will need Node.js; newer versions are recommended.
The software has been tested with Node.js 19, 20, 22 and 24.
These versions are confirmed to work with our software; earlier versions may work as well, but have not been tested.
To install Node.js, download the installer for your preferred version from the official Node.js website and run it.
Afterwards, open the program's directory in a terminal and run the command npm i, which installs all required libraries.
Next, set up the .env file.
It must contain the credentials for the key server, which supplies the API keys for the modules you want to use (see Authentication).
The .env file looks like this:
auth_username=wefhjhjakeghjkahejkghjkaegh
auth_password=wefhjhjakeghjkahejkghjkaegh
Once that is done, run the command npm start to start the program.
On Windows, you can alternatively double-click start.bat.
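How the program itself loads the .env file is not described here. As a minimal sketch, assuming the simple KEY=value format shown above, the following hypothetical helpers (parseEnv and missingAuthKeys are not part of the program) parse the contents of such a file and report which of the two credentials are missing:

```javascript
// Hypothetical helper: parse .env-style text into a plain object.
// Assumes one KEY=value pair per line; lines starting with # are comments.
function parseEnv(text) {
  const result = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a KEY=value line
    result[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return result;
}

// Hypothetical helper: return the names of required credentials that are missing or empty
function missingAuthKeys(env) {
  const required = ["auth_username", "auth_password"];
  return required.filter((key) => !env[key]);
}

const example = "auth_username=alice\nauth_password=secret\n";
console.log(missingAuthKeys(parseEnv(example))); // []
```

This parser deliberately ignores quoting and escaping; a dedicated .env library handles those cases.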
How it works
Our software is fully modular: every part can be edited, replaced, removed or added with only minimal changes to the rest of the code.
The modules are located in ./services/modules.
The structure is as follows:
Inside modules, there are folders named after the general task of the modules they contain.
For example, the transcription-remote folder contains all modules that perform transcription through a remote service, such as assembly-ai.
The module files themselves are placed inside these folders.
The names of the folders and module files do not matter: as long as this structure is kept and the module filename ends with .js, the module will be loaded.
The program iterates over every folder within ./services/modules, then over each .js file in each of these folders, and loads every module into a map called mapFunctions.
This map is available anywhere in the backend code, including inside modules. As a result, a module can call another module, which can in turn call further modules, and so on.
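The loading process described above can be sketched with Node's built-in fs and path modules. This is a minimal sketch, not the program's actual loader: loadModules is a hypothetical name, and rejecting duplicate names is an added safeguard based on the rule that each module name must be unique.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical loader: require every .js file in every subfolder of modulesDir
// and store the exported object in a Map, keyed by its name field.
function loadModules(modulesDir) {
  const mapFunctions = new Map();
  for (const entry of fs.readdirSync(modulesDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue; // only folders such as transcription-remote
    const folder = path.join(modulesDir, entry.name);
    for (const file of fs.readdirSync(folder)) {
      if (!file.endsWith(".js")) continue; // only .js files are treated as modules
      // require() needs an absolute path here, otherwise it resolves relative to this file
      const mod = require(path.resolve(folder, file));
      if (mapFunctions.has(mod.name)) {
        throw new Error(`Duplicate module name: ${mod.name}`);
      }
      mapFunctions.set(mod.name, mod);
    }
  }
  return mapFunctions;
}
```

Calling global.mapFunctions = loadModules("./services/modules") at startup would be one way to make the map available anywhere in the backend; whether the program uses a global or another mechanism is an assumption here.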
Modules
Building a new module is easy.
All you need to do is follow the structure described above.
Once you have created your module's .js file, copy and paste this code snippet into it:
module.exports = {
  name: "example", // Unique name for our function that will later be used to get the function from the map via mapFunctions.get("example").function()
  type: "example-type", // value used to differentiate each module to order them in the UI
  displayname: "Example", // The displayname used within the UI
  async function(randomParameter) {
    // Here we put a simple console.log to show how the system works
    // This function will be called from the @startup.js function in the utility folder
    console.log(`\n------------\nThis is the example function called by the ${randomParameter} function\n------------\n`);
  }
};
You may have noticed a file called example.js in ./services/modules/utility.
This is a template file that you can copy and use as the base for a new module.
It contains exactly the code shown above.
Here is how the code works.
Each module is essentially a plain JavaScript object that is exported via module.exports, so that the main process can load it into the mapFunctions map.
The required fields are as follows:
- name: The internal name through which the module is called. It must be unique.
- type: The type of the module, for example whether it is an LLM module or a transcription module.
- displayname: The display name shown in the UI.
Lastly, there is the function field.
This function is what other code calls; it serves as the main entry point of the module.
You could give each module custom function names instead, but a shared entry point name works for every module without having to keep track of function names.
You can define additional functions inside your module and call them from the entry point function, but from outside the module you should only call the entry point function, since that is the interface all modules share.
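The following minimal sketch shows this calling convention: both modules expose only the shared function entry point, and one module reaches the other through mapFunctions. The module names example and caller and the helper shout are hypothetical, and the map is filled by hand here instead of by the loader; in the real program it is assumed to be reachable from every module (for example as a global).

```javascript
// In the real program mapFunctions is filled by the loader; here it is built by hand.
const mapFunctions = new Map();

// Hypothetical module "example": its entry point returns a string
mapFunctions.set("example", {
  name: "example",
  type: "example-type",
  displayname: "Example",
  async function(randomParameter) {
    return `example called by ${randomParameter}`;
  },
});

// Module-internal helper: only used by the entry point below, never called from outside
function shout(text) {
  return text.toUpperCase();
}

// Hypothetical module "caller": its entry point calls another module's entry point
mapFunctions.set("caller", {
  name: "caller",
  type: "example-type",
  displayname: "Caller",
  async function() {
    const result = await mapFunctions.get("example").function("caller");
    return shout(result);
  },
});

mapFunctions.get("caller").function().then(console.log); // prints "EXAMPLE CALLED BY CALLER"
```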
Some module types also use additional fields, such as:
- description: Used by any module that is shown in the UI, such as transcription and LLM modules.
- audioformat: Used by transcription modules to tell the audio extraction module which audio format to use.
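A loader could check these fields before adding a module to mapFunctions. The validateModule function below is a hypothetical sketch, not part of the program; the extraFields parameter is an assumption showing how type-specific fields such as description or audioformat could be required for the module types that use them. The example module's type, displayname and audioformat values are made up.

```javascript
// Fields every module must define as non-empty strings
const REQUIRED_FIELDS = ["name", "type", "displayname"];

// Returns a list of problems; an empty list means the module looks valid.
// existingNames holds the names of modules that are already loaded.
function validateModule(mod, existingNames = new Set(), extraFields = []) {
  const errors = [];
  for (const field of [...REQUIRED_FIELDS, ...extraFields]) {
    if (typeof mod[field] !== "string" || mod[field] === "") {
      errors.push(`missing or empty field "${field}"`);
    }
  }
  // The entry point must be callable (async functions also have typeof "function")
  if (typeof mod.function !== "function") {
    errors.push('missing entry point "function"');
  }
  if (existingNames.has(mod.name)) {
    errors.push(`name "${mod.name}" is not unique`);
  }
  return errors;
}

// Hypothetical transcription module with both type-specific fields
const transcriptionModule = {
  name: "assembly",
  type: "transcription-remote",
  displayname: "AssemblyAI",
  description: "Remote transcription",
  audioformat: "mp3",
  async function() {},
};
console.log(validateModule(transcriptionModule, new Set(), ["description", "audioformat"])); // []
```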
IPC
Our software is split into two parts: the main process (backend) and the frontend.
The frontend is built with Electron, so it is essentially a website.
This makes the frontend relatively easy to edit.
The downside is that the frontend and backend cannot communicate directly; an IPC (inter-process communication) channel has to be set up between them first.
For the base functionality, this is already in place.
The frontend gets a list of all the available LLM and Transcription modules sent by the backend on startup.
The JSON object for this information looks like this:
{
  "ai_modules": [
    {"name": "Example", "displayname": "Example"}
  ],
  "transcription_modules": [
    {"name": "Example", "displayname": "Example"}
  ]
}
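This startup payload can be derived from mapFunctions by filtering on each module's type field. This is a sketch under assumptions: buildModuleList is not a function of the program, and the type values "llm" and "transcription" are hypothetical placeholders for whatever types the real modules use.

```javascript
// Hypothetical: collect the modules the UI needs to list, grouped by type.
function buildModuleList(mapFunctions, aiType = "llm", transcriptionType = "transcription") {
  // The UI only needs the internal name and the display name
  const toEntry = (mod) => ({ name: mod.name, displayname: mod.displayname });
  const modules = [...mapFunctions.values()];
  return {
    ai_modules: modules.filter((mod) => mod.type === aiType).map(toEntry),
    transcription_modules: modules.filter((mod) => mod.type === transcriptionType).map(toEntry),
  };
}

// Hypothetical map contents; modules of other types are left out of the payload
const exampleMap = new Map([
  ["gpt", { name: "gpt", type: "llm", displayname: "GPT" }],
  ["assembly", { name: "assembly", type: "transcription", displayname: "AssemblyAI" }],
  ["example", { name: "example", type: "example-type", displayname: "Example" }],
]);
console.log(JSON.stringify(buildModuleList(exampleMap)));
```

In the Electron main process, the result could then be sent to the window with webContents.send(channel, payload), where the channel name is whatever preload.js listens on.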
The backend needs a specific set of information from the frontend in order to start the pipeline.
The JSON for that looks like this:
{
  "video": {
    "module": "extraction-video-to-audio",
    "inputVideoPath": "A:\\programing\\@projects\\video2document\\test\\unit\\testvideo.mp4"
  },
  "transcription": {
    "module": "assembly"
  },
  "document": {
    "module": "llm-saia_openai_gpt",
    "type": "followup-report",
    "outputType": "pdf"
  }
}
As this JSON object shows, each part specifies which module is used for that step.
Each module name is the name field defined in the module itself.
The remaining fields are self-explanatory, except document.type, which is a predefined report type.
This is the minimum required setup for the currently implemented pipeline to work.
You can add fields to it, but do not remove any of the fields above.
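Because these fields are the minimum the pipeline needs, the backend could reject an incomplete request before starting. The following is a hypothetical check (checkPipelineRequest is not part of the program) that reports missing required fields and module names that do not match any name field in mapFunctions; extra fields are ignored, in line with the rule that fields may be added.

```javascript
// Required fields of the pipeline request, as [section, field] pairs
const REQUIRED_PATHS = [
  ["video", "module"],
  ["video", "inputVideoPath"],
  ["transcription", "module"],
  ["document", "module"],
  ["document", "type"],
  ["document", "outputType"],
];

// Returns a list of problems; an empty list means the request can be started.
function checkPipelineRequest(request, mapFunctions) {
  const problems = [];
  for (const [section, field] of REQUIRED_PATHS) {
    const value = request?.[section]?.[field];
    if (typeof value !== "string" || value === "") {
      problems.push(`missing ${section}.${field}`);
    }
  }
  // Each *.module value must match the name field of a loaded module
  for (const section of ["video", "transcription", "document"]) {
    const name = request?.[section]?.module;
    if (name && mapFunctions && !mapFunctions.has(name)) {
      problems.push(`unknown module "${name}" in ${section}.module`);
    }
  }
  return problems;
}
```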
Authentication
Our software uses a custom API key management system.
This system is proprietary and is therefore not shipped with the software.
It works through a simple HTTP request.
In the current version, the main process reads the username and password from the .env file and sends them as headers in the HTTP request, configured as follows:
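const un = process.env.auth_username; // Username from the .env file (assumes it has been loaded into process.env)
const pw = process.env.auth_password; // Password from the .env file (same assumption)

const options = {
  hostname: "keyserver.dommymommy.xyz", // The hostname of the key server
  port: 443, // The port of the key server (HTTPS)
  path: "/v1/auth", // The API endpoint
  method: "GET",
  headers: {
    "Content-Type": "application/json", // The content type should be JSON
    "username": un, // The username used to authenticate
    "password": pw // The password used to authenticate
  }
};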
The important part of this setup
Once the HTTP request is made, it will return a JSON object with the API keys as fields.
One such output could look like this:
{
  "ASSEMBLYAI_API_KEY": "eajgjkhgahghahegoikh",
  "GOOGLE_API_KEY": "eajgjkhgahghahegoikh",
  "SAIA_API_KEY": "eajgjkhgahghahegoikh"
}
The name of each field is used as the name under which its key is stored in memory, specifically in process.env.
So, if the request succeeds, the following variables will be available:
process.env.ASSEMBLYAI_API_KEY
process.env.GOOGLE_API_KEY
process.env.SAIA_API_KEY
These variables are accessible anywhere in the code and contain the API keys, so do not add untrusted modules, as they could read and steal these keys.
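The request itself can be made with Node's built-in https module. The following sketch sends an options object like the one shown above and copies every field of the JSON response into process.env. The function names fetchKeys and applyKeys are hypothetical, and treating any status other than 200 as an error is an assumption about the key server's behavior.

```javascript
const https = require("https");

// Copies every field of the key server's JSON response into process.env,
// using the field name as the variable name.
function applyKeys(keys) {
  for (const [name, value] of Object.entries(keys)) {
    process.env[name] = String(value);
  }
}

// Sends the request and resolves with the parsed JSON body.
// Rejects on network errors, non-200 status codes (assumption), or invalid JSON.
function fetchKeys(options) {
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => { body += chunk; });
      res.on("end", () => {
        if (res.statusCode !== 200) {
          reject(new Error(`Key server responded with ${res.statusCode}`));
          return;
        }
        try {
          resolve(JSON.parse(body));
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on("error", reject);
    req.end(); // GET request without a body
  });
}
```

Used together, fetchKeys(options).then(applyKeys) would make process.env.ASSEMBLYAI_API_KEY and the other keys available once the promise resolves, so modules that need a key should only run after that point.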
UI
The UI has a simple, self-explanatory design, in white and blue.
To keep it easy to use and understand, the UI guides the user through the process in 6 steps and offers a help page with more detailed explanations of each step. All files used by the GUI are stored in the directory ./electron/main.
Files used for the UI:
- index.html
- style.css
- script.js
- renderer.js
- preload.js
- languages.js
- package-lock.json
- package.json
Folders used for the UI:
- /flags
- /icons
- /node_modules
index.html: The basic structure of the UI. Comments in the code mark the different UI sections; each comment serves as the heading for the code below it.
style.css: Contains all CSS used in the UI.
script.js: Contains all functions used in the UI. The code is grouped by UI section, with comments separating the sections.
renderer.js: Mainly contains the event listeners used in the UI, which react to events occurring in the UI and handle them as intended. The code is grouped by UI section, with comments separating the sections.
preload.js: Contains the IPC functions that allow communication between the UI and the main process.
languages.js: Holds a single JSON object that stores the text for each language. script.js uses it to change the displayed language of the UI. Add languages here if you want to offer more options in the language selection.
How to add more languages:
- Add another language block, modeled on an existing one in the file. (Note: Use every key that the other blocks use. Only the first key, such as "eng", must be unique.)
- Assign the desired values to the keys in the new language section.
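A language block maps keys (the element ids) to the text for that language. The sketch below shows this assumed shape with hypothetical ids and a hypothetical second language code "deu", plus a small check, findMissingKeys (not part of the program), that lists the keys a block is missing compared to the "eng" block.

```javascript
// Assumed shape of languages.js; the ids and texts are hypothetical.
const languages = {
  eng: { title: "Video to Document", startButton: "Start" },
  deu: { title: "Video zu Dokument", startButton: "Starten" },
};

// For every language, list the keys of the reference block that it lacks.
// Languages without missing keys are left out of the report.
function findMissingKeys(langs, reference = "eng") {
  const expected = Object.keys(langs[reference]);
  const report = {};
  for (const [code, texts] of Object.entries(langs)) {
    const missing = expected.filter((key) => !(key in texts));
    if (missing.length > 0) report[code] = missing;
  }
  return report;
}

console.log(findMissingKeys(languages)); // {}
```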
How to add more text which changes languages:
- Create the element in the HTML file with a unique id.
- Add this id as a key to every language block and assign it the matching text.
- In script.js, add a line inside the changeLanguage() function like the existing ones, using your new id.
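The change in script.js amounts to one more line of the same shape as the existing ones. The excerpt below is a hypothetical sketch: the ids title and myNewText and the language code "deu" are made up, and setting textContent is an assumption, since the existing lines may use innerHTML or another property.

```javascript
// Hypothetical excerpt of languages.js: "myNewText" is the newly added id.
const languages = {
  eng: { title: "Video to Document", myNewText: "Hello" },
  deu: { title: "Video zu Dokument", myNewText: "Hallo" },
};

// Hypothetical excerpt of changeLanguage() in script.js. Each id gets one line
// that copies the text of the selected language into the element with that id.
function changeLanguage(lang) {
  const texts = languages[lang];
  document.getElementById("title").textContent = texts.title;
  document.getElementById("myNewText").textContent = texts.myNewText; // the line added for the new id
}
```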
package-lock.json: The npm lockfile, which records the exact versions of the installed dependencies. No changes needed.
package.json: The npm project manifest of the Electron app. No changes needed.
/flags: This directory contains the flags used for the language selection dropdown menu.
/icons: Pictures for the document preview are stored here.
/node_modules: Contains the npm packages (dependencies) installed for the Electron app.
Storage
In the root directory of the project you will find the storage folder.
This directory is used to persist all artifacts generated throughout the processing pipeline, allowing each step to operate independently while still sharing results.
Each module is expected to read from and write to this directory, depending on its responsibility in the pipeline.
The storage folder contains the following subfolders:
- audio stores the audio extracted from the video
- audio-snippets stores the individual speaker snippets used for speaker identification
- documents stores the HTML files generated by the AI / LLM modules
- documentType stores the predefined prompt templates used for the AI calls
- transcriptionSummaries stores the transcripts after they have been converted from word-level to sentence-level
- transcripts stores the raw transcripts returned by the transcription services
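Modules that read from or write to storage need the path of the right subfolder. A minimal sketch, assuming the project root is passed in explicitly, of a hypothetical helper storagePath (not part of the program) that builds such a path and creates the subfolder if it does not exist yet:

```javascript
const fs = require("fs");
const path = require("path");

// Subfolders of the storage directory, as listed above
const STORAGE_FOLDERS = [
  "audio", "audio-snippets", "documents", "documentType",
  "transcriptionSummaries", "transcripts",
];

// Hypothetical helper: returns the path of a file inside one storage subfolder,
// creating the subfolder first if it is missing.
function storagePath(root, folder, fileName) {
  if (!STORAGE_FOLDERS.includes(folder)) {
    throw new Error(`Unknown storage folder: ${folder}`);
  }
  const dir = path.join(root, "storage", folder);
  fs.mkdirSync(dir, { recursive: true }); // no error if the folder already exists
  return path.join(dir, fileName);
}
```

For example, a transcription module could save its raw result to storagePath(projectRoot, "transcripts", "video1.json"); projectRoot and the file name are placeholders.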