Projects

Document Parsing and Question Answering with LLMs Served Locally

Published on April 22, 2024

This project enables fully local document parsing and question answering with large language models (LLMs). The pipeline covers document parsing, text chunking, vectorization, prompting, and LLM-based question answering, all orchestrated through a streamlined process in a Dockerized environment. Running everything locally offers benefits in privacy, cost efficiency, educational value, customization, and scalability, with potential use cases in enterprises, research institutions, legal firms, and educational settings. The stack leverages Docker, Unstructured, FAISS, Langchain, and Llama.cpp for seamless setup and operation.
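The chunking step of such a pipeline can be sketched in plain Python; the chunk size and overlap below are illustrative defaults, not the project's actual settings:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so context survives chunk boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Each chunk would then be embedded, stored in a FAISS index, and the
# top-k matches for a question passed to the local LLM as context.
```

The overlap ensures that a sentence split at a chunk boundary still appears whole in at least one chunk, which helps retrieval quality.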

Tiny Story Generator: Fine-tuning “Small” Language Models with PEFT

Published on February 06, 2024

The Tiny Story Generator fine-tunes a small language model on short stories, demonstrating that high-quality narratives can be generated without large-scale models. Its use of parameter-efficient fine-tuning (PEFT) and its focus on short stories make it a distinctive approach to language generation. The code and data are publicly available, making the project a useful resource for researchers and enthusiasts interested in natural language processing and machine learning.
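The core idea behind LoRA, one common PEFT method, can be sketched with NumPy: instead of updating the full weight matrix, two small low-rank factors are trained and their product is added to the frozen weights. The dimensions and scaling below are illustrative, not the project's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16              # hidden size, LoRA rank, scaling (illustrative)

W = rng.normal(size=(d, d))          # frozen pretrained weight (no gradients)
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

# Effective weight during fine-tuning; only A and B receive gradient updates,
# so the trainable parameter count is 2*d*r instead of d*d.
W_eff = W + (alpha / r) * (B @ A)
```

Because B starts at zero, the adapted model is initially identical to the base model, and fine-tuning only gradually steers it toward the short-story distribution.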

From Depth Maps to 3D Point Clouds: Data Conversion

Published on April 03, 2022

This project introduces an open-source package that streamlines the conversion of depth map images recorded by a stage system into 3D point clouds. It leverages Open3D, segmentation techniques, and integration with SMPL models for human motion capture and character animation applications.
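The core of such a conversion is back-projecting depth pixels through the pinhole camera model. A minimal NumPy sketch (the intrinsics are placeholders, not the stage system's calibration):

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) metric depth map into an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx        # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid zero-depth pixels

# The resulting array could then be wrapped in an open3d.geometry.PointCloud
# for visualization, segmentation, and downstream SMPL fitting.
```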

Detectron-2 Model Retraining

Published on March 22, 2022

This project retrains a Detectron2 model to improve segmentation accuracy for specific object categories, particularly humans, in the COCO dataset. It begins with a Docker image setup for consistent deployment, then filters the relevant categories and balances the instance distribution to optimize the dataset for retraining. After the semantic segmentation model is retrained, its efficacy is evaluated on image and video inference tasks.
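The category-filtering step can be sketched over a COCO-style annotation dictionary. This is a plain-Python illustration, not the project's code; in the real COCO schema, "person" happens to be category ID 1:

```python
def filter_coco(coco: dict, keep_category_ids: list[int]) -> dict:
    """Keep only the annotations, images, and categories for the given IDs."""
    keep = set(keep_category_ids)
    anns = [a for a in coco["annotations"] if a["category_id"] in keep]
    img_ids = {a["image_id"] for a in anns}
    return {
        "images": [im for im in coco["images"] if im["id"] in img_ids],
        "annotations": anns,
        "categories": [c for c in coco["categories"] if c["id"] in keep],
    }
```

Balancing would then cap or resample per-category instance counts from the filtered annotations before the reduced dataset is registered with Detectron2 for retraining.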