PremKumar Kora
Data Scientist & AI Architect

Leveraging deep expertise to deliver high-impact solutions with a strong focus on excellence and results.

Enterprise Projects

I have successfully led and contributed to AI-driven innovations, data science, and technology transformation across industries. My journey spans building AI models, optimizing data pipelines, and driving strategic initiatives, blending technical expertise with impactful decision-making in AI, ML, and enterprise solutions.

Claim Processing System

Objective: This project aims to automate the approval or rejection of insurance claims by analyzing key documents. If a claim is rejected, the system provides specific reasons for rejection.
How it Works: The user enters a claim number to initiate analysis. Relevant documents (Estimate Document, General Loss Report, Photo Report) are retrieved.
The system verifies claim details, including:
Photo Report Analysis: Checks for missing photo descriptions.
Loss Date Verification: Ensures the claim falls within the policy period.
Line Item Verification: Cross-checks pre-approved items with submitted reports.
Total Cost Comparison: Flags mismatches between reported and estimated costs.
If the claim is approved, no action is needed. If it is rejected, the system provides specific reasons (e.g., missing details, policy coverage issues, or cost discrepancies).
Technology Used: Python, MySQL, Faiss (vector similarity search library). Libraries: pandas, NLP, Streamlit, LangChain, OpenAI, Boto3. AI Model: Llama3
This project streamlines claim processing by automating decision-making, reducing manual review time, and ensuring accurate and fair claim evaluations.
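
The four checks listed above can be sketched as a single function that collects rejection reasons. This is a minimal illustration only: the `Claim` fields, the `evaluate_claim` name, and the `cost_tolerance` parameter are hypothetical and do not come from the production system, which also uses an LLM and document retrieval that are not shown here.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Claim:
    # All field names are hypothetical; the real schema is not described here.
    loss_date: date
    policy_start: date
    policy_end: date
    reported_total: float
    estimated_total: float
    photo_descriptions: list = field(default_factory=list)
    submitted_items: set = field(default_factory=set)
    pre_approved_items: set = field(default_factory=set)


def evaluate_claim(claim: Claim, cost_tolerance: float = 0.01) -> list:
    """Return rejection reasons; an empty list means the claim is approved."""
    reasons = []
    # Photo Report Analysis: every photo needs a non-empty description.
    if any(not d.strip() for d in claim.photo_descriptions):
        reasons.append("missing photo description")
    # Loss Date Verification: the loss must fall inside the policy period.
    if not (claim.policy_start <= claim.loss_date <= claim.policy_end):
        reasons.append("loss date outside policy period")
    # Line Item Verification: submitted items must all be pre-approved.
    unapproved = claim.submitted_items - claim.pre_approved_items
    if unapproved:
        reasons.append("line items not pre-approved: " + ", ".join(sorted(unapproved)))
    # Total Cost Comparison: reported and estimated totals must agree.
    if abs(claim.reported_total - claim.estimated_total) > cost_tolerance:
        reasons.append("reported total does not match estimate")
    return reasons
```

Returning the reasons as a list, rather than a single boolean, mirrors the requirement that a rejected claim comes with specific explanations.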

AI-Driven Student Lifecycle Management System

Objective: This project aims to optimize the student journey and enhance institutional performance through AI-driven automation and predictive analytics. The system integrates multiple data streams, including admissions, academic records, student engagement, and faculty feedback, to provide a real-time, holistic view of student progress.
How it Works
Predictive Analytics: Identifies students at risk of dropping out based on attendance, grades, and behavioral patterns.
Admissions Automation: AI-driven scoring mechanisms process applications, verify documents, and shortlist candidates.
AI Chatbots: Handle common student queries related to enrollment, financial aid, and academic advising, reducing administrative workload.
Adaptive Learning: Deep learning models analyze student interactions with course materials to personalize learning paths.
Natural Language Processing (NLP): Analyzes student feedback from surveys and discussions to enhance curriculum design.
Computer Vision: Automates attendance tracking, eliminating manual checks.
AI-Powered Scheduling: Optimizes timetables and resource allocation, reducing conflicts in classroom assignments.
Faculty Dashboards: Provide real-time performance insights to adjust teaching strategies dynamically.
Anomaly Detection: Flags irregularities in exam results and academic integrity, aiding fraud prevention.
Automated Transcript Generation: Ensures secure and error-free document processing.
Technology Used: Programming Languages: Python, SQL. Libraries & Tools: pandas, NLP, Streamlit, LangChain, OpenAI, Boto3. AI Models: Predictive Analytics, Deep Learning, Computer Vision

This project improves student engagement, retention rates, and institutional efficiency through real-time AI-powered decision-making. By leveraging automation, predictive insights, and intelligent scheduling, the platform enhances both student and faculty experiences, positioning itself at the forefront of AI-driven educational solutions.

Message Prediction

Objective: This project aims to predict whether incoming messages should be assigned for further action or marked as No Action. If the prediction is No Action with a confidence score above 90%, the note is automatically moved to the No Action bucket.
How it Works: The model analyzes incoming notes and determines their category. If No Action is predicted with high confidence, the note is removed from further processing. Otherwise, the note remains in the Assign bucket for manual review.
Technology Used: Python, MySQL. Libraries: numpy, pandas, NLP, Hugging Face, scikit-learn, Celery. AI Model: DistilBERT (Transformer-based model for text classification)
This project automates message classification with 95% accuracy, streamlining workflow efficiency by enabling batch processing and real-time decision-making.
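
The routing rule described above reduces to a threshold check on the classifier output. The sketch below assumes the model returns a label and a confidence between 0 and 1; the function and constant names are hypothetical.

```python
NO_ACTION_THRESHOLD = 0.90  # "confidence score above 90%" from the description


def route_note(label: str, confidence: float) -> str:
    """Return the bucket a note should land in."""
    # Only a confident "No Action" prediction skips manual review.
    if label == "No Action" and confidence > NO_ACTION_THRESHOLD:
        return "No Action"
    # Everything else stays in the Assign bucket.
    return "Assign"
```

For example, `route_note("No Action", 0.95)` returns `"No Action"`, while `route_note("No Action", 0.80)` returns `"Assign"`.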

Feedback Prediction

Objective: This project aims to analyze and predict customer sentiment based on feedback. The sentiment is classified into five levels, ranging from Very Dissatisfied to Very Satisfied. Based on the prediction, the feedback is automatically routed for further action or completion.
How it Works: If feedback is negative or neutral, it is flagged for further review. If feedback is positive, it is marked as completed.
Technology Used: Python, MySQL. Libraries: numpy, pandas, NLP, Hugging Face, scikit-learn, FastAPI. AI Model: DistilBERT (Transformer-based model for text analysis)

This system automates sentiment analysis, enhances decision-making, and improves workflow efficiency. 
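
A minimal sketch of the five-level routing, assuming an ordered scale. Only the two endpoint labels are named in the description; the three middle labels and the function name are assumptions for illustration.

```python
from enum import IntEnum


class Sentiment(IntEnum):
    # Endpoints come from the description; the middle labels are assumed.
    VERY_DISSATISFIED = 1
    DISSATISFIED = 2
    NEUTRAL = 3
    SATISFIED = 4
    VERY_SATISFIED = 5


def route_feedback(sentiment: Sentiment) -> str:
    """Negative or neutral feedback goes to review; positive is completed."""
    return "completed" if sentiment > Sentiment.NEUTRAL else "review"
```

Using an ordered enum keeps the "negative or neutral" rule to a single comparison.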

AI PDF Chatbot & Extractor

Objective: This project aims to extract, summarize, and enable AI-powered interactions with PDF documents. Users can quickly query and retrieve relevant information from dense documents, improving accessibility and efficiency.
How it Works: 
Text Extraction: Custom scripts extract content from PDFs.
Embedding Models: Word embeddings capture semantic meaning for better searchability.
AI Chatbot: Integrates the Mistral language model to respond to user queries.
Optimization: Pretrained models were fine-tuned on a GPU server for accuracy and performance.
Technology Used: Text Processing: Custom PDF extraction scripts. AI Model: Mistral (fine-tuned)
This project automates PDF content extraction and enhances user interaction through an AI chatbot, making document retrieval faster and more efficient.

Image Caption Generation

Objective: This project aims to automatically generate captions for images, improving content creation, accessibility, and visual data organization.
How it Works: Vision Transformers analyze images and extract meaningful features. Python Imaging Library (PIL) processes images to support accurate caption generation. The AI model is trained to recognize patterns and generate contextually relevant captions.
Technology Used: AI Model: Vision Transformers (specialized for image recognition). Image Processing: Python Imaging Library (PIL)
This project enhances image-based content accessibility by generating accurate and meaningful captions, making visual data more searchable and useful.

Policy Declaration Extraction

Objective: This project aims to identify and extract relevant information from policy declaration PDFs. If a document is verified as a valid declaration page, the system extracts key text and generates custom, standardized responses, which are compiled into guidance PDFs for further use.
How it Works: OCR-based validation checks if the PDF is a valid policy declaration. If valid, key details are extracted and formatted into structured responses. The processed text is used to generate guidance documents for further use.
Technology Used: OCR & Text Extraction: Tesseract, AWS Textract. AI Model: Llama3 (fine-tuned)
This system automates document validation and text extraction, ensuring efficient and accurate processing of policy declarations.

Invoice Extraction

Objective: This project extracts check details and claim-related information from scanned PDFs, improving automated financial processing.
How it Works: OCR and AI models process scanned invoices to extract key financial details. The extracted information is stored and structured for further analysis.
Technology Used: Programming Language: Python. Database: MySQL. AI Models: Amazon Nova Pro, Textract
This system automates invoice data extraction, improving efficiency and accuracy in financial processing.

Top Three Adjuster Prediction

Objective: This project aims to predict top-performing adjusters using machine learning algorithms. By analyzing key performance metrics, the system helps teams make data-driven decisions in selecting the best adjusters.
How it Works:
The model evaluates various performance metrics of adjusters.
Machine learning techniques analyze historical data to identify top-performing adjusters.
The insights assist in better resource allocation and decision-making.
Technology Used: Machine Learning Techniques for predictive modeling
This system enables smarter decision-making by identifying high-performing adjusters, optimizing efficiency, and improving overall workflow. 

AI Metrics Chatbot

Objective: This project aims to provide quick access to claim metrics using a chatbot powered by conversational AI. Users can ask questions and receive instant insights from database tables without manually searching through data.
How it Works: The chatbot processes user queries related to claim metrics. It retrieves and presents relevant data-driven insights from the Claim Metrics table. Multiple AI models ensure accurate and context-aware responses.
Technology Used: Python, Python libraries. AI Models: Mistral
This system enhances accessibility to claim metrics, reducing manual effort and improving decision-making efficiency through conversational AI. 

Damage Detection Model

Objective: This project aims to detect hail and wind damage in images using AI-powered object detection. The goal is to assist insurance stakeholders in assessing weather-related structural damage efficiently.
How it Works: Images of structures are analyzed for hail and wind damage. The YOLOv5 model detects and classifies damage patterns in real time. Insights from the model help in faster and more accurate damage assessment.
Technology Used: AI Model: YOLOv5 (for real-time object detection)
This project successfully demonstrated the potential of AI in damage detection but is currently paused after the proof-of-concept phase, awaiting further development. 

AI Kickback Line Items

Objective: This project aims to predict whether a line item needs to be kicked back. If a kickback is required, the system automatically generates a corresponding note for further action.
How it Works: Rule-based checks filter line items initially. If a line item fails the rules, it is analyzed by the Random Forest model to determine if a kickback is necessary. If a kickback is required, the T5 model generates a detailed explanatory note.
Technology Used: Programming Language: Python. Database: MySQL. Libraries: pandas, NLP, matplotlib, scikit-learn, Hugging Face, numpy, PyTorch. AI Models: Random Forest (Kickback Prediction), T5 (Note Generation)
This system automates the kickback identification process, ensuring efficient decision-making and reducing manual effort in handling line items.
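
The three stages described above (rules, classifier, note generation) can be sketched as a short pipeline. To keep the example self-contained, the Random Forest and T5 models are passed in as plain callables. The rule shown and the `quantity`, `unit_price`, and `price_cap` fields are hypothetical stand-ins, not the actual business rules.

```python
from typing import Callable, Optional


def passes_rules(item: dict) -> bool:
    # Hypothetical rule: positive quantity and a unit price within a cap.
    return item["quantity"] > 0 and item["unit_price"] <= item["price_cap"]


def review_line_item(
    item: dict,
    needs_kickback: Callable[[dict], bool],  # stand-in for the Random Forest model
    write_note: Callable[[dict], str],       # stand-in for the T5 note generator
) -> Optional[str]:
    """Return a kickback note, or None when no kickback is needed."""
    # Stage 1: items that pass the rules never reach the model.
    if passes_rules(item):
        return None
    # Stage 2: the classifier decides whether a failed item is kicked back.
    if not needs_kickback(item):
        return None
    # Stage 3: generate the explanatory note.
    return write_note(item)
```

Running the cheap rule checks first means the models are only invoked for the line items that actually need a decision.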

Dashboards & Validation Reports
Objective: The Estimate Assist Dashboards were designed to provide stakeholders with interactive, data-driven insights into the performance of Estimate Assist, claim processing, and adjusters. These dashboards enable users to explore key trends, assess operational effectiveness, and measure cost savings, ultimately leading to better-informed business decisions. By consolidating various data points into visual reports, the dashboards help adjusters, analysts, and decision-makers identify patterns, optimize workflows, and validate predictions.

Dashboards:
  • Adjuster Metrics Dashboard: Provides insights into adjusters’ performance, efficiency, and case handling.
  • Estimate Assist Report: Offers a comprehensive view of estimates processed, including trends and performance metrics.
  • Savings Report: Displays cost-saving trends achieved through Estimate Assist, enabling financial evaluation.
Validation Pages:
In addition to dashboards, internal validation pages were developed to ensure the reliability and accuracy of predictions. These pages are used for model validation and quality assurance.
  • Adjuster Prediction Validation Page: Compares predicted vs. actual outcomes for adjusters, ensuring accuracy in estimations.
  • NotesQ Analysis Page: Analyzes qualitative notes for patterns and insights, enhancing decision-making and process improvements.
Technology Used: Python (core programming language for data processing and visualization logic), pandas, NumPy, Matplotlib, Seaborn, Plotly, SQL and NoSQL databases, and AWS cloud hosting.
The Estimate Assist Dashboards successfully provide a data-driven framework for monitoring, analyzing, and improving Estimate Assist’s performance. With real-time insights into adjuster efficiency, estimate trends, and cost savings, stakeholders can make informed, strategic decisions that optimize operations.

The inclusion of validation pages further strengthens the system by ensuring the accuracy and reliability of predictions, supporting continuous improvement in decision-making processes. The dashboards are live and fully functional, offering stakeholders an intuitive and efficient way to explore their data.


GitHub Projects

I actively contribute to open-source AI and machine learning projects on GitHub, sharing innovations in GenAI, NLP, LLMOps, ML, DL and predictive analytics. My work focuses on building scalable AI models, optimizing ML workflows, and developing AI-powered automation tools. By collaborating with the global developer community, I aim to push the boundaries of AI research and practical implementations.

This repository features implementations like multi-agent orchestration using LangChain, retrieval-augmented generation (RAG), and database query optimization. I integrate LangGraph for fallback mechanisms and Streamlit for interactive AI applications, making AI solutions more accessible.

In this repository, NLTKworkings, I have implemented various NLP techniques using the Natural Language Toolkit (NLTK). The repository includes Python scripts demonstrating:
  • Stemming: Reducing words to their root forms.
  • Regular Expression Tokenization: Splitting text into tokens based on patterns.
  • Word Tokenization: Dividing text into individual words.
  • Stop Word Extraction: Removing common words that may not be meaningful for analysis.
  • Sentiment Analysis: Determining the sentiment expressed in text.
  • Text-to-Speech Conversion: Converting text into speech using Google's gTTS library.
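
Two of the techniques listed above, regular expression tokenization and stop word removal, can be illustrated with the standard library alone. This sketch deliberately does not use NLTK, so that it runs without downloads. The tiny stop word set and the function names are illustrative assumptions, not the repository's code.

```python
import re

# A tiny illustrative stop word set; NLTK ships a much larger corpus.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def tokenize(text: str) -> list:
    """Split text into lowercase word tokens using a regular expression."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens: list) -> list:
    """Drop common words that carry little meaning for analysis."""
    return [t for t in tokens if t not in STOP_WORDS]


tokens = tokenize("The cat sat in the hat.")
print(remove_stop_words(tokens))  # ['cat', 'sat', 'hat']
```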

This repository contains implementations of various classification methods, such as Naive Bayes, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Logistic Regression. These scripts demonstrate practical applications on datasets like Titanic survival predictions and diabetes diagnosis.

This repository provides an in-depth overview of MongoDB, covering its document-oriented structure, schema-less design, scalability, query language, and indexing capabilities. Additionally, it includes practical examples, such as the mongoDBPython.ipynb notebook, which demonstrates how to interact with MongoDB using Python.
A further repository demonstrates CRUD operations.

In my repository, pandas, I have shared a Jupyter Notebook that demonstrates various data manipulation techniques using the SalesInfo.csv dataset. This notebook showcases practical applications of pandas methods for data cleaning, transformation, and analysis.

NumPy

Repository: numpy
This repository contains a Jupyter Notebook (numpy.ipynb) that provides comprehensive examples of why and how to use NumPy. It covers various functions essential for data scientists, including array creation, manipulation, and operations. The repository also includes text files (planets.txt, planets_new.txt, planets_small.txt) used within the notebook for practical demonstrations.

Repository: Guvi-Numpy_tasks
This repository hosts solutions to NumPy tasks assigned by Guvi. The Jupyter Notebook (Guvi_Numpy_tasks_ipyn.ipynb) showcases practical applications of NumPy, reinforcing concepts such as array operations, indexing, and broadcasting.

Places of Interest - LLM
This repository showcases an application that leverages Large Language Models (LLMs), Chainlit, LangChain, and OpenAI technologies to provide information about various places of interest. By integrating these advanced tools, the application offers users detailed insights into locations, enhancing their exploration experience.


Match-Case in Python
GitHub Repository: Match-Case.ipynb
Medium Article: Match-Case Python
This project explores Python's match-case statement, a powerful pattern-matching feature introduced in Python 3.10. The repository includes a Jupyter Notebook demonstrating real-world applications of match-case, while the Medium article provides an in-depth explanation with examples.

In my repository, Regression---linear, I have shared various Python scripts that demonstrate the application of linear regression models. These scripts include examples such as simpleRegWithCalc.py and multipleRegWithCalc.py, which showcase simple and multiple linear regression analyses, respectively.
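
For reference, simple linear regression can be computed by hand with the closed-form least-squares formulas. The sketch below is a generic illustration and is not copied from simpleRegWithCalc.py; the function name is an assumption.

```python
def fit_simple_linear(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and the co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```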

 In this repository, Regression---Multiple-linear, I have shared various Python scripts that demonstrate the application of multiple linear regression models. These scripts include examples such as multipleRegWithCalc.py, which showcases multiple linear regression analysis, and SVM_svr_Regression.py, which demonstrates Support Vector Regression (SVR) using Support Vector Machines (SVMs). 

In this repository, Regression---L1-LASSO---L2-Ridge, I have shared Python scripts that demonstrate the application of Lasso (L1) and Ridge (L2) regression models. These scripts include examples such as co2MultipleRegression.py and mileage_prediction.py, which showcase the use of these regularization techniques in multiple regression analyses.

In my repository, regression_ML_Cars, I have implemented a linear regression model to analyze the Auto MPG dataset, which contains information about various car models, including their fuel efficiency (miles per gallon), engine displacement, horsepower, weight, acceleration, model year, origin, and car name. This project demonstrates how to build and evaluate regression models to predict a car's fuel efficiency based on its characteristics, providing insights into factors influencing vehicle performance.

I have developed a K-Nearest Neighbors (KNN) classifier to predict thyroid conditions based on medical datasets. The project includes a Jupyter Notebook that demonstrates data preprocessing, model training, and evaluation processes.

I have implemented a Logistic Regression model to classify data from the Statlog (Shuttle) dataset. This dataset comprises nine numerical attributes, with approximately 80% of instances belonging to class 1.
To enhance model performance, I performed outlier detection and removal and addressed class imbalance using techniques like oversampling. With these preprocessing steps, the model reached a classification accuracy of 99.99%. This project demonstrates the effectiveness of data preprocessing and logistic regression in handling imbalanced datasets.

In my repository, Breast_Cancer_Prediction, I have developed a model to predict breast cancer using ensemble learning techniques. The project uses scikit-learn's VotingRegressor to combine multiple regression models, including Decision Trees, K-Nearest Neighbors (KNeighborsRegressor), and Support Vector Regression (SVR), aiming to enhance predictive accuracy. The repository includes a Jupyter Notebook that details the data preprocessing, model training, and evaluation processes, as well as the dataset (cancer.csv) used for training.

This model predicts income levels based on 14 demographic and employment-related features. The project includes a Jupyter Notebook (predict_income_LogisticRegression.ipynb) that details the data preprocessing, feature selection, model training, and evaluation processes.

Data Visualization 
Description: This project utilizes Plotly Express to create a line polar chart that visualizes hotel ratings. The chart compares ratings across various categories for three selected hotels out of a dataset of 18. The current version statically displays these comparisons, with plans to incorporate interactive features allowing users to select different hotels for comparison in future updates.
Description: This project employs Streamlit to visualize data on bird-window collisions. It provides interactive graphs that display patterns and statistics related to bird collisions with windows, aiming to raise awareness and facilitate further analysis of this issue.
Description: This repository contains various examples of Streamlit applications, each demonstrating different data visualization techniques using Plotly Express. The provided data files and scripts cover topics such as COVID-19 data visualization, social capital analysis, and more, serving as practical examples for building interactive web applications with Streamlit.
Description: This repository contains various examples of interactive web applications built using the Dash framework. Dash, developed by Plotly, enables the creation of analytical web applications with Python. The repository includes scripts such as basrchart.py, scatterplot.py, and lineGraph.py, demonstrating the development of interactive bar charts, scatter plots, and line graphs, respectively. Additionally, it provides datasets like intro_bees.csv and Caste.csv used within these applications.
Description: This repository offers detailed examples of various plots using the Seaborn library, a Python data visualization library based on Matplotlib. The Jupyter Notebook seaborn_visuals.ipynb showcases the creation of different statistical graphics, including distribution plots, categorical plots, and matrix plots. The repository also includes the dataset Bengaluru_House_Data.csv, utilized in the visualizations.
Repository: matplotlib
Description: This repository provides numerous examples of data visualization using the Matplotlib library. The Jupyter Notebook Data_Visualization_MatplotLib_.ipynb explores various chart types, such as line plots, bar charts, histograms, and scatter plots. It also includes datasets like EURINR.csv, currency.csv, and tips.csv, which are employed in the visualizations.
I have implemented various methods for data cleaning and outlier detection. The repository includes scripts that demonstrate techniques such as Z-Score calculation, Interquartile Range (IQR) analysis, and Binary Logistic Regression Encoding (bLore). Additionally, it contains a dataset (Bengaluru_House_Data.csv) used for practical demonstrations of these methods.
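
The Z-Score and IQR techniques mentioned above can be sketched with the standard library. The function names, the default threshold of 3, and the default multiplier of 1.5 are common conventions assumed here, not values taken from the repository.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs((v - mean) / sd) > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

On very small samples a single extreme value inflates the standard deviation, so a Z-Score threshold of 3 may flag nothing; the IQR rule is less sensitive to this because quartiles are barely affected by one extreme point.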
Python
Description: This repository contains examples and explanations of Python lists, including their creation, manipulation, and common operations such as appending, removing, and slicing elements.
Description: This repository provides insights into Python sets, covering topics like set creation, adding and removing elements, and performing set operations such as unions, intersections, and differences.
Description: This repository explores Python tuples, discussing their characteristics, advantages over lists, and use cases. It includes examples of tuple creation, indexing, and immutability.
Description: This repository delves into Python dictionaries, explaining how to create dictionaries, access and modify elements, and utilize dictionary methods. It also covers the mutability of dictionaries and their applications.
Description: This repository focuses on the binary search algorithm, providing both iterative and recursive implementations. It explains the algorithm's logic, use cases, and the importance of having a sorted array for efficient searching.
Description: This repository offers insights into fundamental data structures such as stacks, heaps, and binary trees. It includes explanations of each structure, their operations, and practical examples of their implementation and usage.
Description: This repository contains code snippets and examples demonstrating various string operations in Python. It covers topics like string manipulation, formatting, and common functions used in string processing.
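
As an illustration of the binary search repository described above, here is a generic sketch of the iterative and recursive forms. It is not copied from the repository. Both assume the input list is already sorted in ascending order and return -1 when the target is absent.

```python
def binary_search_iterative(items, target):
    """Return the index of target in sorted `items`, or -1 if not found."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1


def binary_search_recursive(items, target, low=0, high=None):
    """Recursive variant with the same contract as the iterative version."""
    if high is None:
        high = len(items) - 1
    if low > high:
        return -1
    mid = (low + high) // 2
    if items[mid] == target:
        return mid
    if items[mid] < target:
        return binary_search_recursive(items, target, mid + 1, high)
    return binary_search_recursive(items, target, low, mid - 1)
```

Each step halves the search range, which is why the input must be sorted: the comparison at `mid` is only informative when everything to the left is smaller and everything to the right is larger.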

