Kunal Patil

Data Alchemist - Because turning raw, messy data into gold-standard insights is just medieval magic.

About Me

Meet your Friendly Neighbourhood Data Wrangler, a self-proclaimed master of turning dumpster-fire datasets into "actionable insights" for business growth, because buzzwords pay the bills. I juggle Python, SQL, and overhyped machine learning/generative AI tricks, excelling at predictive modeling and statistics so dazzling they might actually mean something. With cutting-edge AI and a knack for spotting trends (or mirages), I churn out "impactful" solutions and innovative strategies that sound brilliant; results may vary. Peek at my GitHub if you're brave enough.

Technical Skills

Python
Scikit-learn
Deep Learning
NLP
Data Visualization
Data Analytics
Pandas
NumPy
Excel
Matplotlib
Seaborn
Machine Learning
TensorFlow/Keras
GenAI
LangChain
Hugging Face
SQL (MySQL, Oracle)
Git/GitHub
Streamlit

ML Projects

RAG Chatbot with PDF Uploads and Chat History

GitHub Demo
  • Project Overview: Developed a Streamlit web app for uploading PDFs and querying their content with Retrieval-Augmented Generation (RAG); a condensed sketch of the pipeline follows this list.
  • PDF Handling: Used PyPDFLoader and RecursiveCharacterTextSplitter to process and split PDFs into chunks.
  • AI Integration: Leveraged ChatGroq with selectable models (gemma2-9b-it, llama-3.1-8b-instant) for generating responses.
  • Retrieval: Employed HuggingFaceEmbeddings and Chroma for vector-based document retrieval.
  • Chat Features: Implemented session-based chat history with ChatMessageHistory for context-aware conversations.
  • UI: Designed an intuitive interface with PDF uploads, model selection, and query input.
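
The repo's exact wiring isn't reproduced here; the sketch below condenses the core load-split-embed-retrieve-generate flow under a few assumptions: current LangChain package layout (langchain_community, langchain_groq, langchain_huggingface), a GROQ_API_KEY set in the environment, and session chat history omitted for brevity.

```python
# Condensed RAG pipeline sketch; package paths and defaults are assumptions,
# not the repo's exact code. Requires GROQ_API_KEY in the environment.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load the uploaded PDF and split it into overlapping chunks.
docs = PyPDFLoader("uploaded.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

# 2. Embed the chunks and index them in Chroma for vector retrieval.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
retriever = Chroma.from_documents(chunks, embeddings).as_retriever()

# 3. Answer a question using only the retrieved context.
llm = ChatGroq(model="llama-3.1-8b-instant")  # or "gemma2-9b-it"
question = "What is this document about?"
context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```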

Data Science Salary Estimator

GitHub Demo
  • Project Overview: Developed a salary prediction tool for data science roles using Glassdoor data, focusing on how skills like Python, Excel, AWS, and Spark impact salaries.
  • Data Cleaning: Parsed salary data, extracted company ratings, and engineered features for skills, job location, company age, and job description length.
  • Exploratory Analysis: Conducted exploratory data analysis (EDA), revealing correlations between job description length and company age, and visualizing job role distributions across sectors.
  • Model Building: Evaluated Lasso Regression and RandomForestRegressor, optimized both with RandomizedSearchCV, and selected the random forest for its superior performance (MAE: 13.69); a sketch of this step follows the list.
  • Deployment: Deployed the final model on Streamlit, creating an accessible tool for salary estimation.
  • Tools Used: Leveraged pandas, numpy, sklearn, matplotlib, seaborn, and pickle for data processing, modeling, and deployment.
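
A minimal sketch of the model-building step, assuming a cleaned Glassdoor export with an avg_salary target column (the file name and column are hypothetical, and the parameter grid is illustrative):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Hypothetical cleaned dataset; 'avg_salary' stands in for the real target.
df = pd.read_csv("glassdoor_cleaned.csv")
X = pd.get_dummies(df.drop(columns=["avg_salary"]))
y = df["avg_salary"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Randomized search over an illustrative grid, scored by (negative) MAE.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions={
        "n_estimators": list(range(50, 301, 50)),
        "max_depth": [None, 5, 10, 20],
        "max_features": ["sqrt", "log2", None],
    },
    n_iter=20,
    scoring="neg_mean_absolute_error",
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)
print("CV MAE:", -search.best_score_)
print("Held-out MAE:", mean_absolute_error(y_test, search.predict(X_test)))
```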

Customer Churn Prediction (Imbalanced Dataset)

GitHub
  • Project Overview: Developed a predictive model for customer churn on a heavily imbalanced dataset (4000 "no" vs. 521 "yes"), where the target is whether a client subscribes to a term deposit.
  • Data Insights: Conducted exploratory analysis, revealing outliers in balance and duration, job type as a key predictor, and a low 17% defaulter rate, with further details in the notebook.
  • Data Preprocessing: Transformed binary features (yes/no) to 1/0, encoded categorical variables (e.g., job, education) with OneHotEncoder, and scaled numeric features using StandardScaler.
  • Model Selection: Evaluated multiple models and selected GradientBoostingClassifier, training it on both the original imbalanced data and a balanced variant via BalancedBaggingClassifier (sketched after this list).
  • Performance Outcomes: The first model accurately identified churners (non-subscribers); the second, trained on balanced data, effectively detected subscribers despite their minority status.
  • Business Value: Provided two tailored models, allowing businesses to prioritize either churn detection or subscriber identification based on strategic needs.
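
A minimal sketch of the two-model strategy, using a synthetic stand-in for the dataset so it runs anywhere; the estimator keyword assumes a recent imbalanced-learn release (older versions call it base_estimator):

```python
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in mirroring the ~4000 "no" vs. 521 "yes" imbalance.
X, y = make_classification(n_samples=4521, weights=[0.885], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Model 1: gradient boosting on the raw, imbalanced data (strong on the majority class).
gb = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

# Model 2: balanced bagging resamples each bootstrap to even class counts
# (better recall on the minority class).
bb = BalancedBaggingClassifier(
    estimator=GradientBoostingClassifier(random_state=42),
    n_estimators=10,
    random_state=42,
).fit(X_train, y_train)

for name, model in [("imbalanced", gb), ("balanced", bb)]:
    print(name)
    print(classification_report(y_test, model.predict(X_test)))
```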

Time Series Analysis

GitHub
  • Project Overview: Developed Jupyter notebooks in Python for time series analysis of stock prices and trading volumes for companies such as Google (GOOGL), Tesla (TSLA), Ford, and GM.
  • Data Retrieval: Utilized pandas_datareader to retrieve historical stock data, enabling detailed analysis of prices, volumes, and market trends.
  • Exploratory Analysis: Conducted initial data exploration, visualized stock price trends, and examined trading volumes to highlight significant trading days.
  • Technical Analysis: Implemented simple and exponential moving averages, calculated market capitalization, and used rolling and expanding windows to study price trends and fluctuations (see the sketch after this list).
  • Correlation & Volatility: Explored correlations among stock prices using scatter matrices, computed daily percentage changes, and assessed volatility with histograms and KDE plots.
  • Advanced Insights: Created box plots to compare stock returns, calculated cumulative returns, and performed time period-specific analyses to reveal trends during key intervals.
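
A minimal sketch of the rolling-window computations, using synthetic prices in place of the pandas_datareader download so the example runs offline:

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices standing in for the downloaded stock data.
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=500, freq="B")
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(idx)))),
                  index=idx, name="Close")

sma_50 = close.rolling(window=50).mean()          # simple moving average
ema_50 = close.ewm(span=50).mean()                # exponential moving average
expanding_mean = close.expanding().mean()         # all-history running mean
daily_ret = close.pct_change()                    # daily percentage change
volatility = daily_ret.rolling(window=50).std()   # rolling volatility
cumulative = (1 + daily_ret).cumprod()            # cumulative return

print(pd.DataFrame({"SMA50": sma_50, "EMA50": ema_50, "vol": volatility}).tail())
```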

IMDB Movie Review Sentiment Analysis

GitHub
  • Project Overview: Developed a Simple Recurrent Neural Network (RNN) to classify IMDB movie reviews as positive or negative based on their text content.
  • Core Technologies: Utilized TensorFlow for model development and Streamlit for creating an interactive web application.
  • Data Preprocessing: Prepared the IMDB dataset using tokenization to break text into words and padding to standardize input lengths.
  • Model Development: Designed and trained a Simple RNN to capture sequential patterns in text for sentiment classification (a minimal sketch follows this list).
  • NLP Exploration: Investigated word embeddings to understand text representation within the model.
  • Interactive Deployment: Deployed the model via a Streamlit app, enabling users to input reviews and receive real-time sentiment predictions.
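
A minimal sketch of the classifier; vocabulary size, sequence length, and layer widths are illustrative choices, not necessarily the repo's:

```python
from tensorflow.keras.datasets import imdb
from tensorflow.keras.layers import Dense, Embedding, SimpleRNN
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences

vocab_size, max_len = 10000, 500  # illustrative hyperparameters

# Reviews arrive pre-tokenized as word indices; pad to a fixed length.
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=vocab_size)
X_train = pad_sequences(X_train, maxlen=max_len)
X_test = pad_sequences(X_test, maxlen=max_len)

model = Sequential([
    Embedding(vocab_size, 128),          # learned word embeddings
    SimpleRNN(128, activation="tanh"),   # captures sequential patterns
    Dense(1, activation="sigmoid"),      # positive/negative probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=3, batch_size=64, validation_split=0.2)
print("Test accuracy:", model.evaluate(X_test, y_test)[1])
```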

Movie Recommendation System

GitHub
  • Project Overview: Developed a content-based movie recommendation system utilizing movie data from the TMDB API and natural language processing (NLP) techniques to generate personalized recommendations.
  • Data Collection: Collected movie metadata, such as overviews, cast, and genre details, directly from the TMDB API to serve as the foundation for the recommendation system.
  • Data Preprocessing: Processed and cleaned the raw data by removing spaces and applying stemming to standardize text, ensuring consistency for analysis.
  • Feature Extraction: Employed NLP methods to extract key features from the movie data, creating unique, meaningful representations for each movie.
  • Similarity Calculation: Used cosine similarity to measure the relationships between movies based on their extracted features, enabling accurate similarity comparisons.
  • Recommendation Functionality: Built a recommendation function that takes a movie title as input and returns the most similar movies ranked by similarity score, as sketched below.
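
A minimal sketch of the similarity machinery on a toy DataFrame; the tags column and CountVectorizer settings are assumptions about the pipeline, which in the project operates on TMDB metadata after cleaning and stemming:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in; in the project this comes from the TMDB API after
# cleaning and stemming, combined into a single 'tags' column (assumed name).
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "tags": ["space adventure hero", "space opera war", "romantic comedy city"],
})

# Bag-of-words vectors, then pairwise cosine similarity between movies.
vectors = CountVectorizer(max_features=5000, stop_words="english").fit_transform(movies["tags"])
similarity = cosine_similarity(vectors)

def recommend(title, k=5):
    """Return the k movies most similar to `title` by cosine similarity."""
    idx = movies.index[movies["title"] == title][0]
    ranked = similarity[idx].argsort()[::-1][1:k + 1]  # skip the movie itself
    return movies["title"].iloc[ranked].tolist()

print(recommend("Movie A", k=2))
```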