Hi, I'm
Dmitry Beresnev
ML Engineer & Data Scientist
About Me
I'm a Master's student in Computer Science at Innopolis University, specializing in AI and Data Science with a deep research focus on ML Optimization, LLMs and Deep Learning.
My expertise spans developing and implementing novel ML models and functional pipelines using PyTorch. I have hands-on experience with classical DRL algorithms including DQN, A2C, and REINFORCE.
I also have experience leading projects from conception to deployment, including developing an AI-powered EdTech platform used by educational organizations.
Currently seeking challenging and interesting R&D positions where I can contribute to state-of-the-art ML and AI solutions.
Research Focus
Deep Learning, LLMs and Optimization
Technical Skills
PyTorch, PuLP, Scikit-learn, TRL and modern ML frameworks
Education
Master's student in AI & Data Science
Projects
Full-stack ML systems from research to production
Languages
Education
MSc in Computer Science
Innopolis University
Innopolis, Russia
Field of study: AI & Data Science
Thesis
[in progress] New and Efficient Facet-Based Identification methods for Rank-Deficient Simplex-Structured Matrix Factorization
Supervisor: Valentin Leplat
Co-supervisors:
- ▸ Nicolas Gillis
Relevant Coursework
BSc in Computer Science
Innopolis University
Innopolis, Russia
Field of study: AI & Data Science
Thesis
Text plagiarism detection in the field of large language models using the reinforcement learning
Supervisor: Armen Beklaryan
Relevant Coursework
Research Experience
Huawei: Wireless Data Transmission
Researcher, ML Engineer
ISP RAS & Innopolis University
Designing and simulating Deep AI models for wireless distribution of devices to base stations under time and resource constraints for Huawei
Supervisor: Aleksandr Beznosikov
Responsibilities
- ▸ Development and implementation of models on PyTorch
- ▸ Creation and expansion of the training-testing pipeline
- ▸ Conducting experiments
Concepts
Tech Stack
Diligent Learning: Prospects and Applications
Researcher, ML Engineer
MSU AI Center
Implementing and testing Diligent Learning, a novel approach to fine-tuning LLMs for reasoning problems, based on the paper 'From Reasoning to Super-Intelligence: A Search-Theoretic Perspective' by Shai Shalev-Shwartz and Amnon Shashua
Supervisor: Petr Anokhin
Responsibilities
- ▸ Development and implementation of the Diligent Learning pipeline
- ▸ Fine-tuning LLMs in the new paradigm
- ▸ Conducting experiments
Concepts
Tech Stack
Resources
New and Efficient Facet-Based Identification methods for Rank-Deficient Simplex-Structured Matrix Factorization
Researcher
Innopolis University
Master's thesis research on new facet identification methods for SSMF, aimed at improving the existing GFPI algorithm
Supervisor: Valentin Leplat
Co-supervisors:
- ▸ Nicolas Gillis
Responsibilities
- ▸ Development and implementation of new polytope facet identification approaches
- ▸ Reviewing SOTA SSMF methods and facet identification methods
- ▸ Conducting experiments
Concepts
Tech Stack
Applied AlphaEvolve: CAD Reconstruction and Combinatorial Geometry
Researcher
Skoltech Summer School of Machine Learning (SMILES-2025)
Research project applying OpenEvolve (open-source AlphaEvolve) to CAD reconstruction from text descriptions and combinatorial geometry problems using LLM-driven evolutionary search
Supervisor: Petr Anokhin
Achievements
- ▸ Achieved optimal ball partition results matching theoretical bounds in dimensions 2-13
- ▸ Outperformed zero-shot LLM baselines across multiple complex 3D shapes
- ▸ Established comprehensive benchmark pipeline for CAD reconstruction with 7 evaluation metrics
Responsibilities
- ▸ Implemented OpenEvolve framework for CAD reconstruction task
- ▸ Designed evaluation metrics including IoU, Chamfer Distance, and Hausdorff Distance
- ▸ Analyzed evolutionary pathways for structural and parametric error correction
Concepts
Tech Stack
Resources
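The three shape metrics named above (IoU, Chamfer Distance, Hausdorff Distance) can be sketched in plain Python as follows. This is an illustrative sketch, not the project's benchmark code: the function names are hypothetical, shapes are assumed to be given as sampled point lists (or voxel coordinate sets for IoU), and the Chamfer convention used here (sum of the two mean nearest-neighbour distances, unsquared) is one of several in common use.

```python
import math


def _nearest(p, points):
    """Euclidean distance from point p to its closest point in `points`."""
    return min(math.dist(p, q) for q in points)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance a->b plus b->a.

    Assumed convention: unsquared distances, averaged per set.
    """
    return (sum(_nearest(p, b) for p in a) / len(a)
            + sum(_nearest(q, a) for q in b) / len(b))


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: the worst-case nearest-neighbour distance."""
    return max(max(_nearest(p, b) for p in a),
               max(_nearest(q, a) for q in b))


def voxel_iou(a, b):
    """Intersection over Union of two shapes given as sets of occupied voxels."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy example: a reference shape and a reconstruction with one stray point.
reference = [(0, 0, 0), (1, 0, 0)]
candidate = [(0, 0, 0), (1, 0, 0), (1, 0, 2)]
print(chamfer_distance(reference, candidate))
print(hausdorff_distance(reference, candidate))
print(voxel_iou(reference, candidate))
```

Chamfer averages the error over all points, while Hausdorff reports only the single worst point, so a reconstruction with one stray feature scores much worse on Hausdorff than on Chamfer.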
Text Plagiarism Detection in the Field of LLMs Using RL
Researcher
Innopolis University
Bachelor's thesis research on a novel approach to text plagiarism detection using Deep Reinforcement Learning
Supervisor: Armen Beklaryan
Achievements
- ▸ Best MSE of 0.108 on synthetic dataset
- ▸ Proposed three architectures based on DQN, A2C, and REINFORCE
- ▸ Best results achieved by REINFORCE model
Responsibilities
- ▸ Designed novel DRL-based approach
- ▸ Implemented and tested multiple architectures
- ▸ Conducted and analyzed comprehensive experiments
Concepts
Tech Stack
Resources
Work Experience
ML Developer
Innopolis CIPR
Design and implementation of a RAG pipeline over proprietary Angular frontend repositories
Achievements
- ▸ Validated RAG pipeline quality on gold queries provided by experts
Responsibilities
- ▸ Building indexers: inverted index, BallTree over model-generated embeddings, and (partially) Faiss
- ▸ Connecting local generative models
- ▸ Designing the pipeline for scraping, embedding generation, indexing, and retrieval
Concepts
Tech Stack
ML Engineer
Gazprom CPS
Design and training of a predictive ML model to identify causes of defects in construction facilities
Achievements
- ▸ Achieved 80% accuracy on a proprietary dataset
Responsibilities
- ▸ Data preprocessing and feature engineering
- ▸ Model building and validation
- ▸ Full working pipeline development
Concepts
Tech Stack
ML Developer
Advanced Engineering School IU
Development of code generation model using transformer-based architecture
Achievements
- ▸ Significant contribution to research
Responsibilities
- ▸ Fine-tuning Gorilla model on proprietary dataset
Concepts
Tech Stack
Resources
Teaching Experience
Teaching Assistant
Innopolis University
Teaching assistant for the Introduction to Optimization course for 2nd-year bachelor students
Responsibilities
- ▸ Conducting tests and laboratory work
Concepts
Tech Stack
Teaching Assistant
Yandex Student Camp on Math in AI
Teaching assistant at a student camp for the Optimization Methods in Machine Learning course
Supervisor: Alexander Beznosikov
Responsibilities
- ▸ Design and implementation of materials for seminars and homework
Concepts
Tech Stack
Resources
Projects
Accept School
2023 - Present
Founder, CEO; previously Lead Developer
A comprehensive EdTech platform that combines machine learning with modern web technologies to provide an interactive learning experience for programming students
✨ Achievements
- ▸ Currently utilized in educational organizations
- ▸ Approximately 200 active users
Responsibilities
- ▸ Led full-stack solution design
- ▸ Defined development and operational processes
- ▸ Developed code plagiarism detection system using ML
- ▸ Implemented generative AI for hint suggestions and text and image generation using open-source LLMs
- ▸ Engineered backend with FastAPI and MongoDB
- ▸ Built frontend with Next.js
Concepts
Tech Stack
Resources
DoWell
2025
ML Developer, Tech Leader
An intelligent conversational system that uses Retrieval-Augmented Generation (RAG) to simulate expert consultations across various professional domains
Responsibilities
- ▸ Designed and implemented RAG architecture for domain-specific responses
- ▸ Deployed and connected generative models
- ▸ Engineered backend using FastAPI
Concepts
Tech Stack
Resources
EBREG-RL: Example-Based Regular Expression Generation via Reinforcement Learning
2025
Developer
A reinforcement learning system for automatic regular expression generation from labeled examples. The project formulates regex generation as a Markov Decision Process using Reverse Polish Notation to handle operator precedence
✨ Achievements
- ▸ Successfully generated optimal regex patterns for number and word extraction tasks
- ▸ Implemented novel reward functions combining F1 score, accuracy metrics, and length penalties
Responsibilities
- ▸ Formulated regex generation as MDP with 104-action space using RPN tokens
- ▸ Designed custom reward functions balancing pattern accuracy and expression complexity
- ▸ Implemented and compared REINFORCE and A2C algorithms
Concepts
Tech Stack
Resources
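The MDP formulation described above can be illustrated with a small sketch: an episode is a sequence of RPN tokens, a stack turns it into a regex, and the reward trades F1 on labeled examples against expression length. This is an assumption-laden illustration rather than the project's implementation. The token names (`concat`, `alt`, `star`, `plus`, `opt`), the penalty weight, and the `-1.0` reward for malformed episodes are all hypothetical, and the real 104-token action space is not reproduced here.

```python
import re

# Hypothetical operator tokens; the project's actual vocabulary is not given.
BINARY = {"concat": "", "alt": "|"}            # pop two operands
UNARY = {"star": "*", "plus": "+", "opt": "?"}  # pop one operand


def rpn_to_regex(tokens):
    """Convert RPN tokens to a regex string, or None if the sequence is malformed.

    Non-capturing groups make precedence explicit, so no precedence rules
    are needed while generating tokens.
    """
    stack = []
    for tok in tokens:
        if tok in BINARY:
            if len(stack) < 2:
                return None
            right, left = stack.pop(), stack.pop()
            stack.append(f"(?:{left}{BINARY[tok]}{right})")
        elif tok in UNARY:
            if not stack:
                return None
            stack.append(f"(?:{stack.pop()}){UNARY[tok]}")
        else:
            stack.append(tok)  # operand: a literal or a class such as \d
    return stack[0] if len(stack) == 1 else None


def reward(tokens, positives, negatives, length_weight=0.01):
    """F1 over labeled examples minus a length penalty (weights are assumptions)."""
    pattern = rpn_to_regex(tokens)
    if pattern is None:
        return -1.0
    try:
        compiled = re.compile(pattern)
    except re.error:
        return -1.0
    tp = sum(1 for s in positives if compiled.fullmatch(s))
    fp = sum(1 for s in negatives if compiled.fullmatch(s))
    fn = len(positives) - tp
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return f1 - length_weight * len(tokens)


# A two-token episode that should match digit strings only.
print(rpn_to_regex([r"\d", "plus"]))
print(reward([r"\d", "plus"], positives=["12", "7"], negatives=["ab", ""]))
```

Because every RPN prefix is either a valid partial expression or detectably malformed, the agent can be penalized for invalid episodes without a separate parser.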
PyFinder: fast search through Python documentation
2025
Developer
An Information Retrieval system providing fast search through Python's built-in documentation. The platform combines traditional inverted indexing with modern LLM-powered semantic search and RAG for natural language query processing, featuring content moderation and spell correction capabilities
✨ Achievements
- ▸ Performance with LLM embeddings + Ball Tree indexer: F1@1=0.53, nDCG@1=0.83
Responsibilities
- ▸ Implemented semantic search using sentence-transformers embeddings and Ball Tree spatial indexing
- ▸ Built RAG pipeline with prompt engineering, context retrieval, and source tracking
- ▸ Developed Norvig spell corrector with frequency-based language model
- ▸ Evaluated using comprehensive metrics: LLM-specific and ranking metrics
- ▸ Designed FastAPI backend and Next.js frontend with dual search/chat modes
Concepts
Tech Stack
Resources
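The Norvig-style spell corrector mentioned above follows a well-known recipe: generate all strings within one or two edits of the query word and pick the known word with the highest corpus frequency. The sketch below is a minimal version of that idea, not PyFinder's code. The function names and the toy corpus are hypothetical, and a real deployment would build the counts from the indexed documentation.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz_"


def build_counts(text):
    """Word frequencies from a corpus; these act as the unigram language model."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word within edit distance 2, else the input."""
    word = word.lower()
    if word in counts:
        return word
    candidates = {w for w in edits1(word) if w in counts}
    if not candidates:
        candidates = {w2 for w1 in edits1(word) for w2 in edits1(w1) if w2 in counts}
    if not candidates:
        return word
    # Frequency decides; the word itself breaks ties deterministically.
    return max(candidates, key=lambda w: (counts[w], w))


counts = build_counts("list list dict tuple print print print string")
print(correct("pritn", counts))
print(correct("lst", counts))
```

Preferring distance-1 candidates over distance-2 candidates keeps corrections conservative, which matters when queries contain identifiers that only look like misspellings.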
Detecting AI-generated Python code via ML
2025
Developer
An ML system for detecting AI-generated Python code in programming competitions. The project compares two approaches: transformer-based models (CodeBERT, DeBERTa) for deep semantic analysis and AST-based lightweight models (Random Forest, Decision Trees, MLP) for efficient structural pattern recognition
✨ Achievements
- ▸ Achieved 95.9% accuracy with CodeBERT model on synthetic dataset
- ▸ Developed efficient AST-based Random Forest achieving 83.5% accuracy with 2ms inference time
Responsibilities
- ▸ Engineered dataset generation pipeline using 4 LLMs (Evil, Llama-3.2-3b, BLACKBOX.AI, DeepSeek) with specialized prompts
- ▸ Fine-tuned DeBERTa-v3 and CodeBERT models
- ▸ Implemented AST-based feature extraction using Tree-sitter library for structural code analysis
- ▸ Integrated LIME explainability framework for model interpretation
- ▸ Evaluated models across 6 metrics: F1 Score, ROC/AUC, Precision, Recall, Accuracy, and inference time
Concepts
Tech Stack
Resources
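To illustrate what AST-based structural features can look like, here is a minimal sketch. Note the substitution: the project used the Tree-sitter library, whereas this example uses Python's standard `ast` module so that it runs without extra dependencies. The feature names are hypothetical and are not the project's actual feature set.

```python
import ast
from collections import Counter


def ast_features(source: str) -> dict:
    """Extract a few structural counts from Python source code.

    Uses the stdlib `ast` module as a stand-in for Tree-sitter.
    """
    tree = ast.parse(source)
    counts = Counter(type(node).__name__ for node in ast.walk(tree))

    def depth(node: ast.AST) -> int:
        # Depth of the subtree rooted at `node` (a leaf has depth 1).
        return 1 + max((depth(child) for child in ast.iter_child_nodes(node)), default=0)

    return {
        "n_nodes": sum(counts.values()),
        "max_depth": depth(tree),
        "n_functions": counts["FunctionDef"] + counts["AsyncFunctionDef"],
        "n_loops": counts["For"] + counts["While"],
        "n_branches": counts["If"],
        "n_names": counts["Name"],
    }


sample = (
    "def total_positive(xs):\n"
    "    total = 0\n"
    "    for x in xs:\n"
    "        if x > 0:\n"
    "            total += x\n"
    "    return total\n"
)
print(ast_features(sample))
```

A dictionary like this can be flattened into a fixed-order numeric vector and passed to a lightweight classifier such as a Random Forest, which is what keeps inference cheap compared with a transformer model.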
RecSys via Approximate Matrix Factorization
2024
Developer
A RecSys built on approximate matrix factorization techniques for a synthetic dataset. The project explores multiple optimization approaches, such as gradient-based methods with various step-size strategies (Armijo, Wolfe conditions, Lipschitz estimation), advanced optimizers (Adam, RMSprop, AdaGrad, Heavy Ball, Nesterov), and vector-wise updates to solve the collaborative filtering problem
✨ Achievements
- ▸ Implemented 12+ optimization algorithms with 7 step-size selection strategies
- ▸ Compared full-matrix vs. row-wise/column-wise (Vector GD) update strategies
Responsibilities
- ▸ Formulated recommendation as matrix factorization problem
- ▸ Implemented 7 advanced optimizers: Adaptive GD, Heavy Ball, Nesterov momentum, AdaGrad, RMSprop, Adam, BFGS
- ▸ Experimented with Non-Negative Matrix Factorization using multiplicative updates
- ▸ Trained a neural network baseline with genre/demographic features
Concepts
Tech Stack
Resources
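As a concrete instance of the gradient-based approach with an Armijo step-size rule, the sketch below factorizes a small set of observed ratings with full-gradient descent and backtracking line search. It is a dependency-free illustration under stated assumptions, not the project's code. The function names, the constants `c` and `beta`, the rank, and the toy ratings are all hypothetical.

```python
import random


def loss(U, V, ratings):
    """Sum of squared errors over observed (user, item, rating) triples."""
    return sum((r - sum(a * b for a, b in zip(U[i], V[j]))) ** 2 for i, j, r in ratings)


def gradients(U, V, ratings):
    """Gradients of the loss with respect to the user and item factors."""
    gU = [[0.0] * len(row) for row in U]
    gV = [[0.0] * len(row) for row in V]
    for i, j, r in ratings:
        err = r - sum(a * b for a, b in zip(U[i], V[j]))
        for f in range(len(U[i])):
            gU[i][f] -= 2 * err * V[j][f]
            gV[j][f] -= 2 * err * U[i][f]
    return gU, gV


def step(M, G, t):
    """Return M - t * G without modifying M."""
    return [[m - t * g for m, g in zip(m_row, g_row)] for m_row, g_row in zip(M, G)]


def fit(ratings, n_users, n_items, rank=2, iters=200, c=1e-4, beta=0.5, seed=0):
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    history = [loss(U, V, ratings)]
    for _ in range(iters):
        gU, gV = gradients(U, V, ratings)
        grad_sq = sum(g * g for row in gU + gV for g in row)
        if grad_sq < 1e-12:
            break  # stationary point reached
        t = 1.0
        while t > 1e-12:
            U_new, V_new = step(U, gU, t), step(V, gV, t)
            new_loss = loss(U_new, V_new, ratings)
            # Armijo condition: require a sufficient decrease of the loss.
            if new_loss <= history[-1] - c * t * grad_sq:
                break
            t *= beta  # otherwise shrink the step and retry
        else:
            break  # no acceptable step size found
        U, V = U_new, V_new
        history.append(new_loss)
    return U, V, history


ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U, V, history = fit(ratings, n_users=3, n_items=3)
print(history[0], history[-1])
```

The Armijo check guarantees that the recorded loss never increases, which makes it a convenient baseline against fixed-step or Lipschitz-estimated step sizes.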
Collaborations
A collaborative team of developers revolutionizing educational technology through innovative EdTech solutions. The organization combines machine learning with modern web technologies to create accessible and functional learning experiences for students and educators. Its main current project is the Accept educational platform.
Featured projects
Accept School
A comprehensive EdTech platform combining ML with modern web technologies for interactive programming education, featuring code plagiarism detection, generative AI for hints, and automated assessment systems
Accept Documentation
Rich documentation of the Accept platform for educators and students, including usage examples for its AI features
Crogs Foundation
Founder, Research Collaborator
A community of enthusiastic researchers and developers dedicated to advancing the frontiers of technology through curiosity-driven research and practical applications. The foundation bridges cutting-edge research with user-centric implementations, focusing on AI-powered code evolution and intelligent automation systems.
Featured projects
DoWell
An intelligent conversational system that uses Retrieval-Augmented Generation (RAG) to simulate expert consultations across various professional domains
Applied AlphaEvolve: CAD Reconstruction and Combinatorial Geometry
Research project applying OpenEvolve (open-source AlphaEvolve) to CAD reconstruction from text descriptions and combinatorial geometry problems using LLM-driven evolutionary search
Get In Touch
I'm currently seeking interesting R&D opportunities in Machine Learning and AI. Feel free to reach out!