Martin Lab

Student Theses

These master's thesis topics are currently open to students pursuing a degree in BIS and, in some cases, to students in the Master of Medical Informatics program at FHNW.


RepoChat Related Topics

These thesis topics are part of the RepoChat Project (Semantic Verification in Large Language Model-based Retrieval Augmented Generation), funded by Innosuisse under grant 109.093 IP-ICT. The project aims to enhance the accuracy and relevance of AI-generated responses by improving semantic verification in large language models.

  • Martin, A., Witschel, H. F., Mandl, M., & Stockhecke, M. (2024). Semantic Verification in Large Language Model-based Retrieval Augmented Generation. Proceedings of the AAAI Symposium Series, 3(1), Article 1.

  • Martin, A., Witschel, H. F., Mandl, M., & Stockhecke, M. (2024, March 26). Semantic Verification in Large Language Model-based Retrieval Augmented Generation. AAAI Spring Symposium on Empowering Machine Learning and Large Language Models with Domain and Commonsense Knowledge (AAAI-MAKE), Stanford University, California, USA. Zenodo.

Persona-Driven Prompt Transformation

  • Teaser: Explore how prompt templates can be rewritten to align with predefined personas, enhancing the user interaction experience. Develop an approach to identify the most fitting persona based on users' utterances.

  • Business Impact: Enhancing user engagement and satisfaction by delivering more personalized and relevant interactions, which can lead to increased customer loyalty and improved service quality.

  • Research Fields/Areas: Artificial Intelligence, LLMs, Dialogue Systems, Personalization

  • Subject Description: This thesis focuses on transforming prompt templates to align with predefined personas, aiming to enhance the overall user interaction experience. By developing an approach to identify the most appropriate persona based on users' utterances, the project seeks to create more tailored and effective communication. This work is part of the RepoChat Project, which aims to improve semantic verification in large language model-based retrieval-augmented generation systems. The ultimate goal is to enhance the precision and relevance of responses provided by dialogue systems, thereby improving user satisfaction and engagement.

Persona- and Knowledge-Driven Synthetic Training Data Generation

  • Teaser: Generate synthetic data for fine-tuning LLMs and embedding models using predefined personas, following the methodology outlined by Chan et al. (2024). Enrich synthetic data by incorporating knowledge graphs (RDFS or Neo4j).

  • Business Impact: Enhancing the performance and accuracy of LLMs and embedding models through enriched synthetic training data, leading to improved AI-driven applications and services.

  • Research Fields/Areas: Artificial Intelligence, LLMs, Knowledge Graphs, Data Generation

  • Subject Description: This thesis aims to generate synthetic data for fine-tuning large language models (LLMs) and embedding models by leveraging predefined personas. Following the methodology outlined by Chan et al. (2024), the project focuses on creating data that is relevant and tailored to specific user personas. Additionally, the incorporation of knowledge graphs (RDFS or Neo4j) is considered to further enrich the synthetic data, providing a more comprehensive and contextually accurate training dataset. This work is part of the RepoChat Project, which seeks to enhance semantic verification in large language model-based retrieval-augmented generation systems, ultimately improving the precision and relevance of AI-driven responses.

    • Chan, X., Wang, X., Yu, D., Mi, H., & Yu, D. (2024). Scaling Synthetic Data Creation with 1,000,000,000 Personas (arXiv:2406.20094). arXiv.

Semantic Verification of Large Language Model Output

  • Teaser: Use a knowledge graph to semantically verify LLM outputs following an agentic pattern, ensuring accurate and meaningful responses. Formalize the knowledge graph in RDFS or in a property graph database such as Neo4j.

  • Business Impact: Improve the accuracy and reliability of LLM outputs, enhancing the quality of AI-driven services and user trust in automated systems.

  • Research Fields/Areas: Artificial Intelligence, LLMs, Knowledge Graphs, Semantic Verification

  • Subject Description: This thesis focuses on employing a knowledge graph for the semantic verification of outputs generated by large language models (LLMs). Using an agentic pattern, the project aims to ensure that responses are accurate, contextually appropriate, and meaningful. The knowledge graph can be formalized in RDFS or in a property graph database such as Neo4j to support robust and scalable verification. This research is part of the RepoChat Project, which is dedicated to improving semantic verification in large language model-based retrieval-augmented generation systems. The ultimate goal is to enhance the reliability and trustworthiness of AI-generated responses, thereby improving user satisfaction and engagement with AI-driven applications.

LLM-based Knowledge Graph Construction

  • Teaser: Leverage LLMs, potentially in an agentic pattern, to construct knowledge graphs (RDFS or Neo4j) economically and efficiently, minimizing human involvement. Focus on automating the creation process to enhance scalability and accuracy.

  • Business Impact: Streamline the creation of knowledge graphs, reducing costs and increasing efficiency, which can significantly enhance data management and AI-driven insights.

  • Research Fields/Areas: Artificial Intelligence, LLMs, Knowledge Graphs, Automation

  • Subject Description: This thesis aims to leverage large language models (LLMs) to construct knowledge graphs economically and efficiently, with minimal human intervention. Potentially employing an agentic pattern, the project focuses on automating knowledge graph creation with technologies such as RDFS or Neo4j. This approach aims to enhance the scalability and accuracy of knowledge graphs, making them more accessible and reliable for various applications. As part of the RepoChat Project, this research seeks to improve semantic verification in large language model-based retrieval-augmented generation systems, ultimately enhancing the quality and precision of AI-driven responses. Automating knowledge graph construction can lead to significant cost savings and improved data management capabilities for businesses.

Vocabulary-Based Embedding Model Fine-Tuning for RAG

  • Teaser: Fine-tune an embedding model using an existing vocabulary to enhance Retrieval-Augmented Generation (RAG) performance. Focus on leveraging specialized vocabularies to improve the accuracy and relevance of the generated content.

  • Business Impact: Improve the precision and contextual relevance of AI-generated content, leading to better user experiences and more effective information retrieval.

  • Research Fields/Areas: Artificial Intelligence, LLMs, Embedding Models, Information Retrieval

  • Subject Description: This thesis focuses on fine-tuning an embedding model using an existing vocabulary to boost the performance of Retrieval-Augmented Generation (RAG) systems. By leveraging specialized vocabularies, the project aims to enhance the accuracy and contextual relevance of the generated content. This approach is expected to significantly improve the quality of AI-driven responses in various applications. Part of the RepoChat Project, this research contributes to the goal of enhancing semantic verification in large language model-based retrieval-augmented generation systems. By refining embedding models with targeted vocabulary, the thesis seeks to optimize the efficiency and effectiveness of information retrieval processes.

Agentic Verification of Large Language Model Output

  • Teaser: Perform semantic verification of LLM outputs using an agentic pattern, where each LLM agent plays a distinct system role. Enhance the verification process by leveraging multi-agent system patterns for robust and context-aware validation.

  • Business Impact: Increase the reliability and contextual accuracy of AI outputs, leading to more trustworthy and effective AI-driven applications and services.

  • Research Fields/Areas: Artificial Intelligence, LLMs, Multi-Agent Systems, Semantic Verification

  • Subject Description: This thesis explores the use of an agentic pattern for the semantic verification of large language model (LLM) outputs, with each LLM agent assigned a distinct system role. By leveraging multi-agent system patterns, the project aims to enhance the robustness and context-awareness of the verification process. This method is designed to ensure that AI-generated responses are accurate, relevant, and contextually appropriate. As part of the RepoChat Project, this research contributes to improving semantic verification in large language model-based retrieval-augmented generation systems. The ultimate goal is to increase the reliability and trustworthiness of AI outputs, thereby enhancing user satisfaction and confidence in AI-driven solutions.

HIVBOT Related Topics

These thesis topics are part of the international SNSF-funded HIVBOT Project (Researching Intelligent Chatbots as Healthcare Coaches), which aims to support people living with HIV. The project focuses on developing intelligent chatbots to provide reliable, safe, and accessible healthcare information and support.

Fine-Tuning and Aligning LLMs for Trustworthiness and Safety in Healthcare

  • Teaser: Fine-tune and align LLMs using a small, expert-curated HIV FAQ dataset to ensure trustworthiness and safety in healthcare applications. Focus on enhancing the model's reliability and accuracy in providing health-related information.

  • Business Impact: Improve the reliability and trustworthiness of AI-driven healthcare applications, leading to safer and more effective patient interactions and support.

  • Research Fields/Areas: Artificial Intelligence, LLMs, Healthcare, Trustworthiness

  • Subject Description: This thesis aims to fine-tune and align large language models (LLMs) using a small, expert-curated HIV FAQ dataset to ensure they provide trustworthy and safe information in healthcare applications. By focusing on enhancing the model's reliability and accuracy, the research seeks to improve the quality of AI-driven health-related interactions. This project is part of the international SNSF-funded HIVBOT Project, which investigates the use of intelligent chatbots as healthcare coaches. The goal is to develop AI models that can be trusted to deliver precise and safe health information, ultimately improving patient care and support through advanced AI technologies.

Training LLMs and Embedding Models on Low-Resource Languages for Healthcare

  • Teaser: Investigate low-resource languages like Pidgin English and train LLMs and embedding models on healthcare-related terminology. Enhance the accessibility and accuracy of healthcare information for speakers of underrepresented languages.

  • Business Impact: Improve the reach and effectiveness of healthcare information delivery to speakers of low-resource languages, promoting inclusivity and better health outcomes.

  • Research Fields/Areas: Artificial Intelligence, LLMs, Low-Resource Languages, Healthcare

  • Subject Description: This thesis investigates low-resource languages such as Pidgin English and focuses on training large language models (LLMs) and embedding models on healthcare-related terminology. The goal is to enhance the accessibility and accuracy of healthcare information for speakers of underrepresented languages. This research is part of the international SNSF-funded HIVBOT Project, which aims to develop intelligent chatbots as healthcare coaches. By addressing the challenges associated with low-resource languages, this project seeks to provide more inclusive and effective healthcare information, ultimately improving health outcomes and reducing disparities in healthcare access and quality.

Sustainability Over AI Hype in Generative versus Traditional Models

  • Teaser: Investigate health-related use cases where traditional AI approaches outperform generative AI in terms of environmental sustainability and economic efficiency, as discussed by Luccioni et al. (2024). Focus on identifying scenarios where conventional methods offer more sustainable and cost-effective solutions.

  • Business Impact: Promote more sustainable and economically efficient AI applications in healthcare, reducing environmental impact and operational costs.

  • Research Fields/Areas: Artificial Intelligence, Sustainability, Economic Efficiency, Healthcare

  • Subject Description: This thesis investigates health-related use cases where traditional AI approaches are more environmentally sustainable and economically efficient than generative AI, as discussed by Luccioni et al. (2024). The focus is on identifying specific scenarios where conventional AI methods provide more sustainable and cost-effective solutions. This research is part of the international SNSF-funded HIVBOT Project, which explores the development of intelligent chatbots as healthcare coaches. By highlighting the advantages of traditional AI in certain contexts, this project aims to guide the adoption of AI technologies that are both environmentally friendly and economically viable, thereby supporting the sustainable development of AI-driven healthcare applications.

  • Martin, A., Pande, C., Schwander, S., Ajuwon, A. J., & Pimmer, C. (2024). Domain-specific Embeddings for Question-Answering Systems: FAQs for Health Coaching. Proceedings of the AAAI Symposium Series, 3(1), Article 1.

  • Pande, C., Martin, A., & Pimmer, C. (2023). Towards Hybrid Dialog Management Strategies for a Health Coach Chatbot. In A. Martin, H.-G. Fill, A. Gerber, K. Hinkelmann, D. Lenat, R. Stolle, & F. van Harmelen (Eds.), Proceedings of the AAAI 2023 Spring Symposium on Challenges Requiring the Combination of Machine Learning and Knowledge Engineering (AAAI-MAKE 2023). CEUR-WS.org.

  • Martin, A. (2024). Ensuring Trustworthy Dialogue Systems. Zenodo.

  • Luccioni, A. S., Jernite, Y., & Strubell, E. (2024). Power Hungry Processing: Watts Driving the Cost of AI Deployment? In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24).

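Reference links:

  • Martin, Witschel, Mandl, & Stockhecke (2024), Proceedings of the AAAI Symposium Series: https://doi.org/10.1609/aaaiss.v3i1.31199
  • Martin, Witschel, Mandl, & Stockhecke (2024), AAAI-MAKE, Zenodo: https://doi.org/10.5281/zenodo.10892026
  • Chan et al. (2024), arXiv:2406.20094: https://doi.org/10.48550/arXiv.2406.20094
  • Martin, Pande, Schwander, Ajuwon, & Pimmer (2024), Proceedings of the AAAI Symposium Series: https://doi.org/10.1609/aaaiss.v3i1.31197
  • Pande, Martin, & Pimmer (2023), AAAI-MAKE 2023, CEUR-WS.org: http://ceur-ws.org/Vol-3433
  • Martin (2024), Ensuring Trustworthy Dialogue Systems, Zenodo: https://doi.org/10.5281/zenodo.11530754
  • Luccioni, Jernite, & Strubell (2024), Power Hungry Processing: https://doi.org/10.1145/3630106.3658542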