1. Introduction to NLP & Generative AI

In this lecture, we explore the fundamentals of Natural Language Processing (NLP) and Generative AI, specifically focusing on Large Language Models (LLMs). We will examine essential NLP tasks such as text classification, including intent recognition and sentiment analysis.

Introduction to Natural Language Processing (NLP)

Natural Language Processing (NLP) is a subfield of artificial intelligence that enables computers to process and analyze human language. The goal of NLP is to bridge the gap between human communication and machine interpretation, allowing machines to perform tasks such as understanding, summarizing, and generating natural language. NLP powers a wide range of applications, including chatbots, virtual assistants like Siri and Google Assistant, search engines, and recommendation systems.

A key development in NLP has been the emergence of Large Language Models (LLMs), such as GPT (Generative Pre-trained Transformer), which significantly improve the ability to generate and understand text. LLMs apply deep learning methods to vast datasets and are used in tasks like machine translation, text summarization, and question answering.

Key NLP Applications:

Text Classification: Categorizing text into predefined categories, such as sorting emails into “spam” and “non-spam” or identifying topics in news articles.
Sentiment Analysis: Detecting the emotional tone of a text, often used in social media monitoring to gauge public opinion about brands or events.
Machine Translation: Automatically converting text from one language to another, such as in DeepL or Google Translate, which has been significantly enhanced by deep learning techniques.
Named Entity Recognition (NER): Identifying and classifying key information such as names, dates, and locations within a text.
Speech Recognition: Converting spoken language into text, enabling applications like virtual assistants and transcription services.
Text Summarization: Automatically generating concise summaries of longer documents, a tool useful for business reports, legal documents, or academic research.

As NLP models evolve, they are becoming increasingly proficient at handling language complexity and ambiguity, including homonyms, polysemy, and contextual understanding. These advances are driven by the growing availability of large datasets and the implementation of powerful deep learning models, such as transformers, which have become the backbone of most state-of-the-art NLP systems.

Natural Language Processing (NLP) and Computational Linguistics (CL)

While Natural Language Processing (NLP) focuses on developing practical applications to handle human language, Computational Linguistics (CL) focuses on the scientific study of language using computational methods. CL contributes to linguistic research and the development of language models by analyzing patterns, constructing linguistic databases, and testing linguistic theories. CL provides a foundation for many NLP applications, combining elements of linguistics, computer science, and AI.

Natural Language Processing (NLP)

NLP is concerned with the interactions between computers and human language. Its objective is to enable computers to process or “understand” natural language in a way that is valuable and meaningful. This understanding is often used to perform tasks such as translating languages, answering questions, and summarizing information. NLP can also involve generating human-like text, as seen in applications like automated content creation and conversational AI.

A crucial distinction in NLP is between natural language understanding (NLU), which involves extracting meaning from text, and natural language generation (NLG), which focuses on creating coherent text. Both components are essential for tasks like machine translation and dialogue systems. For instance, NLU is needed to understand a user’s input, and NLG is required to generate an appropriate and context-aware response.

Computational Linguistics (CL)

Computational linguistics is the scientific study of language using computational techniques. It aims to develop models for processing natural language, which can contribute to both practical applications and theoretical research. Computational linguists work on analyzing language patterns, building models of language, and developing tools for studying and documenting languages.

Key areas of linguistic research in CL include:

Corpus Analysis: Searching large datasets of text (corpora) for linguistic patterns or examples.
Language Typology: Building structured databases to compare and contrast different languages.
Language Documentation: Creating tools to document endangered or lesser-known languages for preservation and study.
Ontology Development: Building frameworks for structuring linguistic data for interoperability and analysis.

CL underpins many practical NLP applications, including:

Speech Recognition: Systems like Siri and Google Assistant use CL techniques to transcribe spoken language into text.
Machine Translation: Services like DeepL or Google Translate rely on computational linguistics to model language structures for accurate translations.
Dialogue Systems: Virtual assistants and chatbots use CL models to maintain context-aware, meaningful conversations with users.

In specialized fields like medicine and healthcare, computational linguistics helps match patients to clinical trials, analyze health records, and assist in medical coding. In the legal field, CL is used for electronic discovery, where large volumes of legal documents are processed to find relevant information.

Subtasks in NLP and CL

NLP and CL systems typically break down tasks into smaller subtasks, which can be processed in a pipeline, where the output of one task becomes the input for the next. Examples of these subtasks include:

Part-of-speech tagging: Identifying whether a word is a noun, verb, adjective, etc.
Named entity recognition (NER): Detecting key entities such as people, locations, and organizations in text.
Coreference resolution: Determining which words refer to the same entity in a text.
Parsing: Analyzing the grammatical structure of a sentence, either through dependency parsing or constituency parsing.

These subtasks are evaluated with metrics such as precision (the share of the items a model returns that are actually correct) and recall (the share of the correct items that the model manages to find).
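
As a minimal sketch of how these two metrics are computed, the following Python snippet compares a set of predicted entities against a hand-annotated gold set. The function name `precision_recall` and the example entities are illustrative assumptions, not part of any particular evaluation toolkit.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predicted items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold items that the system found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical NER output versus hand-annotated gold labels.
gold = {("Marie Curie", "Person"), ("Paris", "Location"), ("1898", "Date")}
predicted = {("Marie Curie", "Person"), ("Paris", "Organization")}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here only one of the two predictions matches the gold set, and only one of the three gold entities is found, so the system is half right about what it returns but misses most of what it should find.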

Symbolic, Statistical, and Neural Approaches in NLP

In the field of NLP and CL, there are three primary approaches to solving language processing tasks: symbolic methods, statistical methods, and neural networks.

Symbolic Methods: These rely on hand-coded linguistic rules based on human expertise. These rules are applied to process and analyze language. Symbolic methods are known for their precision, as they are grounded in well-defined linguistic knowledge, such as grammar rules or dictionaries. However, they struggle with scalability, as manually coding rules for all language phenomena is time-consuming and often insufficient for handling rare cases or unpredictable language structures.
Statistical Methods: In contrast, statistical methods use machine learning to discover patterns in large datasets. Statistical approaches rely on training models to predict the most likely labels or structures for new data based on prior examples. They are more scalable and adaptable than symbolic methods, as they do not require manually coded rules for each possible scenario. However, statistical models may still struggle with rare language phenomena and precision in tasks requiring deep linguistic insight. Statistical approaches such as Hidden Markov Models (HMMs) or decision trees were widely adopted in the 1990s and significantly advanced NLP.
Neural Networks: Since 2015, neural networks have further transformed NLP, reducing the need for manual feature engineering. Neural networks, particularly deep learning models like transformers, leverage large datasets to automatically learn complex linguistic representations. Techniques like word embeddings (e.g., Word2Vec) allow models to capture the meanings of words based on context. Neural Machine Translation (NMT) and sequence-to-sequence models have rendered many intermediate tasks (like word alignment) obsolete, offering end-to-end solutions. These models are more robust to errors, unfamiliar inputs, and variations in language, significantly advancing text generation and understanding.

Many modern NLP systems combine symbolic, statistical, and neural approaches to optimize accuracy and scalability. For example, a system may use statistical methods for general language modeling, neural networks for handling variability and complexity, and symbolic methods for fine-tuning grammatical exceptions or specialized linguistic cases.

Text mining

Text mining, also known as text data mining or text analytics, refers to the process of extracting useful and high-quality information from text. The purpose of text mining is to derive meaningful patterns, trends, and insights from large amounts of unstructured textual data. Text mining is typically applied in areas such as business intelligence, scientific research, and social media analysis.

Key Text Mining Techniques:

Information Extraction: Automatically identifying key entities, relationships, and facts from text.
Text Categorization: Classifying documents into predefined categories based on their content.
Sentiment Analysis: Detecting the sentiment or emotional tone behind written content, often used to analyze customer reviews or social media posts.
Text Clustering: Grouping similar texts together based on their content.
Document Summarization: Condensing a large document into a shorter version that retains the key points.
Entity Relation Modeling: Understanding the relationships between key entities within a text.

Text mining plays a critical role in transforming unstructured data into structured formats that are easier to analyze and interpret. Techniques such as natural language processing (NLP), machine learning, and pattern recognition are commonly employed in text mining to automate the extraction and analysis process. With the explosion of digital content, text mining has become an essential tool for extracting valuable insights from vast text corpora.

Generative AI: Large Language Models (LLMs)

Generative AI refers to the technology designed to generate synthetic data that resembles the data it was trained on. This can include text, images, music, and even code. Generative AI learns patterns from large datasets, enabling it to create new, coherent outputs that mimic the structure, style, and context of the original data.

Foundation Models are a class of generative AI models that are designed to be adaptable across a wide range of applications. They are trained on vast amounts of data and can be fine-tuned for specific tasks such as translation, summarization, or question answering. These models serve as the building blocks for more specialized AI tasks and are commonly used in industry and research due to their broad applicability.

Large Language Models (LLMs) are a specific category of foundation models focused on text generation and understanding. Trained on extensive text datasets, LLMs are designed to complete text, understand context and instructions, and generate responses that are contextually appropriate. LLMs rely on transformer architectures, a breakthrough in natural language processing (NLP), to generate predictions and produce text based on the structures and patterns observed during training (Vaswani et al., 2023).

Beyond traditional LLMs that focus on text-based inputs and outputs, multimodal LLMs extend the capabilities of foundation models by incorporating multiple data types such as text, images, audio, and video. These models can process multimodal inputs and generate multimodal outputs, such as creating an image from a text description or generating captions for a video. The ability to handle diverse data types opens up new applications in areas such as content creation, medical imaging, and interactive AI systems (Beck et al., 2024).

LLM Evolution – Milestones & History

The development of Large Language Models (LLMs) has been marked by significant milestones in natural language processing (NLP), beginning with early models like Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, and evolving into the sophisticated transformer architectures that dominate today. Each advancement has enhanced the ability of models to understand and generate human language, improving contextual comprehension, predictive accuracy, and processing efficiency for large-scale text data.

1. Fundamentals – Embeddings

Word embeddings have played a critical role in NLP evolution. In 2013, Google researchers, including Tomas Mikolov, introduced Word2vec, which mapped words into a continuous vector space, capturing semantic relationships by positioning similar words close to each other (Mikolov et al., 2013). This innovation significantly influenced LLM development by giving models a numerical representation of word meaning learned from the contexts in which words occur. In 2016, Facebook Research advanced this concept with fastText, which included subword information, making it more effective for rare words and morphologically rich languages (Joulin et al., 2016).
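
The idea that similar words lie close together in vector space can be sketched with cosine similarity. The three-dimensional vectors below are made-up numbers chosen for illustration; real Word2vec or fastText embeddings are learned from data and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, not taken from any trained model.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "banana": [0.1, 0.2, 0.9],
}

sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["banana"])
print(f"king~queen: {sim_royal:.2f}, king~banana: {sim_fruit:.2f}")
```

With these toy vectors, "king" and "queen" point in almost the same direction, while "king" and "banana" do not.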

2. Transformer – “Attention is All You Need”

The Transformer model, introduced by Vaswani et al. in the groundbreaking 2017 paper “Attention is All You Need” (Vaswani et al., 2023), marked a revolutionary step in NLP. By using a self-attention mechanism, it allowed models to focus on relevant parts of the input text without processing tokens one at a time, as recurrent models do. This innovation made transformers more efficient and capable of handling larger datasets and complex patterns. Transformers have become the backbone of modern LLMs, enabling advancements in translation, summarization, and text generation.
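
The core operation of the attention mechanism, scaled dot-product attention, can be sketched in plain Python. This is a simplified illustration under stated assumptions: it omits the learned projection matrices, multiple heads, and positional encodings of a real transformer, and the two-dimensional token vectors are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        output = [sum(w * v[j] for w, v in zip(weights, values))
                  for j in range(len(values[0]))]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Toy token vectors (hypothetical). In self-attention, Q, K, and V all
# come from the same sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 2) for w in row])
```

Each row of weights shows how strongly one token attends to every token in the sequence. All rows are computed independently of one another, which is what allows the computation to be parallelized.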

2. Transformer – BERT & GPT-1

In 2018, the development of transformer models took a significant leap with the introduction of two major models: GPT-1 and BERT. These models were based on the transformer architecture but optimized for different tasks, making them foundational to modern NLP.

GPT-1 (Generative Pre-trained Transformer), developed by OpenAI, used the decoder part of the transformer architecture for unidirectional (left-to-right) text generation, focusing on generating coherent, context-based text. It was pre-trained on BooksCorpus, a large collection of unpublished books.

BERT (Bidirectional Encoder Representations from Transformers), developed by Google, used the encoder part of the transformer architecture to understand context in both directions. BERT was optimized for language comprehension tasks such as question answering and text classification, making it a revolutionary tool for tasks requiring deep contextual understanding (Devlin et al., 2019).

3. Instruction & Alignment

InstructGPT improved LLM alignment with user instructions using Reinforcement Learning from Human Feedback (RLHF), allowing the model to better adhere to human preferences (Ouyang et al., 2022). GPT-3.5 expanded on this by enhancing response quality and instruction handling, which led to the development of ChatGPT. Released in 2022, ChatGPT brought these capabilities to a broader audience through an interactive chat interface, making sophisticated AI technology more accessible (Ouyang et al., 2022).

4. Multimodality

Multimodal LLMs extend traditional text models to process diverse data types like text, images, audio, and video, expanding their applications beyond text-based tasks (Li et al., 2024).

GPT-4 by OpenAI incorporated multimodal functionality by accepting both text and image inputs while producing text outputs, including code. Gemini by Google DeepMind further advanced this by also generating images, positioning it as a versatile tool for content creation and multimedia applications. These models exemplify the growing trend towards integrating different data modalities in AI systems to create more comprehensive and flexible models.

Text Classification in NLP

Text classification is the task of organizing text into categories based on its content. It plays a critical role in enabling systems to automatically process, filter, and analyze large volumes of text data. This task is fundamental for applications like spam detection, customer feedback categorization, sentiment analysis, and topic labeling in news articles.

Text classification is typically categorized into three main types:

Binary classification: Text is assigned to one of two classes, such as spam vs. non-spam emails.
Multi-class classification: Text is assigned to exactly one of more than two categories (e.g., categorizing news articles into politics, sports, or entertainment).
Multi-label classification: Several labels can be assigned to the same piece of text, such as tagging a customer service request with both “billing issue” and “technical issue.”
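
The difference between the three types lies mainly in the shape of the output. The sketch below makes this concrete with a deliberately simple keyword-count scorer. The keyword lists, the function names, and the example ticket are hypothetical. A real system would learn its scores from labeled data.

```python
# Hypothetical keyword lists per category; a trained model would learn these.
KEYWORDS = {
    "billing issue": {"invoice", "charge", "refund", "payment"},
    "technical issue": {"error", "crash", "login", "bug"},
    "shipping issue": {"delivery", "package", "late"},
}


def category_scores(text):
    """Count how many keywords of each category occur in the text."""
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    return {label: len(words & kws) for label, kws in KEYWORDS.items()}


def classify_binary(text, label="billing issue"):
    # Binary: does the text belong to one given class or not?
    return category_scores(text)[label] > 0


def classify_multiclass(text):
    # Multi-class: exactly one label, the highest-scoring one.
    scores = category_scores(text)
    return max(scores, key=scores.get)


def classify_multilabel(text, threshold=1):
    # Multi-label: every label whose score reaches the threshold.
    scores = category_scores(text)
    return sorted(label for label, s in scores.items() if s >= threshold)


ticket = "I got a double charge on my invoice, no refund yet, and the app shows an error."
print(classify_binary(ticket))
print(classify_multiclass(ticket))
print(classify_multilabel(ticket))
```

The same ticket yields a yes/no answer, a single best label, or a list of labels, depending on how the task is framed.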

As the amount of digital text data grows rapidly, the importance of text classification also rises, helping automate processes that would otherwise be manual and time-consuming. Businesses increasingly use text classification to sort customer service tickets, automate content moderation, and classify documents efficiently.

Intent Classification

Intent classification is a process in NLP that focuses on identifying the underlying intention behind a user's input. It plays a critical role in conversational AI, especially in applications like chatbots, virtual assistants, and automated customer service systems. Intent classification allows machines to understand what users want and respond accordingly, enhancing the effectiveness and accuracy of human-computer interactions.

How Intent Classification Works:

Intent classification involves mapping user input (e.g., a sentence or query) to a specific intent category. The goal is to recognize what the user is trying to achieve—whether they are asking for information, making a request, or seeking assistance. Intent classification models are trained on labeled datasets, where various user queries are tagged with specific intents. The machine learning model then learns patterns from these datasets to predict intents for new, unseen queries.

For instance, a user might say, “I want to book a flight.” The system recognizes the intent as flight booking and responds by asking for flight details. Similarly, if the user says, “Where is my package?”, the intent is package tracking, prompting the system to retrieve tracking information.
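
The train-then-predict workflow described above can be sketched with a small Naive Bayes classifier written from scratch. Naive Bayes is chosen here only as a simple stand-in for whatever model a real system uses. The training queries and the intent names `flight_booking` and `package_tracking` are made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical training set: (user query, intent label).
TRAINING_DATA = [
    ("i want to book a flight", "flight_booking"),
    ("book me a flight to berlin", "flight_booking"),
    ("i need a plane ticket", "flight_booking"),
    ("where is my package", "package_tracking"),
    ("track my package please", "package_tracking"),
    ("has my parcel been delivered", "package_tracking"),
]


def train(examples):
    """Count words per intent and how often each intent occurs."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocabulary = set()
    for text, intent in examples:
        tokens = text.lower().split()
        word_counts[intent].update(tokens)
        intent_counts[intent] += 1
        vocabulary.update(tokens)
    return word_counts, intent_counts, vocabulary


def predict(text, model):
    """Return the intent with the highest log-probability for the text."""
    word_counts, intent_counts, vocabulary = model
    total_examples = sum(intent_counts.values())
    best_intent, best_score = None, -math.inf
    for intent in intent_counts:
        # Log prior: how common this intent is in the training data.
        score = math.log(intent_counts[intent] / total_examples)
        total_words = sum(word_counts[intent].values())
        for token in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            count = word_counts[intent][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


model = train(TRAINING_DATA)
print(predict("Can you book a flight for me", model))
print(predict("Where is my parcel", model))
```

Neither test query appears verbatim in the training data. The model generalizes from word statistics, which is the behavior the text describes for new, unseen queries.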

Applications of Intent Classification:

Customer Service: In automated customer service systems, intent classification helps chatbots and virtual assistants categorize queries like “I want a refund” or “How do I reset my password?” and respond appropriately. By identifying intents such as refund request or password reset, the AI system can either provide the necessary information or escalate the request to a human representative.
Virtual Assistants: Intent classification enables virtual assistants like Siri, Alexa, and Google Assistant to understand commands like “Set a reminder for 3 PM” or “Play music.” Once the intent is recognized (e.g., setting reminders or playing music), the assistant executes the corresponding action.
E-commerce: In e-commerce platforms, intent classification helps in identifying user intents related to purchases, such as “Add to cart” or “Track my order.” This enhances the user experience by streamlining interactions and reducing the steps required to complete a transaction.
Search Engines: Some search engines use intent classification to interpret the user's intent behind a query and provide more relevant search results. For example, if the intent behind “best laptops 2024” is to gather information about product reviews, the search engine can prioritize articles and guides.

Challenges in Intent Classification:

Ambiguity: A user query may be vague or contain multiple intents, such as “Can I cancel my order and get a refund?” Here, the system must recognize both the cancel order and refund intents.
Short Queries: User inputs are often short and lack context, making it difficult to classify intents accurately. For instance, the query “cancel” could refer to different tasks, depending on the context.

Multi-intent queries have gained significant interest in research, as they require the system to recognize multiple overlapping intents in a single sentence. This type of classification is crucial for delivering better user experiences in complex conversations, especially in domains such as customer service and virtual assistants.
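
Both challenges can be illustrated with a small rule-based sketch: it returns every intent whose trigger phrase occurs in the query, and it resolves a bare “cancel” only if the conversation supplies a topic. The trigger phrases, the intent names, and the `active_topic` parameter are hypothetical. Research systems typically use trained multi-label models rather than substring matching.

```python
# Hypothetical trigger phrases per intent.
INTENT_TRIGGERS = {
    "cancel_order": ["cancel my order", "cancel the order"],
    "cancel_subscription": ["cancel my subscription"],
    "refund": ["refund", "money back"],
}

# Short queries that only make sense given the conversation so far.
CONTEXT_DEPENDENT = {
    "cancel": {"order": "cancel_order", "subscription": "cancel_subscription"},
}


def detect_intents(query, active_topic=None):
    """Return a list of intents; several may apply to one query."""
    text = query.lower().strip(" ?!.")
    intents = [intent for intent, triggers in INTENT_TRIGGERS.items()
               if any(trigger in text for trigger in triggers)]
    if intents:
        return intents
    if text in CONTEXT_DEPENDENT:
        # Ambiguous short query: use the dialogue context if there is one.
        resolved = CONTEXT_DEPENDENT[text].get(active_topic)
        return [resolved] if resolved else ["clarification_needed"]
    return ["unknown"]


print(detect_intents("Can I cancel my order and get a refund?"))
print(detect_intents("cancel", active_topic="subscription"))
print(detect_intents("cancel"))
```

Without context, the sketch asks for clarification instead of guessing, which is one common way to handle short, ambiguous inputs.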

Intent classification is a foundational technology in conversational AI, allowing for smoother and more effective interactions. Modern systems, such as those using GPT-based models, are continually evolving to better understand user intent, even in ambiguous or multi-intent scenarios. With these advancements, conversational agents are becoming increasingly adept at handling complex, human-like dialogues.

Sentiment Analysis

Sentiment analysis, also known as opinion mining, is the process of identifying and categorizing opinions expressed in a piece of text, especially determining whether the sentiment is positive, negative, or neutral. It utilizes natural language processing (NLP), text analysis, and computational linguistics to extract subjective information from data.

How Sentiment Analysis Works:

Sentiment analysis involves breaking down text to understand the emotional tone behind the words. The goal is to gauge the attitude, opinions, or emotions of the speaker or writer. Sentiment analysis can operate at various levels:

Document-level sentiment analysis: Classifies the overall sentiment of a full document or piece of text as positive, negative, or neutral.
Sentence-level sentiment analysis: Breaks down each sentence within a document to determine its sentiment.
Aspect-based sentiment analysis: Focuses on particular aspects or features of an entity (e.g., evaluating a product’s camera, battery life, etc.) and determines the sentiment associated with each aspect.

Types of Sentiment Analysis:

Polarity Classification: The most basic task, where the sentiment is classified as positive, negative, or neutral.
Emotion Detection: Beyond simple polarity, sentiment analysis can identify specific emotions such as happiness, anger, sadness, or frustration.
Subjectivity/Objectivity Identification: This distinguishes between subjective sentences (those expressing opinions) and objective ones (factual information).
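
A minimal sketch of polarity classification at document level and sentence level is shown below, using a lexicon-based approach. The tiny lexicon and the example review are assumptions for illustration. Practical systems use much larger lexicons or trained models.

```python
import re

# Tiny hypothetical sentiment lexicon: +1 for positive, -1 for negative words.
LEXICON = {"great": 1, "love": 1, "excellent": 1,
           "poor": -1, "terrible": -1, "slow": -1}


def polarity(text):
    """Classify text as positive, negative, or neutral by summing word scores."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentence_level(document):
    """Split a document into sentences and classify each one separately."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [(sentence, polarity(sentence)) for sentence in sentences]


review = "The camera is excellent. The battery life is poor. It arrived on Tuesday."
print("document:", polarity(review))
for sentence, label in sentence_level(review):
    print(label, "<-", sentence)
```

At document level, the positive and negative words cancel out and the review looks neutral. The sentence-level view reveals one positive, one negative, and one neutral sentence, which is the information an aspect-based analysis would build on.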

Applications in Business:

Brand Monitoring: Businesses use sentiment analysis to monitor online reviews, social media posts, and customer feedback to understand how their brand or products are perceived. For instance, a company might analyze Twitter data to see whether users' opinions about a product are generally favorable.
Customer Feedback: Companies can analyze customer reviews and feedback from surveys or forums to improve services, products, and overall customer satisfaction.
Reputation Management: Monitoring public sentiment about a company helps in identifying potential PR issues early and addressing them proactively.

Challenges:

Sarcasm and Irony: Sentiment analysis systems may struggle to detect sarcasm or irony, where the literal meaning of a sentence does not reflect the true sentiment.
Negation Handling: Negation words can invert sentiment. In “not good”, the word “not” flips the otherwise positive “good”, so a system that scores words in isolation assigns the wrong polarity.
Cultural Nuances: Sentiment can vary based on cultural and linguistic contexts, making it difficult to apply a one-size-fits-all model.
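
The negation problem can be made concrete with a small sketch that flips the score of the next sentiment word after a negation word. The lexicon, the negation list, and the scope rule (the flag persists until the next sentiment word) are simplifying assumptions. Contractions such as “isn't” are not handled, and sarcasm or cultural nuances cannot be captured this way.

```python
import re

# Hypothetical toy lexicon and negation list.
LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}
NEGATIONS = {"not", "no", "never"}


def score_naive(text):
    """Sum word scores while ignoring negation entirely."""
    return sum(LEXICON.get(t, 0) for t in re.findall(r"[a-z]+", text.lower()))


def score_with_negation(text):
    """Sum word scores, flipping the sign of a word that follows a negation."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(token, 0)
        if value != 0:
            score += -value if negate else value
            negate = False  # the negation is consumed by this sentiment word
    return score


for text in ["The food was good", "The food was not good", "Not bad at all"]:
    print(text, "| naive:", score_naive(text), "| with negation:", score_with_negation(text))
```

The naive scorer rates “not good” as positive and “not bad” as negative. The negation-aware version reverses both.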

Sentiment analysis has grown in importance with the rise of social media and online platforms, where public opinion can be critical for business success. Automated tools for sentiment analysis help companies stay informed about customer perspectives and adapt to market needs efficiently.

Information Extraction

Information extraction (IE) refers to the process of automatically extracting structured, actionable information from unstructured text. It encompasses various techniques like Named Entity Recognition (NER) and Topic Modeling, each aimed at transforming vast amounts of unstructured text data into structured formats for easier analysis and utilization. Information extraction is crucial for processing large text corpora in various industries such as journalism, healthcare, business intelligence, and legal research. By identifying relevant data points, IE allows systems to automate tasks like document classification, customer feedback analysis, and search engine optimization.

Named Entity Recognition (NER)

One of the most widely used sub-tasks within IE is Named Entity Recognition (NER). NER focuses on identifying and categorizing key entities within a text, such as names of people, organizations, locations, dates, and other relevant items.

NER plays a pivotal role in bridging the gap between unstructured text and structured data, allowing machines to sift through large volumes of information and extract specific data points. These entities often contain crucial information, which can be further used for data analytics, search engines, or document indexing.

How NER Works:

Tokenization: The first step in Named Entity Recognition is breaking down the text into individual tokens, typically words, numbers, and punctuation marks. For example, the sentence “In 1898, Marie Curie discovered the chemical element radium in Paris.” would be split into the following tokens: “In”, “1898”, “,”, “Marie”, “Curie”, “discovered”, “the”, “chemical”, “element”, “radium”, “in”, “Paris”, and “.”
Part of Speech (POS) Tagging: After tokenization, the system identifies the part of speech for each token, such as nouns, verbs, or adjectives. For example:
- “In” (Adposition)
- “1898” (Numeral)
- “,” (Punctuation)
- “Marie” (Proper Noun)
- “Curie” (Proper Noun)
- “discovered” (Verb)
- “the” (Determiner)
- “chemical” (Adjective)
- “element” (Noun)
- “radium” (Noun)
- “in” (Adposition)
- “Paris” (Proper Noun)
- “.” (Punctuation)
Chunking: In this step, the system groups sequences of tokens into meaningful chunks, such as noun phrases or verb phrases. In this case, “Marie Curie” and “the chemical element radium” would each be grouped as a noun phrase.
Named Entity Classification: The chunks are then classified into predefined entity types such as Person, Organization, Location, or Date. For instance, in the sentence:
- “1898” is classified as a Date.
- “Marie Curie” is classified as a Person.
- “Paris” is classified as a Location.
Entity Extraction: Finally, the system assigns the classified entities to corresponding variables or slots. For instance:
- Date = “1898”
- Person = “Marie Curie”
- Location = “Paris”

At the end of these steps, NER extracts structured data from unstructured text, making it easier for downstream systems to use this information in tasks like information retrieval, question-answering systems, or knowledge databases.
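
The steps above can be sketched as a small rule-based pipeline. This is a simplified illustration, not how production NER systems work: it skips part-of-speech tagging, uses capitalization and four-digit numbers as a stand-in for chunking, and classifies chunks with tiny hypothetical lookup lists (gazetteers). Modern systems usually rely on trained statistical or neural models instead.

```python
import re

# Hypothetical gazetteers; real systems use trained models or large lists.
PERSON_NAMES = {"Marie Curie"}
LOCATIONS = {"Paris"}


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def candidate_chunks(tokens):
    """Group consecutive capitalized tokens; keep four-digit numbers as chunks."""
    chunks, current = [], []
    for token in tokens:
        if token[0].isupper():
            current.append(token)
            continue
        if current:
            chunks.append(" ".join(current))
            current = []
        if re.fullmatch(r"\d{4}", token):
            chunks.append(token)
    if current:
        chunks.append(" ".join(current))
    return chunks


def classify(chunk):
    """Map a chunk to an entity type, or None if it is not an entity."""
    if re.fullmatch(r"\d{4}", chunk):
        return "Date"
    if chunk in PERSON_NAMES:
        return "Person"
    if chunk in LOCATIONS:
        return "Location"
    return None


def extract_entities(text):
    """Run the pipeline and fill one slot per entity type."""
    entities = {}
    for chunk in candidate_chunks(tokenize(text)):
        label = classify(chunk)
        if label:
            entities.setdefault(label, []).append(chunk)
    return entities


sentence = "In 1898, Marie Curie discovered the chemical element radium in Paris."
print(tokenize(sentence))
print(extract_entities(sentence))
```

The capitalized sentence-initial “In” is proposed as a chunk but discarded at the classification step, which shows why candidate generation and classification are kept as separate stages.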

Applications of NER:

News Aggregation: NER helps to categorize news articles by identifying the primary entities involved in the stories, making it easier for readers to find articles related to specific people, companies, or locations.
Social Media Monitoring: NER is often used to track entities like brand names or public figures in social media posts to understand public opinion and sentiment. It helps companies monitor discussions about their products and identify trends.
Healthcare and Biomedical Fields: In medical text mining, NER extracts crucial information such as drug names, diseases, and patient information from clinical notes or research papers.
Legal Document Processing: In legal sectors, NER can be applied to scan documents for specific legal entities like case names, dates, or organizations, which streamlines legal research and case analysis.
Business Intelligence: NER is applied in analyzing news articles or financial reports to extract key entities like company names, products, and industry sectors, providing actionable insights for decision-making.

Challenges in NER:

Ambiguity: Words like “Apple” can refer to either the company or the fruit depending on the context, making it a challenge for systems to correctly classify entities.
Nested Entities: Sometimes entities are nested within other entities, making it difficult for a system to identify and classify them accurately. For example, the Organization “University of Paris” contains the Location “Paris”.
Domain-Specific Variations: General NER models may struggle in specialized domains like medicine or law, where domain-specific terminology must be recognized.

NER is integral to transforming text into actionable data and is widely adopted across industries from journalism to customer service and cybersecurity. As NER technologies continue to evolve, their role in facilitating automated data processing will only expand.

Topic Modeling

Topic modeling is another crucial technique in Information Extraction that identifies hidden thematic structures within large text corpora. It is an unsupervised machine learning method used to discover the underlying topics present in a collection of documents. Unlike text classification, which assigns predefined labels to documents, topic modeling reveals the natural structure of the text by grouping related words into distinct topics. This method is essential for organizing, summarizing, and understanding large datasets.
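
To make the unsupervised nature of topic modeling concrete, the sketch below implements Latent Dirichlet Allocation (LDA) with collapsed Gibbs sampling. LDA is one common topic model, chosen here as an assumption since the text does not name a specific algorithm. The four mini-documents and the hyperparameters `alpha` and `beta` are made up. On such a tiny corpus the result depends on the random seed, so no particular output should be expected.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total words per topic
    assignments = []

    # Start from a random topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1
                # Resample: topics popular in this document and for this word win.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Hypothetical mini-corpus with two intended themes (politics, sports).
docs = [
    "election vote government policy".split(),
    "government election minister vote".split(),
    "match goal team player".split(),
    "team player goal coach".split(),
]
doc_topic, topic_word = lda_gibbs(docs)
for k, counts in enumerate(topic_word):
    print(f"topic {k}:", [word for word, _ in counts.most_common(3)])
```

No labels are supplied at any point. The topics emerge only from which words co-occur in the same documents, which is the key contrast with text classification.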

Applications of Topic Modeling:

Document Clustering: Topic modeling helps organize unstructured documents into clusters based on the underlying themes. For instance, news articles can be grouped into categories such as politics, sports, or entertainment.
Customer Feedback Analysis: Topic modeling is widely used by companies to analyze customer reviews and feedback. By identifying recurring topics, businesses can quickly understand common customer concerns, areas of satisfaction, or areas that need improvement without manually reading through all reviews.
Social Media Analysis: Topic modeling is applied to analyze large-scale social media data to identify trending topics, public concerns, or ongoing discussions about specific events or products. This helps businesses stay updated on customer sentiment and market trends.
Recommendation Systems: Topic modeling improves recommendation systems by analyzing the underlying themes of documents or articles. For instance, in content platforms, it helps identify user preferences and deliver personalized recommendations based on thematic interests.
Research and Academia: Topic modeling assists researchers by summarizing and categorizing large bodies of literature. This allows academics to quickly locate papers relevant to their research areas and identify emerging trends in their fields.

Topic modeling is an essential tool for businesses and researchers alike, as it provides insights into hidden patterns and thematic structures in text datasets. It can be used with Named Entity Recognition (NER) to further enhance the extraction of valuable information from unstructured data. For example, NER can identify key entities within documents, while Topic Modeling uncovers the broader themes, offering a comprehensive analysis of the content.

Prompt Engineering in Large Language Models (LLMs)

Prompt engineering refers to the process of designing and refining prompts that guide large language models (LLMs) like GPT-3, GPT-4, and other advanced AI systems to generate desired responses. The quality of the response is heavily influenced by how the input query (the prompt) is structured. Here, we explore several aspects of prompt engineering, from standard procedures to advanced techniques like zero-shot and few-shot prompting.

Standard Procedure in LLM Prompting

The typical procedure of interacting with an LLM follows a simple but effective workflow:

Question: The user poses a question or query (for example, in ChatGPT).
Prompt Transformation: The application (for example, the ChatGPT interface) converts the user query into a prompt by combining it with system instructions that guide the model's behavior. This prompt is the actual input the model receives.
Response Generation: The LLM runs inference on the prompt, generating a response token by token based on patterns learned from its training data.
Response: The final answer is generated and sent back to the user.

This approach is fundamental to conversational systems like ChatGPT, where users input questions and receive human-like responses based on the LLM's understanding of the query.
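The four steps above can be sketched in a few lines of Python. This is only an illustration: `SYSTEM_INSTRUCTIONS`, `build_prompt`, and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in for real model inference rather than a call to any actual LLM API.

```python
from dataclasses import dataclass

# Hypothetical system instructions; real applications define their own.
SYSTEM_INSTRUCTIONS = "You are a helpful assistant. Answer concisely."


@dataclass
class Message:
    role: str      # "system" or "user"
    content: str


def build_prompt(user_question: str) -> list[Message]:
    """Step 2: combine system instructions with the user's question."""
    return [
        Message(role="system", content=SYSTEM_INSTRUCTIONS),
        Message(role="user", content=user_question.strip()),
    ]


def fake_llm(prompt: list[Message]) -> str:
    """Step 3 stand-in: a real system would run model inference here."""
    question = prompt[-1].content
    return f"(model answer to: {question})"


def answer(user_question: str) -> str:
    """Steps 1-4: question -> prompt -> generation -> response."""
    prompt = build_prompt(user_question)
    return fake_llm(prompt)


print(answer("  What is a token?  "))
# prints "(model answer to: What is a token?)"
```

The point of the sketch is that the user never sends raw text straight to the model: the surrounding application assembles the prompt first.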

Token, Token, Token…

Tokens are the building blocks of language models. A token can represent a word, part of a word, or even a single character, depending on the tokenizer. In models like GPT, all inputs are broken down into tokens, and the model processes these tokens to predict the next token. Every word or phrase you input into an LLM is converted into one or more tokens, depending on its length, how common it is, and the language of the query.
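To make the idea of subword tokens concrete, here is a toy tokenizer. It is a simplified sketch, not the algorithm GPT models use: the vocabulary `VOCAB` is made up, and the greedy longest-match rule is only one of several possible strategies (production tokenizers learn their vocabulary from data).

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of entries.
VOCAB = {"pine", "apple", "pizza", "on", "a", "is", " "}


def tokenize(text: str, vocab: set[str] = VOCAB) -> list[str]:
    """Greedy longest-match split; unknown characters become one-character tokens."""
    tokens = []
    i = 0
    max_len = max(len(entry) for entry in vocab)
    while i < len(text):
        # Try the longest candidate first, then shrink it.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


print(tokenize("pineapple on a pizza"))
# prints ['pine', 'apple', ' ', 'on', ' ', 'a', ' ', 'pizza']
```

Here four words become eight tokens, and "pineapple" is split into two pieces because the whole word is not in the toy vocabulary.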

LLMs Are Excellent at Predicting the Next Words (Tokens) 🦜

As depicted in the image, LLMs are designed to predict the next word (token) based on the previous context. For example, when provided with the prompt “Pineapple on a pizza is a”, the model evaluates probabilities for several possible next words, such as “crime,” “controversy,” “treat,” or “recipe.”

LLMs generate responses by calculating a probability distribution over potential next tokens and then either selecting the most probable token or sampling from that distribution. A parameter known as temperature controls how much randomness the sampling introduces.
With higher temperatures, the model generates more creative or diverse responses, while lower temperatures make the output more deterministic (as shown in the image).
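The effect of temperature can be illustrated with a small sketch. The scores in `LOGITS` are invented for the pizza example and do not come from a real model; the functions show the commonly used approach of dividing scores by the temperature before applying a softmax, then sampling from the result.

```python
import math
import random

# Hypothetical raw scores (logits) for the token after "Pineapple on a pizza is a".
LOGITS = {"crime": 2.0, "controversy": 1.5, "treat": 1.0, "recipe": 0.2}


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one token at random, weighted by its probability."""
    probs = softmax_with_temperature(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs give the same samples
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(LOGITS, t)
    rounded = {tok: round(p, 3) for tok, p in probs.items()}
    print(f"T={t}: {rounded} -> sampled: {sample_next_token(LOGITS, t, rng)}")
```

At a temperature of 0.2 almost all probability mass sits on the top-scoring token, while at 2.0 the four candidates are much closer together, so sampled outputs vary more.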

Zero- and Few-Shot Prompting

In zero-shot prompting, the model is given no examples before being asked to generate a response. The prompt provides the model with all the information it needs, and the model is expected to produce relevant outputs based on its pre-trained knowledge. This approach is useful when there are no explicit examples available or when asking the model to perform new, unforeseen tasks.

In contrast to zero-shot prompting, few-shot prompting involves providing the model with a few input-output examples to guide its responses. This method is especially helpful when the task requires a specific output format or style, because the examples show the model what a good answer looks like.
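The difference between the two approaches is easiest to see in the prompt text itself. The following sketch builds both kinds of prompt for a sentiment task; the function names, the `Text:`/`Sentiment:` layout, and the example reviews are assumptions chosen for illustration, not a required format.

```python
def zero_shot_prompt(task: str, text: str) -> str:
    """Instruction plus the input, with no worked examples."""
    return f"{task}\n\nText: {text}\nSentiment:"


def few_shot_prompt(task: str, examples: list[tuple[str, str]], text: str) -> str:
    """Same instruction, preceded by a few input-output demonstrations."""
    shots = "\n\n".join(f"Text: {x}\nSentiment: {y}" for x, y in examples)
    return f"{task}\n\n{shots}\n\nText: {text}\nSentiment:"


TASK = "Classify the sentiment of the text as positive or negative."
EXAMPLES = [  # made-up demonstrations
    ("The delivery was fast and the product works great.", "positive"),
    ("The app crashes every time I open it.", "negative"),
]

print(zero_shot_prompt(TASK, "I love this phone."))
print("---")
print(few_shot_prompt(TASK, EXAMPLES, "I love this phone."))
```

Both prompts end with an open `Sentiment:` slot for the model to complete; the few-shot version simply shows two filled-in slots first.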

There are more sophisticated prompting techniques available, which will be investigated in a later lecture.

Tooling

The ecosystem around LLMs includes several tools and frameworks for both developers and non-developers:

Pro-Code Agent Frameworks: Examples include LangChain, Haystack, and LlamaIndex, which offer advanced features for building sophisticated LLM-driven applications.
No-Code Agent Frameworks: Tools like LangFlow and Flowise make it easy for users to build LLM applications without needing programming skills.
LLM Playgrounds: Platforms like OpenAI, Cohere, ChatGPT, and Perplexity allow users to experiment with LLMs in user-friendly environments.
Dialogue Frameworks: Tools like Voiceflow and Botpress help in building conversational AI applications, while Gradio and Streamlit provide interfaces for integrating LLMs into web applications.

References

Beck, M., Pöppel, K., Spanring, M., Auer, A., Prudnikova, O., Kopp, M., Klambauer, G., Brandstetter, J., & Hochreiter, S. (2024). xLSTM: Extended Long Short-Term Memory. arXiv:2405.04517. https://arxiv.org/abs/2405.04517
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv:1301.3781. https://arxiv.org/abs/1301.3781
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. arXiv:1706.03762. https://arxiv.org/abs/1706.03762
Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2016). Bag of Tricks for Efficient Text Classification. arXiv:1607.01759. https://arxiv.org/abs/1607.01759
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. https://doi.org/10.48550/arXiv.1810.04805
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. arXiv:2203.02155. https://doi.org/10.48550/arXiv.2203.02155
