{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "2ca5bbbf-142b-48ed-89f2-18c63eac1eb0",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from dotenv import load_dotenv, find_dotenv\n",
    "_ = load_dotenv(find_dotenv())\n",
    "openai_api_key = os.environ[\"OPENAI_API_KEY\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42f0647a-3384-4a8d-bc5a-446b58a9ec80",
   "metadata": {},
   "source": [
    "## Basic App: Question & Answering from a Document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "196e2414-39ff-4643-9c31-b130697e3074",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_openai import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "b191eb7d-a9df-47a6-95ac-fdec0b0b1969",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "567ecd8a-34b9-4489-9b6d-7b257919737a",
   "metadata": {},
   "source": [
    "**Load the text file**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e9801a4d-ecd7-460a-a4db-f0a1711163f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "e01cbe49-38ce-40d2-8e01-01052483fbe5",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = TextLoader(\"data/be-good-and-how-not-to-die.txt\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "afff80c1-1cd1-408e-ac28-2a8594af2de6",
   "metadata": {},
   "outputs": [],
   "source": [
    "document = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f6ff8b1-c63c-48b4-acaf-b959b98e770d",
   "metadata": {},
   "source": [
    "**The document is loaded as a Python list with metadata**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "684f2ccb-9ec4-4384-b875-3f89608292f8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'list'>\n"
     ]
    }
   ],
   "source": [
    "print(type(document))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "3f712992-beac-420c-b94e-1db729bd9508",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n"
     ]
    }
   ],
   "source": [
    "print(len(document))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "6cb6e6d1-765b-4762-8213-0c56e4829427",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'source': 'data/be-good-and-how-not-to-die.txt'}\n"
     ]
    }
   ],
   "source": [
    "print(document[0].metadata)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "eef34dd2-aace-45c9-a87b-c3967370f44f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You have 1 document.\n"
     ]
    }
   ],
   "source": [
    "print(f\"You have {len(document)} document.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "dfb653b8-8035-40bf-b73c-542c90499486",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Your document has 27419 characters\n"
     ]
    }
   ],
   "source": [
    "print(f\"Your document has {len(document[0].page_content)} characters\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d8fbef1-4447-4ef5-ba21-9211b4486904",
   "metadata": {},
   "source": [
    "**Split the document in small chunks**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "18899367-8f8e-4d47-a61c-8041bafb4cd2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "65f31176-187b-4e0c-819a-dd35b7e61ee3",
   "metadata": {},
   "outputs": [],
   "source": [
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size=3000,\n",
    "    chunk_overlap=400\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "0147e8c9-70fb-4990-8bf6-c77ea283200a",
   "metadata": {},
   "outputs": [],
   "source": [
    "document_chunks = text_splitter.split_documents(document)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "86820a0a-e307-4bb9-9b69-24284d9f0754",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Now you have 12 chunks.\n"
     ]
    }
   ],
   "source": [
    "print(f\"Now you have {len(document_chunks)} chunks.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1756c471-5af1-4ba4-8f90-cfc70e897155",
   "metadata": {},
   "source": [
    "**Convert text chunks in numeric vectors (called \"embeddings\")**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "2e40b2df-53ad-4984-8972-cf7258b762fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "c1e7c4f6-2116-4ce1-8349-640fa6591970",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/juliocolomer/.pyenv/versions/3.11.4/envs/venv020124/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The class `langchain_community.embeddings.openai.OpenAIEmbeddings` was deprecated in langchain-community 0.1.0 and will be removed in 0.2.0. An updated version of the class exists in the langchain-openai package and should be used instead. To use it run `pip install -U langchain-openai` and import as `from langchain_openai import OpenAIEmbeddings`.\n",
      "  warn_deprecated(\n"
     ]
    }
   ],
   "source": [
    "embeddings = OpenAIEmbeddings()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67881eb4-7418-4423-b4b9-b750e4a579cf",
   "metadata": {},
   "source": [
    "**Load the embeddings to a vector database**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "75acf1fc-25a4-437c-bd1b-7310a83bae9e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.vectorstores import FAISS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "014dc49b-498a-49b0-883c-f323769320b3",
   "metadata": {},
   "source": [
    "*Careful: the next operation is expensive in OpenAI*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "8a2e4dbc-af28-497f-877a-933347c747ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "stored_embeddings = FAISS.from_documents(document_chunks, embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40ae930c-5d99-45c5-8597-52457d22b6d1",
   "metadata": {},
   "source": [
    "**Create a Retrieval Question & Answering Chain**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "e30338bd-58ae-467d-a01e-8bef3ae435b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chains import RetrievalQA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "d9bed1ac-d148-4fb1-9382-bc58db111d73",
   "metadata": {},
   "outputs": [],
   "source": [
    "QA_chain = RetrievalQA.from_chain_type(\n",
    "    llm=llm,\n",
    "    chain_type=\"stuff\",\n",
    "    retriever=stored_embeddings.as_retriever()\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71a96c6d-a473-4ecc-885d-55f1618487d4",
   "metadata": {},
   "source": [
    "**Now we have a Question & Answering APP**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "c432c922-984b-49b9-b3c1-4a6c5d2ccd00",
   "metadata": {},
   "outputs": [],
   "source": [
    "question = \"\"\"\n",
    "What is this article about? \n",
    "Describe it in less than 100 words.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "7d407274-69c0-4d7f-8def-5e2275027a41",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/juliocolomer/.pyenv/versions/3.11.4/envs/venv020124/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `run` was deprecated in LangChain 0.1.0 and will be removed in 0.2.0. Use invoke instead.\n",
      "  warn_deprecated(\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'\\nThis article is about the importance of perseverance in startups. The author shares their experiences and observations about the startup world and emphasizes the need to not give up, even when facing challenges and setbacks. They also discuss the role of fear and motivation in driving success. The main message is to keep pushing forward and not let obstacles stop you from achieving your goals.'"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "QA_chain.run(question)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "87459203-207e-4180-af54-1a48457143b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "question2 = \"\"\"\n",
    "And how does it explain how to create somethin people want?\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "86947fdc-3d59-465c-b1a9-a061421bccb6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' Benevolence helps to create something people want by improving the morale of the founders, making others want to help the startup, and helping the founders be more decisive. When founders feel like they are helping others, they are more motivated to continue working even when faced with challenges. This can lead to a better product or service that meets the needs of users. Additionally, when others see that the startup is making a positive impact, they may be more inclined to support and contribute to its success. Finally, having a sense of benevolence can help founders make decisions that prioritize the needs and satisfaction of their users, leading to a product or service that people truly want.'"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "QA_chain.run(question2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33528893-5c3f-4326-abd5-5a4b00a8ec22",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
