---
title: 🤖 Large language models (LLMs)
---

## Overview

Embedchain comes with built-in support for various popular large language models. We handle the complexity of integrating these models for you, allowing you to easily customize your language model interactions through a user-friendly interface.

<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="Google AI" href="#google-ai"></Card>
  <Card title="Azure OpenAI" href="#azure-openai"></Card>
  <Card title="Anthropic" href="#anthropic"></Card>
  <Card title="Cohere" href="#cohere"></Card>
  <Card title="Together" href="#together"></Card>
  <Card title="Ollama" href="#ollama"></Card>
  <Card title="vLLM" href="#vllm"></Card>
  <Card title="Clarifai" href="#clarifai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="JinaChat" href="#jinachat"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Llama2" href="#llama2"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
  <Card title="Mistral AI" href="#mistral-ai"></Card>
  <Card title="AWS Bedrock" href="#aws-bedrock"></Card>
  <Card title="Groq" href="#groq"></Card>
  <Card title="NVIDIA AI" href="#nvidia-ai"></Card>
</CardGroup>

## OpenAI

To use OpenAI models, set the `OPENAI_API_KEY` environment variable. You can obtain the API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have obtained the key, you can use it like this:

```python
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

app = App()
app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

To configure the LLM's parameters, load the app from a [yaml config](https://github.com/embedchain/embedchain/blob/main/configs/chroma.yaml) file.

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: openai
  config:
    model: 'gpt-4o-mini'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```
</CodeGroup>
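
The same settings can also be written as a Python dictionary, the form that other sections on this page pass to `App.from_config(config=...)`. The sketch below shows one way to assemble such a dictionary; `build_llm_config` is a hypothetical helper written for this example, not part of Embedchain.

```python
def build_llm_config(provider: str, model: str, **params) -> dict:
    """Return a dictionary shaped like the `llm` block of config.yaml."""
    return {"llm": {"provider": provider, "config": {"model": model, **params}}}


# Mirror the values from the config.yaml example above.
config = build_llm_config(
    "openai",
    "gpt-4o-mini",
    temperature=0.5,
    max_tokens=1000,
    top_p=1,
    stream=False,
)
print(config)

# To use it with Embedchain (requires the package and an API key):
# from embedchain import App
# app = App.from_config(config=config)
```

Keeping the parameters in one place like this makes it easier to switch providers or models without editing several files.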

### Function Calling
Embedchain supports OpenAI [Function calling](https://platform.openai.com/docs/guides/function-calling) with a single function. It accepts inputs in accordance with the [LangChain interface](https://python.langchain.com/docs/modules/model_io/chat/function_calling#legacy-args-functions-and-function_call).

<Accordion title="Pydantic Model">
  ```python
  from pydantic import BaseModel, Field

  class multiply(BaseModel):
      """Multiply two integers together."""

      a: int = Field(..., description="First integer")
      b: int = Field(..., description="Second integer")
  ```
</Accordion>

<Accordion title="Python function">
  ```python
  def multiply(a: int, b: int) -> int:
      """Multiply two integers together.

      Args:
          a: First integer
          b: Second integer
      """
      return a * b
  ```
</Accordion>
<Accordion title="OpenAI tool dictionary">
  ```python
  multiply = {
      "type": "function",
      "function": {
          "name": "multiply",
          "description": "Multiply two integers together.",
          "parameters": {
              "type": "object",
              "properties": {
                  "a": {
                      "description": "First integer",
                      "type": "integer"
                  },
                  "b": {
                      "description": "Second integer",
                      "type": "integer"
                  }
              },
              "required": [
                  "a",
                  "b"
              ]
          }
      }
  }
  ```
</Accordion>

With any of the previous inputs, the OpenAI LLM can be queried to provide the appropriate arguments for the function.

```python
import os
from embedchain import App
from embedchain.llm.openai import OpenAILlm

os.environ["OPENAI_API_KEY"] = "sk-xxx"

# `multiply` is any one of the three definitions shown above
llm = OpenAILlm(tools=multiply)
app = App(llm=llm)

result = app.query("What is the result of 125 multiplied by fifteen?")
```
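
To see how the three input forms relate, the sketch below derives the tool dictionary from the typed Python function using only the standard library. It is an illustration, not Embedchain's or LangChain's actual conversion: `to_tool_dict`, `arg_descriptions`, and the `JSON_TYPES` mapping are assumptions made for this example, and the docstring parsing only handles the simple `Args:` layout shown above.

```python
import inspect

# Assumed mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def multiply(a: int, b: int) -> int:
    """Multiply two integers together.

    Args:
        a: First integer
        b: Second integer
    """
    return a * b


def arg_descriptions(func) -> dict:
    """Collect `name: description` pairs from the docstring's Args section."""
    descriptions = {}
    in_args = False
    for line in inspect.getdoc(func).splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
        elif in_args and ":" in stripped:
            name, text = stripped.split(":", 1)
            descriptions[name.strip()] = text.strip()
    return descriptions


def to_tool_dict(func) -> dict:
    """Build an OpenAI-style tool dictionary from a typed, documented function."""
    descriptions = arg_descriptions(func)
    properties = {}
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {
            "description": descriptions.get(name, ""),
            "type": JSON_TYPES[param.annotation],
        }
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            # The first docstring line serves as the function description.
            "description": inspect.getdoc(func).splitlines()[0],
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
            },
        },
    }


tool = to_tool_dict(multiply)
print(tool["function"]["parameters"]["required"])
```

For the `multiply` function, the result has the same structure as the hand-written tool dictionary in the accordion above.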

## Google AI

To use Google AI models, set the `GOOGLE_API_KEY` environment variable. You can obtain the API key from [Google MakerSuite](https://makersuite.google.com/app/apikey).

<CodeGroup>
```python main.py
import os
from embedchain import App

os.environ["GOOGLE_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")

app.add("https://www.forbes.com/profile/elon-musk")

response = app.query("What is the net worth of Elon Musk?")
if app.llm.config.stream:  # if stream is enabled, response is a generator
    for chunk in response:
        print(chunk)
else:
    print(response)
```

```yaml config.yaml
llm:
  provider: google
  config:
    model: gemini-pro
    max_tokens: 1000
    temperature: 0.5
    top_p: 1
    stream: false

embedder:
  provider: google
  config:
    model: 'models/embedding-001'
    task_type: "retrieval_document"
    title: "Embeddings for Embedchain"
```
</CodeGroup>
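
Because the response is a plain string when streaming is off and a generator of chunks when it is on, calling code may want to handle both cases uniformly. The sketch below shows one way to do that; `response_text` and `fake_stream` are hypothetical names for this example, and the chunks are assumed to be strings.

```python
from typing import Iterable, Union


def response_text(response: Union[str, Iterable[str]]) -> str:
    """Return the full answer whether `response` is a string or a stream of chunks."""
    if isinstance(response, str):
        return response
    return "".join(response)


def fake_stream():
    # Stand-in for the generator returned when `stream: true` is set.
    yield "Hello, "
    yield "world"


print(response_text("Hello, world"))  # non-streaming case
print(response_text(fake_stream()))   # streaming case
```

Joining the chunks gives up the benefit of incremental output, so this suits logging or tests more than interactive display.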

## Azure OpenAI

To use Azure OpenAI models, set the Azure OpenAI environment variables shown in the code block below:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
os.environ["AZURE_OPENAI_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-4o-mini
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```
</CodeGroup>

You can find the list of models and deployment names on the [Azure OpenAI Platform](https://oai.azure.com/portal).
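
Since this provider depends on several environment variables, it can help to check them before creating the app. The sketch below is one possible pre-flight check using the variable names from the example above; `missing_vars` is a hypothetical helper, not an Embedchain function.

```python
import os

# Variable names taken from the Azure OpenAI example above.
REQUIRED_AZURE_VARS = (
    "OPENAI_API_TYPE",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "OPENAI_API_VERSION",
)


def missing_vars(required=REQUIRED_AZURE_VARS, environ=None) -> list:
    """Return the names in `required` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# A plain dictionary stands in for os.environ in this demonstration.
example_env = {"OPENAI_API_TYPE": "azure", "AZURE_OPENAI_KEY": "xxx"}
print(missing_vars(environ=example_env))
```

Calling `missing_vars()` with no arguments checks the real process environment instead.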

## Anthropic

To use Anthropic's models, set the `ANTHROPIC_API_KEY` environment variable. You can find the key on their [Account Settings Page](https://console.anthropic.com/account/keys).

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["ANTHROPIC_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: anthropic
  config:
    model: 'claude-instant-1'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

</CodeGroup>

## Cohere

Install related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[cohere]'
```

Set the `COHERE_API_KEY` environment variable. You can find the key on their [Account settings page](https://dashboard.cohere.com/api-keys).

Once you have the API key, you are all set to use it with Embedchain.

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["COHERE_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: cohere
  config:
    model: large
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
```

</CodeGroup>

## Together

Install related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[together]'
```

Set the `TOGETHER_API_KEY` environment variable. You can find the key on their [Account settings page](https://api.together.xyz/settings/api-keys).

Once you have the API key, you are all set to use it with Embedchain.

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["TOGETHER_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: together
  config:
    model: togethercomputer/RedPajama-INCITE-7B-Base
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
```

</CodeGroup>

## Ollama

Set up Ollama by following the instructions at https://github.com/jmorganca/ollama

<CodeGroup>

```python main.py
import os
os.environ["OLLAMA_HOST"] = "http://127.0.0.1:11434"
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: ollama
  config:
    model: 'llama2'
    temperature: 0.5
    top_p: 1
    stream: true
    base_url: 'http://localhost:11434'
embedder:
  provider: ollama
  config:
    model: znbang/bge:small-en-v1.5-q8_0
    base_url: http://localhost:11434
```

</CodeGroup>

## vLLM

Set up vLLM by following the instructions in [their docs](https://docs.vllm.ai/en/latest/getting_started/installation.html).

<CodeGroup>

```python main.py
import os
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vllm
  config:
    model: 'meta-llama/Llama-2-70b-hf'
    temperature: 0.5
    top_p: 1
    top_k: 10
    stream: true
    trust_remote_code: true
```

</CodeGroup>

## Clarifai

Install related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[clarifai]'
```

Set the `CLARIFAI_PAT` environment variable. You can find the token on the [security page](https://clarifai.com/settings/security). Optionally, you can also pass the PAT as a parameter to the LLM/Embedder class.

Now you are all set to explore Embedchain.

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["CLARIFAI_PAT"] = "XXX"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")

# Now let's add some data.
app.add("https://www.forbes.com/profile/elon-musk")

# Query the app
response = app.query("what college degrees does elon musk have?")
```

```yaml config.yaml
llm:
  provider: clarifai
  config:
    model: "https://clarifai.com/mistralai/completion/models/mistral-7B-Instruct"
    model_kwargs:
      temperature: 0.5
      max_tokens: 1000
embedder:
  provider: clarifai
  config:
    model: "https://clarifai.com/clarifai/main/models/BAAI-bge-base-en-v15"
```
</CodeGroup>

Head to the [Clarifai Platform](https://clarifai.com/explore/models?page=1&perPage=24&filterData=%5B%7B%22field%22%3A%22use_cases%22%2C%22value%22%3A%5B%22llm%22%5D%7D%5D) to browse state-of-the-art LLMs for your use case.
To pass model inference parameters, use the `model_kwargs` argument in the config file. You can also use the `api_key` argument to pass your `CLARIFAI_PAT` in the config.

## GPT4All

Install related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[opensource]'
```

GPT4All is a free-to-use, locally running, privacy-aware chatbot. No GPU or internet connection is required. You can use it with Embedchain using the following code:

<CodeGroup>

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```
</CodeGroup>

## JinaChat

First, set the `JINACHAT_API_KEY` environment variable. You can obtain the key from [their platform](https://chat.jina.ai/api).

Once you have the key, load the app using the config yaml file:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["JINACHAT_API_KEY"] = "xxx"
# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: jina
  config:
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```
</CodeGroup>

## Hugging Face

Install related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[huggingface-hub]'
```

First, set the `HUGGINGFACE_ACCESS_TOKEN` environment variable. You can obtain the token from [their platform](https://huggingface.co/settings/tokens).

You can load LLMs from Hugging Face in three ways:

- [Hugging Face Hub](#hugging-face-hub)
- [Hugging Face Local Pipelines](#hugging-face-local-pipelines)
- [Hugging Face Inference Endpoint](#hugging-face-inference-endpoint)

### Hugging Face Hub

To load the model from Hugging Face Hub, use the following code:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx"

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "model": "bigscience/bloom-1b7",
            "top_p": 0.5,
            "max_length": 200,
            "temperature": 0.1,
        },
    },
}

app = App.from_config(config=config)
```
</CodeGroup>

### Hugging Face Local Pipelines

To load a locally downloaded model from Hugging Face, use the following code:

<CodeGroup>
```python main.py
from embedchain import App

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "model": "Trendyol/Trendyol-LLM-7b-chat-v0.1",
            "local": True,  # Necessary if you want to run model locally
            "top_p": 0.5,
            "max_tokens": 1000,
            "temperature": 0.1,
        },
    }
}
app = App.from_config(config=config)
```
</CodeGroup>

### Hugging Face Inference Endpoint

You can also use [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index#-inference-endpoints) to access custom endpoints. First, set the `HUGGINGFACE_ACCESS_TOKEN` as above.

Then, load the app using a config dictionary:

<CodeGroup>

```python main.py
from embedchain import App

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "endpoint": "https://api-inference.huggingface.co/models/gpt2",
            "model_params": {"temperature": 0.1, "max_new_tokens": 100}
        },
    },
}
app = App.from_config(config=config)
```
</CodeGroup>

Currently, only the `text-generation` and `text2text-generation` tasks are supported [[ref](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html?highlight=huggingfaceendpoint#)].

See LangChain's [Hugging Face endpoint](https://python.langchain.com/docs/integrations/chat/huggingface#huggingfaceendpoint) documentation for more information.
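
The three Hugging Face loading options differ only in the shape of the `llm` config. The sketch below puts the three shapes from this section side by side; `huggingface_config` is a hypothetical helper written for this example, and the parameter names are the ones used in the examples above.

```python
def huggingface_config(mode: str, target: str, **params) -> dict:
    """Build a Hugging Face config for mode "hub", "local", or "endpoint".

    `target` is a model id for "hub" and "local", and a URL for "endpoint".
    """
    if mode == "hub":
        llm_config = {"model": target, **params}
    elif mode == "local":
        # `local: True` tells the provider to run the model locally.
        llm_config = {"model": target, "local": True, **params}
    elif mode == "endpoint":
        # Endpoint parameters are nested under `model_params`.
        llm_config = {"endpoint": target, "model_params": dict(params)}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "app": {"config": {"id": "my-app"}},
        "llm": {"provider": "huggingface", "config": llm_config},
    }


hub = huggingface_config("hub", "bigscience/bloom-1b7", top_p=0.5, temperature=0.1)
endpoint = huggingface_config(
    "endpoint",
    "https://api-inference.huggingface.co/models/gpt2",
    temperature=0.1,
    max_new_tokens=100,
)
print(hub["llm"]["config"])
print(endpoint["llm"]["config"])
```

Each returned dictionary can be passed to `App.from_config(config=...)` as in the examples above.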

## Llama2

Llama2 is integrated through [Replicate](https://replicate.com/). Set the `REPLICATE_API_TOKEN` environment variable. You can obtain the token from [their platform](https://replicate.com/account/api-tokens).

Once you have the token, load the app using the config yaml file:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["REPLICATE_API_TOKEN"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: llama2
  config:
    model: 'a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false
```
</CodeGroup>

## Vertex AI

Set up Google Cloud Platform application credentials by following the instructions on [GCP](https://cloud.google.com/docs/authentication/external/set-up-adc). Once setup is done, use the following code to create an app using Vertex AI as the provider:

<CodeGroup>

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5
```
</CodeGroup>

## Mistral AI

Obtain the Mistral AI API key from their [console](https://console.mistral.ai/).

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["MISTRAL_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")

app.add("https://www.forbes.com/profile/elon-musk")

response = app.query("what is the net worth of Elon Musk?")
# As of January 16, 2024, Elon Musk's net worth is $225.4 billion.

response = app.chat("which companies does elon own?")
# Elon Musk owns Tesla, SpaceX, Boring Company, Twitter, and X.

response = app.chat("what question did I ask you already?")
# You have asked me several times already which companies Elon Musk owns, specifically Tesla, SpaceX, Boring Company, Twitter, and X.
```

```yaml config.yaml
llm:
  provider: mistralai
  config:
    model: mistral-tiny
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
embedder:
  provider: mistralai
  config:
    model: mistral-embed
```
</CodeGroup>

## AWS Bedrock

### Setup
- Before using the AWS Bedrock LLM, make sure you have the appropriate model access from the [Bedrock Console](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/modelaccess).
- You will also need to authenticate the `boto3` client using one of the methods in the [AWS documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials).
- You can optionally export an `AWS_REGION`.

### Usage

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["AWS_REGION"] = "us-west-2"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: aws_bedrock
  config:
    model: amazon.titan-text-express-v1
    # check notes below for model_kwargs
    model_kwargs:
      temperature: 0.5
      topP: 1
      maxTokenCount: 1000
```
</CodeGroup>

<br />
<Note>
  The model arguments are different for each provider. Please refer to the [AWS Bedrock Documentation](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/providers) to find the appropriate arguments for your model.
</Note>

<br />

## Groq

[Groq](https://groq.com/) is the creator of the world's first Language Processing Unit (LPU), providing exceptional speed performance for AI workloads running on their LPU Inference Engine.

### Usage

To use LLMs from Groq, go to their [platform](https://console.groq.com/keys) and get the API key.

Set the API key as the `GROQ_API_KEY` environment variable, or pass it in your app configuration as shown in the example below.

<CodeGroup>

```python main.py
import os
from embedchain import App

# Set your API key here or pass as the environment variable
groq_api_key = "gsk_xxxx"

config = {
    "llm": {
        "provider": "groq",
        "config": {
            "model": "mixtral-8x7b-32768",
            "api_key": groq_api_key,
            "stream": True
        }
    }
}

app = App.from_config(config=config)
# Add your data source here
app.add("https://docs.embedchain.ai/sitemap.xml", data_type="sitemap")
app.query("Write a poem about Embedchain")

# In the realm of data, vast and wide,
# Embedchain stands with knowledge as its guide.
# A platform open, for all to try,
# Building bots that can truly fly.

# With REST API, data in reach,
# Deployment a breeze, as easy as a speech.
# Updating data sources, anytime, anyday,
# Embedchain's power, never sway.

# A knowledge base, an assistant so grand,
# Connecting to platforms, near and far.
# Discord, WhatsApp, Slack, and more,
# Embedchain's potential, never a bore.
```
</CodeGroup>
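
Since the key can come either from the environment or from the config, one option is to include `api_key` in the config only when it is supplied explicitly, and fail early otherwise. The sketch below illustrates that choice; `groq_llm_config` is a hypothetical helper for this example, not part of Embedchain.

```python
import os


def groq_llm_config(model: str, api_key=None, stream: bool = False) -> dict:
    """Build a Groq config; include `api_key` only when one is passed explicitly."""
    llm_config = {"model": model, "stream": stream}
    if api_key is not None:
        llm_config["api_key"] = api_key
    elif not os.environ.get("GROQ_API_KEY"):
        # Neither source provides a key, so stop before creating the app.
        raise RuntimeError("Set GROQ_API_KEY or pass api_key explicitly")
    return {"llm": {"provider": "groq", "config": llm_config}}


config = groq_llm_config("mixtral-8x7b-32768", api_key="gsk_xxxx", stream=True)
print(config)
```

Leaving the key out of the dictionary when it comes from the environment keeps secrets out of any config that gets logged or serialized.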

## NVIDIA AI

[NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) let you quickly use NVIDIA's AI models, such as Mixtral 8x7B and Llama 2, through their API. These models are available in the [NVIDIA NGC catalog](https://catalog.ngc.nvidia.com/ai-foundation-models), fully optimized and ready to use on NVIDIA's AI platform. They are designed for high speed and easy customization, ensuring smooth performance on any accelerated setup.

### Usage

To use LLMs from NVIDIA AI, create an account on the [NVIDIA NGC Service](https://catalog.ngc.nvidia.com/).

Generate an API key from their dashboard and set it as the `NVIDIA_API_KEY` environment variable. Note that the key starts with `nvapi-`.

Below is an example of how to use an LLM and an embedding model from NVIDIA AI:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['NVIDIA_API_KEY'] = 'nvapi-xxxx'

config = {
    "app": {
        "config": {
            "id": "my-app",
        },
    },
    "llm": {
        "provider": "nvidia",
        "config": {
            "model": "nemotron_steerlm_8b",
        },
    },
    "embedder": {
        "provider": "nvidia",
        "config": {
            "model": "nvolveqa_40k",
            "vector_dimension": 1024,
        },
    },
}

app = App.from_config(config=config)

app.add("https://www.forbes.com/profile/elon-musk")
answer = app.query("What is the net worth of Elon Musk today?")
# Answer: The net worth of Elon Musk is subject to fluctuations based on the market value of his holdings in various companies.
# As of March 1, 2024, his net worth is estimated to be approximately $210 billion. However, this figure can change rapidly due to stock market fluctuations and other factors.
# Additionally, his net worth may include other assets such as real estate and art, which are not reflected in his stock portfolio.
```
</CodeGroup>

## Token Usage

You can get the cost of a query by setting `token_usage` to `True` in the config file. The response then includes the token details: `prompt_tokens`, `completion_tokens`, `total_tokens`, `total_cost`, `cost_currency`.
The paid LLMs that support token usage are:

- OpenAI
- Vertex AI
- Anthropic
- Cohere
- Together
- Groq
- Mistral AI
- NVIDIA AI

Here is an example of how to use token usage:
<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")

app.add("https://www.forbes.com/profile/elon-musk")

response = app.query("what is the net worth of Elon Musk?")
# {'answer': "Elon Musk's net worth is $209.9 billion as of 6/9/24.",
#   'usage': {'prompt_tokens': 1228,
#             'completion_tokens': 21,
#             'total_tokens': 1249,
#             'total_cost': 0.001884,
#             'cost_currency': 'USD'}
# }

response = app.chat("Which companies did Elon Musk found?")
# {'answer': 'Elon Musk founded six companies, including Tesla, which is an electric car maker, SpaceX, a rocket producer, and the Boring Company, a tunneling startup.',
#   'usage': {'prompt_tokens': 1616,
#             'completion_tokens': 34,
#             'total_tokens': 1650,
#             'total_cost': 0.002492,
#             'cost_currency': 'USD'}
# }
```

```yaml config.yaml
llm:
  provider: openai
  config:
    model: gpt-4o-mini
    temperature: 0.5
    max_tokens: 1000
    token_usage: true
```
</CodeGroup>
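
When several calls are made in one session, the per-response usage can be added up to track overall spend. The sketch below sums the fields shown in the example output; `total_usage` is a hypothetical helper for this example, and it assumes every response reports its cost in the same currency.

```python
def total_usage(responses) -> dict:
    """Sum the token counts and cost across a list of response dictionaries."""
    totals = {
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0,
        "total_cost": 0.0,
    }
    for response in responses:
        usage = response["usage"]
        for key in totals:
            totals[key] += usage[key]
    return totals


# Usage figures copied from the two example responses above.
responses = [
    {"answer": "...", "usage": {"prompt_tokens": 1228, "completion_tokens": 21,
                                "total_tokens": 1249, "total_cost": 0.001884,
                                "cost_currency": "USD"}},
    {"answer": "...", "usage": {"prompt_tokens": 1616, "completion_tokens": 34,
                                "total_tokens": 1650, "total_cost": 0.002492,
                                "cost_currency": "USD"}},
]
print(total_usage(responses))
```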

If a model is missing and you'd like to add it to `model_prices_and_context_window.json`, please feel free to open a PR.

<br />

<Snippet file="missing-llm-tip.mdx" />
