---
title: 📝 Github
---

1. Set up the GitHub loader by configuring it with your GitHub personal access token (PAT). See [this guide](https://docs.github.com/en/enterprise-server@3.6/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token) to learn how to create a PAT.
```Python
from embedchain.loaders.github import GithubLoader

# Replace the placeholder with your own personal access token.
loader = GithubLoader(
    config={
        "token": "ghp_xxxx"
    }
)
```
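
Hard-coding a token in source files makes it easy to leak. As a minimal sketch, the token could instead be read from an environment variable. The helper name `github_loader_config` and the variable name `GITHUB_TOKEN` are assumptions made for this illustration and are not part of embedchain.

```Python
import os


def github_loader_config(env=None, var_name="GITHUB_TOKEN"):
    """Build a loader config dict from an environment variable.

    `var_name` is a hypothetical variable name chosen for this sketch.
    """
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise ValueError(f"Set the {var_name} environment variable to your GitHub PAT")
    return {"token": token}
```

The returned dict has the same shape as the `config` argument shown above, so it could be passed as `GithubLoader(config=github_loader_config())`.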

2. Once you have set up the loader, you can create an app and load data using the GitHub loader above.
```Python
import os
from embedchain.pipeline import Pipeline as App

os.environ["OPENAI_API_KEY"] = "sk-xxxx"

app = App()

# `loader` is the GithubLoader instance created in step 1.
app.add("repo:embedchain/embedchain type:repo", data_type="github", loader=loader)

response = app.query("What is Embedchain?")
# Answer: Embedchain is a Data Platform for Large Language Models (LLMs). It allows users to seamlessly load, index, retrieve, and sync unstructured data in order to build dynamic, LLM-powered applications. There is also a JavaScript implementation called embedchain-js available on GitHub.
```
The `add` function of the app accepts any valid GitHub query with qualifiers. It only supports loading GitHub code, repositories, issues, and pull requests.
<Note>
You must provide the `type:` and `repo:` qualifiers in the query. The `type:` qualifier can be a comma-separated combination of `code`, `repo`, `pr`, `issue`, `branch`, `file`. The `repo:` qualifier must be a valid GitHub repository name.
</Note>

<Card title="Valid queries" icon="lightbulb" iconType="duotone" color="#ca8b04">
- `repo:embedchain/embedchain type:repo` - to load the repository
- `repo:embedchain/embedchain type:branch name:feature_test` - to load a branch of the repository
- `repo:embedchain/embedchain type:file path:README.md` - to load a specific file of the repository
- `repo:embedchain/embedchain type:issue,pr` - to load the issues and pull requests of the repository
- `repo:embedchain/embedchain type:issue state:closed` - to load the closed issues of the repository
</Card>
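
The query rules above can be sketched as a small helper that assembles and checks a query string before it is passed to `add`. The function `build_github_query` and the constant `VALID_TYPES` are hypothetical names introduced for this illustration. They are not provided by embedchain, and the loader may validate queries differently.

```Python
# Values the `type:` qualifier may take, as listed in the note above.
VALID_TYPES = {"code", "repo", "pr", "issue", "branch", "file"}


def build_github_query(repo, types, **qualifiers):
    """Assemble a query such as 'repo:owner/name type:issue,pr state:closed'."""
    types = [types] if isinstance(types, str) else list(types)
    if not types:
        raise ValueError("at least one type is required")
    unknown = [t for t in types if t not in VALID_TYPES]
    if unknown:
        raise ValueError(f"unsupported type(s): {unknown}")
    # Only a shape check: the repository must look like 'owner/name'.
    owner, _, name = repo.partition("/")
    if not owner or not name or "/" in name:
        raise ValueError("repo must look like 'owner/name'")
    parts = [f"repo:{repo}", "type:" + ",".join(types)]
    # Extra qualifiers (for example state, path, name) keep their call order.
    parts += [f"{key}:{value}" for key, value in qualifiers.items()]
    return " ".join(parts)
```

For example, `build_github_query("embedchain/embedchain", ["issue", "pr"])` produces `repo:embedchain/embedchain type:issue,pr`, matching one of the valid queries listed above.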

3. A chunker for your GitHub data is created automatically. If you wish to provide your own chunker instead, you can do so as follows:
```Python
from embedchain.chunkers.common_chunker import CommonChunker
from embedchain.config.add_config import ChunkerConfig

github_chunker_config = ChunkerConfig(chunk_size=2000, chunk_overlap=0, length_function=len)
github_chunker = CommonChunker(config=github_chunker_config)

# Any valid query works here; this one is taken from the list of valid queries.
load_query = "repo:embedchain/embedchain type:repo"
app.add(load_query, data_type="github", loader=loader, chunker=github_chunker)
```
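
To give a feel for what `chunk_size` and `chunk_overlap` control, here is a minimal character-based sketch of splitting with overlap. It is an illustration only and is not how `CommonChunker` is implemented. The function name `split_with_overlap` is hypothetical.

```Python
def split_with_overlap(text, chunk_size=2000, chunk_overlap=0):
    """Split text into windows of up to chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

With `chunk_overlap=0`, as in the configuration above, the windows do not share any characters. A larger overlap repeats the tail of each chunk at the head of the next, which can help preserve context across chunk boundaries at the cost of more chunks.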