Overview

NeuralSeek supports the following Large Language Models (LLMs):

Amazon Bedrock - Claude 3 Haiku
Amazon Bedrock - Claude 3 Opus
Amazon Bedrock - Claude 3 Sonnet
Amazon Bedrock - Claude Instant v1.1
Amazon Bedrock - Claude v1.3 (Deprecated)
Amazon Bedrock - Claude v2
Amazon Bedrock - Claude v2.1
Amazon Bedrock - Jurassic-2 Mid
Amazon Bedrock - Jurassic-2 Ultra
Amazon Bedrock - Llama-2-chat 13B
Amazon Bedrock - Llama-2-chat 70B
Amazon Bedrock - Mistral-7B-Instruct
Amazon Bedrock - Mistral-large
Amazon Bedrock - Mixtral-8x7B-Instruct
Amazon Bedrock - Titan Large (Deprecated)
Amazon Bedrock - Titan Text G1 - Express
Azure Cognitive Services - Azure GPT4 Turbo (Preview)
Azure Cognitive Services - GPT3.5
Azure Cognitive Services - GPT4
Azure Cognitive Services - GPT4 (32K)
HuggingFace - Flan-t5-xxl
HuggingFace - Flan-ul2
HuggingFace - Llama-2
HuggingFace - Llama-2-chat
HuggingFace - Mistral-7B-Instruct
HuggingFace - Mixtral-8x7B-Instruct
HuggingFace - MPT-7B-instruct
OpenAI - GPT3.5
OpenAI - GPT3.5 (16K)
OpenAI - GPT4
OpenAI - GPT4 (32K)
OpenAI - GPT4 Turbo (Preview)
Self-Hosted - Flan-t5-xxl
Self-Hosted - Flan-ul2
Self-Hosted - Llama-2
Self-Hosted - Llama-2-chat
Self-Hosted - Mistral-7B-Instruct
Self-Hosted - MPT-7B-instruct
together.ai - Llama-2 Chat 13B
together.ai - Llama-2 Chat 70B
together.ai - Llama-2 Chat 7B
together.ai - llama-2-13b
together.ai - llama-2-70b
together.ai - LLaMA-2-7B-32K-Instruct
together.ai - Mistral-7B-Instruct
together.ai - Mixtral-8x7B-Instruct
watsonx.ai - elyza-japanese-llama-2-7b-instruct
watsonx.ai - Flan-t5-xxl
watsonx.ai - Flan-ul2
watsonx.ai - granite-13b-chat-v1
watsonx.ai - granite-13b-chat-v2
watsonx.ai - granite-13b-instruct-v1
watsonx.ai - granite-13b-instruct-v2
watsonx.ai - granite-20b-multilingual
watsonx.ai - granite-7b-lab
watsonx.ai - granite-8b-japanese
watsonx.ai - jais-13b-chat
watsonx.ai - Llama-2-chat 13B
watsonx.ai - Llama-2-chat 70B
watsonx.ai - llama-3-70b-instruct
watsonx.ai - llama-3-8b-instruct
watsonx.ai - merlinite-7b
watsonx.ai - Mixtral-8x7B-Instruct
watsonx.ai - Mixtral-8x7B-Instruct-v01-q
watsonx.ai - MPT-7B-instruct2

Platform	LLM	Notes
Amazon Bedrock	Claude 3 Haiku	Claude 3 Haiku is Anthropic's fastest, most compact model for near-instant responsiveness. It answers simple queries and requests with speed. Customers will be able to build seamless AI experiences that mimic human interactions. Claude 3 Haiku can process images and return text outputs, and features a 200K context window.
Amazon Bedrock	Claude 3 Opus	Claude 3 Opus is Anthropic's most powerful AI model, with state-of-the-art performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Claude 3 Opus shows us the frontier of what’s possible with generative AI. Claude 3 Opus can process images and return text outputs, and features a 200K context window.
Amazon Bedrock	Claude 3 Sonnet	Claude 3 Sonnet by Anthropic strikes the ideal balance between intelligence and speed—particularly for enterprise workloads. It offers maximum utility at a lower price than competitors, and is engineered to be the dependable, high-endurance workhorse for scaled AI deployments. Claude 3 Sonnet can process images and return text outputs, and features a 200K context window.
Amazon Bedrock	Claude Instant v1.1	A faster and cheaper yet still very capable model, which can handle a range of tasks including casual dialogue, text analysis, summarization, and document question-answering.
Amazon Bedrock	Claude v1.3 (Deprecated)	Anthropic's most powerful model, which excels at a wide range of tasks from sophisticated dialogue and creative content generation to detailed instruction following.
Amazon Bedrock	Claude v2	Anthropic's most powerful model, which excels at a wide range of tasks from sophisticated dialogue and creative content generation to detailed instruction following.
Amazon Bedrock	Claude v2.1	Anthropic's most powerful model, which excels at a wide range of tasks from sophisticated dialogue and creative content generation to detailed instruction following.
Amazon Bedrock	Jurassic-2 Mid	Jurassic-2 Mid is AI21’s mid-sized model, carefully designed to strike the right balance between exceptional quality and affordability. Jurassic-2 Mid can be applied to any language comprehension or generation task including question answering, summarization, long-form copy generation, advanced information extraction and many others.
Amazon Bedrock	Jurassic-2 Ultra	Jurassic-2 Ultra is AI21’s most powerful model offering exceptional quality. Apply Jurassic-2 Ultra to complex tasks that require advanced text generation and comprehension. Popular use cases include question answering, summarization, long-form copy generation, advanced information extraction, and more.
Amazon Bedrock	Llama-2-chat 13B	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning.
Amazon Bedrock	Llama-2-chat 70B	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning.
Amazon Bedrock	Mistral-7B-Instruct	Mistral brings capabilities similar to many popular commercial models. Mistral is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the instruct version.
Amazon Bedrock	Mistral-large	The most advanced Mistral AI Large Language model capable of handling any language task including complex multilingual reasoning, text understanding, transformation, and code generation.
Amazon Bedrock	Mixtral-8x7B-Instruct	The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks. Mistral is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the instruct version.
Amazon Bedrock	Titan Large (Deprecated)	Amazon Titan Foundation Models are pretrained on large datasets, making them powerful, general-purpose models. Use them as is or customize them by fine-tuning the models with your own data for a particular task without annotating large volumes of data.
Amazon Bedrock	Titan Text G1 - Express	Amazon Titan Text Express has a context length of up to 8,000 tokens, making it well-suited for a wide range of advanced, general language tasks such as open-ended text generation and conversational chat, as well as support within Retrieval Augmented Generation (RAG). At launch, the model is optimized for English, with multilingual support for more than 100 additional languages available in preview.
Azure Cognitive Services	Azure GPT4 Turbo (Preview)	GPT-4 Turbo provides a good balance of speed and capability. Its larger context window allows more information to be passed to it, generally yielding better responses.
Azure Cognitive Services	GPT3.5	GPT-3.5 provides a good balance of speed and capability.
Azure Cognitive Services	GPT4	GPT-4 can often take longer than 30 seconds for a full response. Use caution when using in conjunction with a Virtual Agent platform that imposes a strict timeout.
Azure Cognitive Services	GPT4 (32K)	GPT-4 can often take longer than 30 seconds for a full response. Use caution when using in conjunction with a Virtual Agent platform that imposes a strict timeout. The 32K context window version of the model allows more information to be passed to it, generally yielding better responses.
HuggingFace	Flan-t5-xxl	The Flan models are primarily English-only and may struggle with joining thoughts across multiple documents. Answers tend to be selected from a single source, even when a stitched answer may be better. Flan suffers from strong hallucinations, so use it only for internal use cases, and ensure the Semantic Scoring model is on and primary, with a minimum confidence level of at least 10-15%.
HuggingFace	Flan-ul2	The Flan models are primarily English-only and may struggle with joining thoughts across multiple documents. Answers tend to be selected from a single source, even when a stitched answer may be better. Flan suffers from strong hallucinations, so use it only for internal use cases, and ensure the Semantic Scoring model is on and primary, with a minimum confidence level of at least 10-15%.
HuggingFace	Llama-2	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the non-chat version (Llama-2-7b-hf, Llama-2-13b-hf, Llama-2-70b-hf).
HuggingFace	Llama-2-chat	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning.
HuggingFace	Mistral-7B-Instruct	Mistral brings capabilities similar to many popular commercial models. Mistral is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the instruct version.
HuggingFace	Mixtral-8x7B-Instruct	The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks. Mistral is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the instruct version.
HuggingFace	MPT-7B-instruct	The mpt-7b-instruct2 model can generate longer text than the Flan models. Use caution, however, as the model is prone to both extreme hallucination and runaway responses. Be sure to set a minimum confidence level to control this. Not recommended for public use cases.
OpenAI	GPT3.5	GPT-3.5 provides a good balance of speed and capability.
OpenAI	GPT3.5 (16K)	GPT-3.5 provides a good balance of speed and capability. The 16K context window version of the model allows more information to be passed to it, generally yielding better responses.
OpenAI	GPT4	GPT-4 can often take longer than 30 seconds for a full response. Use caution when using in conjunction with a Virtual Agent platform that imposes a strict timeout.
OpenAI	GPT4 (32K)	GPT-4 can often take longer than 30 seconds for a full response. Use caution when using in conjunction with a Virtual Agent platform that imposes a strict timeout. The 32K context window version of the model allows more information to be passed to it, generally yielding better responses.
OpenAI	GPT4 Turbo (Preview)	GPT-4 Turbo provides a good balance of speed and capability. Its larger context window allows more information to be passed to it, generally yielding better responses.
Self-Hosted	Flan-t5-xxl	The Flan models are primarily English-only and may struggle with joining thoughts across multiple documents. Answers tend to be selected from a single source, even when a stitched answer may be better. Flan suffers from strong hallucinations, so use it only for internal use cases, and ensure the Semantic Scoring model is on and primary, with a minimum confidence level of at least 10-15%.
Self-Hosted	Flan-ul2	The Flan models are primarily English-only and may struggle with joining thoughts across multiple documents. Answers tend to be selected from a single source, even when a stitched answer may be better. Flan suffers from strong hallucinations, so use it only for internal use cases, and ensure the Semantic Scoring model is on and primary, with a minimum confidence level of at least 10-15%.
Self-Hosted	Llama-2	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the non-chat version (Llama-2-7b-hf, Llama-2-13b-hf, Llama-2-70b-hf).
Self-Hosted	Llama-2-chat	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning.
Self-Hosted	Mistral-7B-Instruct	Mistral brings capabilities similar to many popular commercial models. Mistral is good at joining thoughts across multiple documents. Mistral operates well on single-GPU instances, and is generally stronger than other models in its class. This model is the instruct version.
Self-Hosted	MPT-7B-instruct	The mpt-7b-instruct2 model can generate longer text than the Flan models. Use caution, however, as the model is prone to both extreme hallucination and runaway responses. Be sure to set a minimum confidence level to control this. Not recommended for public use cases.
together.ai	Llama-2 Chat 13B	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning.
together.ai	Llama-2 Chat 70B	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning.
together.ai	Llama-2 Chat 7B	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning.
together.ai	llama-2-13b	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the non-chat version (Llama-2-7b-hf, Llama-2-13b-hf, Llama-2-70b-hf).
together.ai	llama-2-70b	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the non-chat version (Llama-2-7b-hf, Llama-2-13b-hf, Llama-2-70b-hf).
together.ai	LLaMA-2-7B-32K-Instruct	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the non-chat version (Llama-2-7b-hf, Llama-2-13b-hf, Llama-2-70b-hf).
together.ai	Mistral-7B-Instruct	Mistral brings capabilities similar to many popular commercial models. Mistral is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the instruct version.
together.ai	Mixtral-8x7B-Instruct	The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks. Mistral is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the instruct version.
watsonx.ai	elyza-japanese-llama-2-7b-instruct	ELYZA-japanese-Llama-2-7b is a model based on Llama 2 that has undergone additional pretraining to extend its Japanese language capabilities.
watsonx.ai	Flan-t5-xxl	The Flan models are primarily English-only and may struggle with joining thoughts across multiple documents. Answers tend to be selected from a single source, even when a stitched answer may be better. Flan suffers from strong hallucinations, so use it only for internal use cases, and ensure the Semantic Scoring model is on and primary, with a minimum confidence level of at least 10-15%.
watsonx.ai	Flan-ul2	The Flan models are primarily English-only and may struggle with joining thoughts across multiple documents. Answers tend to be selected from a single source, even when a stitched answer may be better. Flan suffers from strong hallucinations, so use it only for internal use cases, and ensure the Semantic Scoring model is on and primary, with a minimum confidence level of at least 10-15%.
watsonx.ai	granite-13b-chat-v1	The Granite models are a step ahead of their counterpart t5 and UL2 models. They excel at retrieving correct information from good documentation, and can join phrases from a limited number of documents. They do not have much ability to reason, however. This can be good or bad, depending on your use case. Use Granite to answer a well-defined set of questions from good documentation. Granite likes to generate short results, and will create runaway responses if pressed to generate longer than it wants to. Granite will hallucinate if asked questions without a good reference in your KnowledgeBase, or that stray too close to its training data, and may refuse to follow your documentation. Use Semantic Scoring to block this hallucination.
watsonx.ai	granite-13b-chat-v2	The Granite models are a step ahead of their counterpart t5 and UL2 models. They excel at retrieving correct information from good documentation, and can join phrases from a limited number of documents. They do not have much ability to reason, however. This can be good or bad, depending on your use case. Use Granite to answer a well-defined set of questions from good documentation. Granite likes to generate short results, and will create runaway responses if pressed to generate longer than it wants to. Granite will hallucinate if asked questions without a good reference in your KnowledgeBase, or that stray too close to its training data, and may refuse to follow your documentation. Use Semantic Scoring to block this hallucination.
watsonx.ai	granite-13b-instruct-v1	The Granite models are a step ahead of their counterpart t5 and UL2 models. They excel at retrieving correct information from good documentation, and can join phrases from a limited number of documents. They do not have much ability to reason, however. This can be good or bad, depending on your use case. Use Granite to answer a well-defined set of questions from good documentation. Granite likes to generate short results, and will create runaway responses if pressed to generate longer than it wants to. Granite will hallucinate if asked questions without a good reference in your KnowledgeBase, or that stray too close to its training data, and may refuse to follow your documentation. Use Semantic Scoring to block this hallucination.
watsonx.ai	granite-13b-instruct-v2	The Granite models are a step ahead of their counterpart t5 and UL2 models. They excel at retrieving correct information from good documentation, and can join phrases from a limited number of documents. They do not have much ability to reason, however. This can be good or bad, depending on your use case. Use Granite to answer a well-defined set of questions from good documentation. Granite likes to generate short results, and will create runaway responses if pressed to generate longer than it wants to. Granite will hallucinate if asked questions without a good reference in your KnowledgeBase, or that stray too close to its training data, and may refuse to follow your documentation. Use Semantic Scoring to block this hallucination.
watsonx.ai	granite-20b-multilingual	The Granite models are a step ahead of their counterpart t5 and UL2 models. They excel at retrieving correct information from good documentation, and can join phrases from a limited number of documents. They do not have much ability to reason, however. This can be good or bad, depending on your use case. Use Granite to answer a well-defined set of questions from good documentation. Granite likes to generate short results, and will create runaway responses if pressed to generate longer than it wants to. Granite will hallucinate if asked questions without a good reference in your KnowledgeBase, or that stray too close to its training data, and may refuse to follow your documentation. Use Semantic Scoring to block this hallucination.
watsonx.ai	granite-7b-lab	The Granite 7 Billion LAB (granite-7b-lab) model is the chat-focused variant initialized from the pre-trained Granite 7 Billion (granite-7b) model, which is Meta Llama 2 7B architecture trained to 2T tokens.
watsonx.ai	granite-8b-japanese	The Granite 8 Billion Japanese model is an instruct variant initialized from the pre-trained Granite Base 8 Billion Japanese model. Pre-training went through 1.0T tokens of English, 0.5T tokens of Japanese, and 0.1T tokens of code. This model is designed to work with Japanese text. IBM Generative AI Large Language Foundation Models are Enterprise-level Multilingual models trained with large volumes of data that has been subjected to intensive pre-processing and careful analysis.
watsonx.ai	jais-13b-chat	Jais-13b-chat is Jais-13b fine-tuned over a curated set of 4 million Arabic and 6 million English prompt-response pairs.
watsonx.ai	Llama-2-chat 13B	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning.
watsonx.ai	Llama-2-chat 70B	Llama-2 brings capabilities similar to many popular commercial models. Llama-2 is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning.
watsonx.ai	llama-3-70b-instruct	Llama 3 instruction-tuned models are fine-tuned and optimized for dialogue/chat use cases and outperform many of the available open-source chat models on common benchmarks.
watsonx.ai	llama-3-8b-instruct	Llama 3 instruction-tuned models are fine-tuned and optimized for dialogue/chat use cases and outperform many of the available open-source chat models on common benchmarks.
watsonx.ai	merlinite-7b	Merlinite is Mistral fine-tuned by Mixtral using IBM's LAB methodology. Merlinite tends to hallucinate to the extreme, and shows difficulty containing its output without running away. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning.
watsonx.ai	Mixtral-8x7B-Instruct	The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks. Mistral is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the instruct version.
watsonx.ai	Mixtral-8x7B-Instruct-v01-q	The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks. Mistral is good at joining thoughts across multiple documents. It is also highly sensitive. Slight variations in prompt and weighting can have a profound impact on usability of the system. Use extreme caution if applying prompt engineering or weight tuning. This model is the instruct version.
watsonx.ai	MPT-7B-instruct2	The mpt-7b-instruct2 model can generate longer text than the Flan models. Use caution, however, as the model is prone to both extreme hallucination and runaway responses. Be sure to set a minimum confidence level to control this. Not recommended for public use cases.
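
Several notes in the table recommend setting a minimum confidence level to contain hallucination. The idea can be sketched as a simple gate, shown here purely for illustration. The function name `gate_answer`, its parameters, and the fallback text are assumptions made for this example and are not part of NeuralSeek; in the product, the threshold is set in the configuration rather than in code.

```python
def gate_answer(answer, confidence_pct, minimum_pct=15.0,
                fallback="I could not find a confident answer."):
    """Return the answer only if its score meets the minimum confidence.

    Hypothetical helper: confidence_pct stands in for a semantic score
    expressed as a percentage between 0 and 100.
    """
    if not 0.0 <= confidence_pct <= 100.0:
        raise ValueError("confidence_pct must be between 0 and 100")
    # Answers scoring below the threshold are replaced by the fallback.
    return answer if confidence_pct >= minimum_pct else fallback
```

With the default 15% threshold in this sketch, a generated answer scoring 5% is replaced by the fallback text, while one scoring 42% is returned unchanged.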

💡 LLM choice is available with NeuralSeek’s BYOLLM (bring your own Large Language Model) plan.

💡 LLMs vary in capability and performance. Some LLMs can take 30 seconds or longer to generate a full response. Use caution when pairing them with a virtual agent platform that imposes a strict timeout.
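
If the calling platform enforces a strict timeout, one common approach is to wrap the slow call and return a holding message when it runs long. The following is a minimal sketch using only the Python standard library. `call_with_timeout` and `fake_slow_llm` are hypothetical names invented for this example, and `fake_slow_llm` merely stands in for a real model call; neither is a NeuralSeek API.

```python
import concurrent.futures
import time


def call_with_timeout(generate, prompt, timeout_s, fallback):
    """Run generate(prompt), returning fallback if it exceeds timeout_s seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(generate, prompt)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The call is still running in the background; we just stop waiting.
        return fallback
    finally:
        # Do not block the caller while the worker thread finishes.
        pool.shutdown(wait=False)


def fake_slow_llm(prompt):
    # Stand-in for a slow model call (assumption for illustration only).
    time.sleep(0.3)
    return "late answer to: " + prompt


if __name__ == "__main__":
    print(call_with_timeout(fake_slow_llm, "hello", 0.05, "Still working on it."))
```

Note that this sketch only stops waiting for the response; it does not cancel the underlying request, which continues to run until it completes.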

Configuring an LLM

⚠️ In order to configure an LLM, make sure that you have subscribed to the Bring Your Own LLM (BYOLLM) plan. All other plans will default to NeuralSeek's curated LLM, and this option will not be available.

In the NeuralSeek UI, navigate to the Configure > LLM Details page using the top menu.
Click the Add an LLM button.
Select the Platform and LLM Selection (e.g. Platform: Self-Hosted, LLM: Flan-ul2).
Click Add.
Enter the LLM API key in the LLM API Key input field.
Review the Enabled Languages (presented as a multi-select).
Review the LLM functions available (presented as checkboxes).
Click the Test button to check whether the API key works.

💡 You must add at least one LLM. If you add multiple, NeuralSeek will load-balance across them for each selected function that has multiple LLMs. Features that an LLM is not capable of will be unselectable. If you do not provide an LLM for a function, there is no fallback and that function of NeuralSeek will be disabled.
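
The behavior described in this note can be pictured with a small sketch. It assumes a round-robin strategy, which is only one possible way to load-balance; the actual strategy NeuralSeek uses is not documented here. The class `FunctionRouter`, the LLM names, and the function names (`answer`, `translate`, `categorize`) are all hypothetical.

```python
from itertools import cycle


class FunctionRouter:
    """Hypothetical router mapping each function to the LLMs enabled for it."""

    def __init__(self, llm_functions):
        # llm_functions: {"llm name": ["function", ...]}
        by_function = {}
        for llm, functions in llm_functions.items():
            for fn in functions:
                by_function.setdefault(fn, []).append(llm)
        # One round-robin iterator per function (assumed strategy).
        self._cycles = {fn: cycle(llms) for fn, llms in by_function.items()}

    def pick(self, function):
        """Return the next LLM for a function, or None if no LLM provides it."""
        llm_cycle = self._cycles.get(function)
        # No fallback: a function with no LLM is treated as disabled.
        return next(llm_cycle) if llm_cycle is not None else None


router = FunctionRouter({
    "llm-a": ["answer", "translate"],
    "llm-b": ["answer"],
})
```

In this sketch, repeated calls to `router.pick("answer")` alternate between the two configured LLMs, `router.pick("translate")` always returns the single LLM that supports it, and `router.pick("categorize")` returns `None` because no LLM was provided for that function.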