
Questions tagged [llama3]

The tag has no usage guidance.

-5 votes
0 answers
24 views

I want to build a RAG application using llama3 and the qdrant vector db [closed]

I am looking to develop a private chatbot using Llama3 and Qdrant as the vector data store. Here are my current resources: 16GB RAM, 1TB SSD, and ...
deep learning
0 votes
0 answers
24 views

ConnectError: All connection attempts failed when connecting indexing to neo4j database using PropertyGraphIndex from llama3

I am working on a knowledge graph, and every connection to the neo4j browser succeeds (using Neo4j Desktop on Windows, not a Docker deployment). However, with llama3 I am running the same notebooks as in property ...
Kcndze • 21
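A frequent cause of `ConnectError: All connection attempts failed` is that the environment running the notebook cannot reach the bolt port Neo4j Desktop exposes (default 7687), even when the browser connects fine. A minimal stdlib reachability check, assuming localhost and the default bolt port:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Neo4j Desktop's default bolt endpoint:
# port_open("localhost", 7687)
```

If this returns False where the notebook runs, the problem is network reachability (or a non-default bolt port in the connection URI), not PropertyGraphIndex itself.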
0 votes
1 answer
23 views

Does langchain with llama-cpp-python fail to work with very long prompts?

I'm trying to create a service using the llama3-70b model by combining langchain and llama-cpp-python on a server workstation. While the model works well with short prompts (question1, question2), it ...
bibiibibin
0 votes
0 answers
22 views

'LlamaForCausalLM' object has no attribute 'max_seq_length'

I'm fine-tuning llama3 using unsloth. I trained my model and saved it successfully, but when I tried loading it using AutoPeftModelForCausalLM.from_pretrained and then used TextStreamer from transformers ...
Sarra Ben Messaoud
-1 votes
0 answers
14 views

Got `disk_offload` error while trying to load the Llama3 model from Hugging Face

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_index.llms.huggingface import HuggingFaceLLM
from accelerate import disk_offload
tokenizer = AutoTokenizer....
Vins Shaji
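For the `disk_offload` error above, a commonly suggested fix (a sketch, not verified against the asker's exact traceback) is to let `from_pretrained` manage offloading itself via `device_map="auto"` plus an `offload_folder`, rather than calling `accelerate.disk_offload` by hand:

```python
# Assumed fix: these are real from_pretrained keyword arguments; accelerate
# offloads to disk automatically when device_map="auto" and the weights do
# not fit in RAM/VRAM.
load_kwargs = {
    "device_map": "auto",         # let accelerate place layers across devices
    "offload_folder": "offload",  # layers that fit nowhere are spilled here
    "torch_dtype": "auto",        # keep the checkpoint's dtype
}
# model = AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs)
```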
0 votes
0 answers
32 views

How should I use Llama-3 properly?

I downloaded the Meta-Llama-3-70B-Instruct model using download.sh and the URL provided in Meta's email, and these are all the files in the folder. And when I tried to use ...
Joey1205
0 votes
0 answers
33 views

How to merge multiple (at least two) existing LlamaIndex VectorStoreIndex instances?

I'm working with LlamaIndex and have created two separate VectorStoreIndex instances, each from different documents. Now, I want to merge these two indexes into a single index. Here's my current setup:...
林抿均
0 votes
0 answers
22 views

How to increase maximum limit for completion_tokens in AWS Sagemaker invoke endpoint

I have deployed the meta-llama/Meta-Llama-3-8B-Instruct model using HuggingFaceModel. The model responds with the full output when I make a call using HuggingFaceModel's predictor method. Here is the ...
keerti4p
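For a TGI-backed SageMaker endpoint, the per-request generation length is controlled by `parameters.max_new_tokens` in the request body, while the hard ceiling is fixed at deploy time by the container's `MAX_TOTAL_TOKENS` / `MAX_INPUT_LENGTH` environment variables. A hedged sketch of the payload shape (prompt text is illustrative):

```python
import json

# Payload shape accepted by the Hugging Face TGI container behind
# invoke_endpoint / predictor.predict:
payload = {
    "inputs": "Explain vector databases in one paragraph.",
    "parameters": {"max_new_tokens": 1024, "temperature": 0.6},
}
body = json.dumps(payload)
# predictor.predict(payload)
# or: runtime.invoke_endpoint(EndpointName=..., ContentType="application/json", Body=body)
```

If 1024 still gets truncated, the deploy-time `MAX_TOTAL_TOKENS` is the limit to raise (which requires redeploying the endpoint).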
0 votes
0 answers
38 views

Llama-3-70B with pipeline cannot generate new tokens (texts)

I have successfully downloaded Llama-3-70B, and when I want to test its "text-generation" ability, it always outputs my prompt and nothing more. Here is my demo code (copied from ...
Martin • 11
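When a `text-generation` pipeline returns only the prompt, two settings are the usual suspects: `max_new_tokens` left unset and `return_full_text` defaulting to True (so the echoed prompt dominates the output). A sketch of the kwargs, assuming `pipe` is the asker's pipeline object:

```python
# Assumed fix: both keys are real transformers generation/pipeline options.
gen_kwargs = {
    "max_new_tokens": 256,      # without this, little or no new text is generated
    "do_sample": True,
    "return_full_text": False,  # drop the echoed prompt from the output
}
# out = pipe(prompt, **gen_kwargs)
```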
0 votes
0 answers
23 views

How to configure ollama setup exe from its source code

I was required to build the Ollama setup exe from its source code on Windows. The steps I found are as follows. Note: The Windows build for Ollama is still under development. First, install required tools: MSVC ...
Diksha Gupta
0 votes
0 answers
45 views

"Sizes of tensors must match" error in Meta-Llama-3-8B-Instruct

I'm trying to use the pre-trained Meta-Llama-3-8B-Instruct LLM from Hugging Face for fine-tuning on my own data. As a very first step, I'm just trying to interact with the model as is. My system specs:...
Raul Marquez • 1,108
0 votes
0 answers
6 views

How Can I Use Run Manager to Stream Response on RetrievalQA?

I'm working with the langchain library and transformers to build a language model application. I want to integrate a CallbackManagerForLLMRun to stream responses in my RetrievalQA chain. Below is the ...
rahul raj
0 votes
0 answers
65 views

In PyTorch and Hugging Face transformers, why does loading Llama3 to CPU and then using .to() use so much more memory than loading with device_map?

I've tried loading Hugging Face transformers models to MPS in two different ways:
llm = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch....
Owen D • 55
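The usual explanation for the memory gap: loading on CPU and then calling `.to("mps")` holds two full copies of the weights at the peak (the CPU tensors plus the in-flight device copies), whereas `device_map` implies `low_cpu_mem_usage=True` and materializes each shard directly on the target device. Back-of-envelope arithmetic for an ~8B-parameter model in fp16/bf16:

```python
# Rough sizing only; parameter count is approximate for Llama-3-8B.
params = 8_030_000_000
bytes_per_param = 2  # fp16/bf16
one_copy_gb = params * bytes_per_param / 1024**3   # weights resident once
peak_two_copies_gb = 2 * one_copy_gb               # CPU copy + device copy during .to()
```

So the `.to()` path briefly needs roughly double the ~15 GB footprint, which matches the observed difference.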
0 votes
0 answers
42 views

Impossible to get replies out of LLama3

As the question says, I am trying to run Llama3 but to no avail. Most people recommend using Ollama, which I cannot use due to personal circumstances (I am running the code on a cluster and I cannot ...
Anonymous
0 votes
1 answer
297 views

How to set eos_token_id in llama3 in HuggingFaceLLM?

I want to set my eos_token_id and pad_token_id. I googled a lot, and most answers suggest using e.g. tokenizer.pad_token_id (like from here https://huggingface.co/meta-llama/Meta-Llama-3-8B/discussions/...
yts61 • 1,509
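For the eos/pad question above: LlamaIndex's `HuggingFaceLLM` accepts a `generate_kwargs` dict that is forwarded to `model.generate`, so both ids can be set there. A sketch using the terminator ids from Llama-3's tokenizer config (128001 = `<|end_of_text|>`, 128009 = `<|eot_id|>`; Llama-3 ships no pad token, so eos is commonly reused):

```python
# Sketch: ids below are the documented Llama-3 special-token ids.
generate_kwargs = {
    "eos_token_id": [128001, 128009],  # stop on either terminator
    "pad_token_id": 128001,            # reuse <|end_of_text|> as pad
}
# llm = HuggingFaceLLM(
#     model_name="meta-llama/Meta-Llama-3-8B",
#     tokenizer_name="meta-llama/Meta-Llama-3-8B",
#     generate_kwargs=generate_kwargs,
# )
```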
