RAG Knowledge Retrieval

RAG knowledge retrieval with the BGE Chinese embedding models:
BAAI/bge-small-zh-v1.5 · HF Mirror
(The snippet below loads the larger sibling, BAAI/bge-large-zh-v1.5; usage is identical.)

from transformers import AutoTokenizer, AutoModel
import torch

# Sentences we want sentence embeddings for
sentences = ["样例数据-1", "样例数据-2"]

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-large-zh-v1.5')
model = AutoModel.from_pretrained('BAAI/bge-large-zh-v1.5')
model.eval()

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# For an s2p (short query to long passage) retrieval task, prepend an instruction
# to each query (but not to the passages):
# encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
    # Perform pooling. In this case, CLS pooling: take the hidden state of the first token.
    sentence_embeddings = model_output[0][:, 0]

# Normalize embeddings to unit length, so dot products equal cosine similarity
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
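Because the embeddings are L2-normalized, ranking passages against a query reduces to a matrix product (dot product of unit vectors = cosine similarity). Below is a minimal retrieval sketch reusing the tokenizer and model loaded above; the embed helper and the sample texts are illustrative, and the instruction string is the s2p query prefix from the BGE model card:

# Illustrative helper: CLS pooling + L2 normalization, same as the snippet above
def embed(texts):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        out = model(**enc)
    return torch.nn.functional.normalize(out[0][:, 0], p=2, dim=1)

# s2p retrieval: prepend the instruction to queries only, never to passages
instruction = "为这个句子生成表示以用于检索相关文章："
queries = ["RAG的检索步骤是什么"]
passages = ["RAG先从知识库检索相关文档，再将文档作为上下文交给大模型生成答案。",
            "粒子群优化是一种经典的群智能算法。"]

q_emb = embed([instruction + q for q in queries])
p_emb = embed(passages)

# Rows are queries, columns are passages; higher score = more relevant
scores = q_emb @ p_emb.T
print(scores)
print("Top passage:", passages[scores.argmax(dim=1)[0].item()])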


**encoded_input unpacks a dict of three tensors, so the call can also be written out explicitly, with each tensor passed as a keyword argument, as in the sketch below.
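A minimal sketch of that expansion, assuming the standard BERT-style tokenizer output that BGE produces (inspect encoded_input.keys() to confirm the exact keys):

print(encoded_input.keys())  # expected: input_ids, token_type_ids, attention_mask

with torch.no_grad():
    model_output = model(
        input_ids=encoded_input["input_ids"],            # token ids, shape (batch, seq_len)
        token_type_ids=encoded_input["token_type_ids"],  # segment ids, all zeros for single sentences
        attention_mask=encoded_input["attention_mask"],  # 1 for real tokens, 0 for padding
    )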