Visual Model Based Question Bank Retrieval
2024-11-30 23:13:07 # Projects

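Approach

The app uses Apple's machine learning frameworks to recognize the text in a photo, then uses sentence embeddings to compare that text with every question stored in a JSON file. The closest question is selected and its answer is displayed.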

Implementation

Required frameworks

import SwiftUI
import UIKit
import Vision
import NaturalLanguage
  • SwiftUI/UIKit: build the app's user interface
  • Vision: recognize the text in an image
  • NaturalLanguage: provide sentence embeddings for measuring how similar two sentences are

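Capturing an image with ImagePicker

This part mixes UIKit and SwiftUI, because SwiftUI does not currently offer a camera view of its own, so the camera has to be reached through UIKit.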

struct ImagePicker: UIViewControllerRepresentable {
    // Image handed back to the parent view
    @Binding var image: UIImage?
    // Used to dismiss the picker once a photo has been taken
    @Environment(\.presentationMode) var presentationMode

    // Receives the UIKit delegate callbacks on behalf of the SwiftUI wrapper
    class Coordinator: NSObject, UINavigationControllerDelegate, UIImagePickerControllerDelegate {
        let parent: ImagePicker

        init(parent: ImagePicker) {
            self.parent = parent
        }

        func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
            if let uiImage = info[.originalImage] as? UIImage {
                parent.image = uiImage
            }
            parent.presentationMode.wrappedValue.dismiss()
        }
    }

    func makeCoordinator() -> Coordinator {
        Coordinator(parent: self)
    }

    func makeUIViewController(context: Context) -> UIImagePickerController {
        let picker = UIImagePickerController()
        picker.delegate = context.coordinator
        // Fall back to the photo library where no camera exists (e.g. the simulator)
        picker.sourceType = UIImagePickerController.isSourceTypeAvailable(.camera) ? .camera : .photoLibrary
        return picker
    }

    func updateUIViewController(_ uiViewController: UIImagePickerController, context: Context) {}
}

This code defines an ImagePicker struct that conforms to the UIViewControllerRepresentable protocol. It wraps UIKit's UIImagePickerController for use in SwiftUI, so that I can photograph a question with the camera.

@Environment(\.presentationMode) var presentationMode 

Controls presenting and dismissing the view.

@Binding var image: UIImage? 

A variable bound to ContentView that passes back the image captured by the camera.

Text recognition

Add a variable to ContentView:

@State private var recognizedText: String = "No text recognized"

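Apple provides a convenient machine learning framework for this step. Once the app has requested camera permission (the NSCameraUsageDescription key in Info.plist, needed to take the photo), the recognizer can be called directly: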

func recognizeText() {
    // Vision works on a CGImage, so return early if no photo has been taken yet
    guard let image = image?.cgImage else { return }

    let requestHandler = VNImageRequestHandler(cgImage: image, options: [:])
    let request = VNRecognizeTextRequest { (request, error) in
        if let observations = request.results as? [VNRecognizedTextObservation] {
            // Keep the most confident candidate for each detected line of text
            let recognizedStrings = observations.compactMap { $0.topCandidates(1).first?.string }
            DispatchQueue.main.async {
                recognizedText = recognizedStrings
                    .joined(separator: "\n")
                    .trimmingCharacters(in: .newlines)
            }
        } else if let error = error {
            DispatchQueue.main.async {
                recognizedText = "Error: \(error.localizedDescription)"
            }
        }
    }

    do {
        try requestHandler.perform([request])
    } catch {
        recognizedText = "Error: \(error.localizedDescription)"
    }
}

Under the hood, the recognizer relies mainly on deep learning, in particular convolutional neural networks.

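Computing text similarity

Here we use a sentence embedding to map each sentence into a high-dimensional vector space. In that space, texts with similar meanings are mapped to positions close to each other.

Apple compares texts with the cosine measure. Rather than computing a straight-line "distance", it is based on the cosine of the angle between the two vectors:

  • when the two vectors point in exactly opposite directions, the cosine is -1;
  • when the two vectors are unrelated (orthogonal), the cosine is 0;
  • when the two vectors point in exactly the same direction, the cosine is 1.

So the cosine always lies in the range [-1, 1]. Note that the distance(between:and:) call used in this post returns a distance rather than a similarity, so a smaller value means a closer match.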
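To make the three cases above concrete, here is a minimal sketch of the cosine computation on plain arrays, cos θ = (a · b) / (‖a‖ ‖b‖). The function names are my own, and converting the similarity into a distance as 1 - cos θ is one common convention; whether NLEmbedding uses exactly this formula internally is an assumption.

```swift
import Foundation

/// Cosine similarity of two equal-length vectors: (a · b) / (‖a‖ ‖b‖).
/// Returns nil when the lengths differ or either vector has zero length.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double? {
    guard a.count == b.count, !a.isEmpty else { return nil }
    let dot = zip(a, b).reduce(0.0) { sum, pair in sum + pair.0 * pair.1 }
    let normA = sqrt(a.reduce(0.0) { sum, x in sum + x * x })
    let normB = sqrt(b.reduce(0.0) { sum, x in sum + x * x })
    guard normA > 0, normB > 0 else { return nil }
    return dot / (normA * normB)
}

/// One common way to turn the similarity into a distance in [0, 2].
/// Assumption: this is an illustration, not necessarily Apple's exact formula.
func cosineDistance(_ a: [Double], _ b: [Double]) -> Double? {
    cosineSimilarity(a, b).map { 1.0 - $0 }
}

let same = cosineSimilarity([1, 2], [2, 4])        // same direction
let orthogonal = cosineSimilarity([1, 0], [0, 1])  // unrelated
let opposite = cosineSimilarity([1, 2], [-1, -2])  // opposite direction
```

With this convention identical directions give a distance near 0 and opposite directions a distance near 2, which matches the 2.0 used as the initial upper bound in the search function later in this post.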

Define an embedding object

let embedding = NLEmbedding.sentenceEmbedding(for: .english)!

Return the distance between two texts, which serves as the similarity measure

// Despite the name, this returns a distance: smaller values mean more similar texts
func getSimilarity(a: String, b: String) -> Double {
    return embedding.distance(between: a, and: b)
}

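Collecting data

JSON question bank

As a test, I have entered only three questions for now. Later I will pick one of the three and check whether the app retrieves the correct question.

The JSON format makes it easy to manage the collected text: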

{
"IG Chemistry 1C":{
"paper": "IG Chemistry 1C",
"1": "This question is about mixtures and compounds.(a) The box gives some methods used to separate mixtures.chromatography crystallisation fractional distillation simple distillationChoose methods from the box to answer the following questions.Each method may be used once, more than once or not at all.(i) Identify a method to separate a single food dye from a mixture of food dyes.(b) The diagram represents a molecule.Explain why this molecule is a compound.(2)(c) The molecular formula of another compound is C3H5N3O9 (i) State the number of different elements in C3H5N3O9(1)(ii) Determine the number of atoms in a molecule of C3H5N3O9(Total for Question 1 = 7 marks)",
"2": "2 This question is about rusting.(a) A simplified formula for rust is Fe2O3(i) Name the two substances needed for iron to rust.(2)12(ii) Give the chemical name for rust.(iii) What type of reaction occurs in the rusting of iron?A combustionB neutralisationC oxidationD thermal decomposition(b) Some iron objects are coated with a layer of zinc to prevent rusting. (i) Name this type of rust prevention.(1)(1)(ii) Explain how this type of rust prevention continues to protect iron when the layer of zinc is damaged.(2)(iii) Give two other methods used to prevent iron from rusting.(2)12(Total for Question 2 = 9 marks)",
"3": "3This question is about states of matter.(a) The box gives words relating to changes of state.Complete the table by giving the correct word from the box for each change of state.condensation cooling evaporation freezing melting sublimation(3)Change of stateName of changesolid to liquidsolid to gasliquid to solid(b) When ammonia gas and hydrogen chloride gas mix, they react together to form a white solid called ammonium chloride.The equation for the reaction isNH3(g) + HCl(g) → NH4Cl(s)A teacher soaks a piece of cotton wool in concentrated ammonia solution and another piece of cotton wool in concentrated hydrochloric acid.The teacher places the two pieces of cotton wool at opposite ends of a glass tube at the same time.After several minutes, a white ring of solid ammonium chloride forms.cotton wool soaked in concentrated ammonia solutionwhite ring of ammonium chloridecotton wool soaked in concentrated hydrochloric acid (i) State the name given to the spreading out of gas particles.(1) (ii) State how the diagram shows that the particles of ammonia gas are travellingat higher speeds than the particles of hydrogen chloride gas.(1)(iii) Gas particles travel at high speeds.Give a reason why the white ring of ammonium chloride takes several minutes to form.(1)(iv) Concentrated ammonia solution and concentrated hydrochloric acid are corrosive.Give one safety precaution the teacher should take.(1)(Total for Question 3 = 7 marks)"
}
}

We define a struct to hold the data read from the JSON file:

struct ExamQuestion: Codable {
    let paper: String
    let questions: [String: String]
    let ms: [String: String]
    let n: [String: String]
}

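Codable lets us easily encode a data type (such as a struct or class) to, or decode it from, an external representation such as JSON.

  • let paper: String is the name of the paper
  • let questions: [String: String] holds the question text, keyed by question number
  • let ms: [String: String] holds the answer (mark scheme) identifier for each question
  • let n: [String: String] holds the number of answer images for each question (to handle answers that span several images)

Now the JSON file can be read: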

func parseExamQuestions() -> [ExamQuestion]? {
    // The question bank ships inside the app bundle as paperkey.json
    guard let url = Bundle.main.url(forResource: "paperkey", withExtension: "json") else {
        print("File not found")
        return nil
    }

    do {
        let data = try Data(contentsOf: url)
        // Expected layout: paper name -> question number -> field name -> value
        let json = try JSONDecoder().decode([String: [String: [String: String]]].self, from: data)

        var examQuestions: [ExamQuestion] = []

        for (paperName, questionsDict) in json {
            var questions: [String: String] = [:]
            var markSchemes: [String: String] = [:]
            var numImage: [String: String] = [:]

            // Split each question's fields into three dictionaries keyed by question number
            for (questionNumber, questionData) in questionsDict {
                if let content = questionData["content"] {
                    questions[questionNumber] = content
                }
                if let ms = questionData["ms"] {
                    markSchemes[questionNumber] = ms
                }
                if let n = questionData["n"] {
                    numImage[questionNumber] = n
                }
            }

            let examQuestion = ExamQuestion(paper: paperName, questions: questions, ms: markSchemes, n: numImage)
            examQuestions.append(examQuestion)
        }

        return examQuestions
    } catch {
        print("Error decoding JSON: \(error)")
        return nil
    }
}
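The parser above expects every question to be an object with "content", "ms" and "n" keys, which is more nested than the excerpt shown earlier. The sketch below shows that nested layout decoded directly into typed values. The QuestionEntry type, the decodeBank function and the "ms"/"n" sample values are hypothetical placeholders of mine, not the real question bank.

```swift
import Foundation

// Hypothetical sample: the "ms" and "n" values are placeholders, not real data.
let sampleJSON = """
{
  "IG Chemistry 1C": {
    "1": { "content": "This question is about mixtures and compounds.", "ms": "1C_Q1", "n": "2" },
    "2": { "content": "This question is about rusting.", "ms": "1C_Q2", "n": "1" }
  }
}
"""

// One entry per question; "ms" and "n" are optional so a missing key does not fail decoding.
struct QuestionEntry: Codable {
    let content: String
    let ms: String?
    let n: String?
}

// paper name -> question number -> entry
typealias QuestionBank = [String: [String: QuestionEntry]]

func decodeBank(from json: String) throws -> QuestionBank {
    try JSONDecoder().decode(QuestionBank.self, from: Data(json.utf8))
}

let bank = try decodeBank(from: sampleJSON)
print(bank["IG Chemistry 1C"]?["1"]?.content ?? "missing")
```

Decoding into a small Codable type is only one possible design. It trades the three parallel dictionaries for one dictionary of entries, at the cost of changing how the search code reads its fields.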

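Answer images

To load the answer for a given question inside the app, I converted the answers to PNG images and stored them in the project's Assets.xcassets, following a fixed naming convention.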
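Judging from the view code later in this post, each image is named with the answer identifier followed by an underscore and a zero-based index. A minimal sketch of that convention follows; the helper name answerImageNames and the identifier "1C_Q3" are hypothetical.

```swift
import Foundation

/// Builds the asset names "<ms>_0", "<ms>_1", ... for one answer.
/// `count` is the "n" string from the question bank; values that are not
/// a positive integer fall back to a single image.
func answerImageNames(ms: String, count: String) -> [String] {
    let n = max(Int(count) ?? 1, 1)
    return (0..<n).map { "\(ms)_\($0)" }
}

let names = answerImageNames(ms: "1C_Q3", count: "2")
print(names)
```

Clamping the count to at least 1 keeps the range valid even if the JSON contains "0" or a non-numeric string.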

Searching for the most similar question

Add five @State variables to the ContentView struct, so that the UI updates whenever their values change.

@State private var paperName: String = ""
@State private var questionNumber: Int = 0
@State private var questionText: String = ""
@State private var questionMs: String?
@State private var numImage: String = "1"

Then write the search function:

func searching(text: String) {
    if let examQuestions = parseExamQuestions() {
        // Start above any distance we expect to see, so the first question always wins
        var closestSim: Double = 2.0
        var closestPaper = ""
        var closestNum = 0
        var closestQText = ""
        var closestMS = ""
        var num = "1"
        for examQuestion in examQuestions {
            let paperName = examQuestion.paper
            let questions = examQuestion.questions
            let markScheme = examQuestion.ms
            let numImage = examQuestion.n

            for (questionNumber, questionText) in questions {
                let distance = embedding.distance(between: text, and: questionText)

                // A smaller distance means a closer match, so keep the minimum
                if distance < closestSim {
                    closestSim = distance
                    closestPaper = paperName
                    closestNum = Int(questionNumber) ?? 0
                    closestQText = questionText
                    closestMS = markScheme[questionNumber] ?? "No mark scheme available"
                    num = numImage[questionNumber] ?? "1"
                }
            }
        }

        print(closestPaper)
        print(closestNum)
        print(closestQText)
        print(closestMS)

        // Publish the best match to the UI state
        self.paperName = closestPaper
        self.questionNumber = closestNum
        self.questionText = closestQText
        self.questionMs = closestMS
        self.numImage = num
    } else {
        // The question bank could not be loaded
        self.paperName = ""
        self.questionNumber = 0
        self.questionText = "No match found"
        self.questionMs = "No mark scheme available"
        self.numImage = "1"
    }
}

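All questions are stored in an array of ExamQuestion values. A single pass over the array and over the dictionary inside each element is therefore enough to find the question closest to the scanned text.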
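The same linear scan can be sketched independently of the UI by passing the distance function in as a parameter, which also makes it testable without the NaturalLanguage framework. Everything here is illustrative: Candidate, closest and wordDistance are hypothetical names, and wordDistance is a crude word-overlap stand-in, not the sentence embedding used in the app.

```swift
import Foundation

struct Candidate {
    let paper: String
    let number: String
    let text: String
}

/// Stand-in distance (an assumption, not NLEmbedding):
/// 1 minus the Jaccard overlap of the lowercased words.
func wordDistance(_ a: String, _ b: String) -> Double {
    func words(_ s: String) -> Set<String> {
        Set(s.lowercased()
            .split(whereSeparator: { !$0.isLetter && !$0.isNumber })
            .map { String($0) })
    }
    let wa = words(a), wb = words(b)
    let union = wa.union(wb)
    guard !union.isEmpty else { return 1.0 }
    return 1.0 - Double(wa.intersection(wb).count) / Double(union.count)
}

/// Linear scan that keeps the candidate with the smallest distance to `query`.
/// Returns nil when there are no candidates.
func closest(to query: String,
             in candidates: [Candidate],
             distance: (String, String) -> Double) -> (candidate: Candidate, distance: Double)? {
    candidates
        .map { (candidate: $0, distance: distance(query, $0.text)) }
        .min(by: { $0.distance < $1.distance })
}

let candidates = [
    Candidate(paper: "IG Chemistry 1C", number: "1", text: "This question is about mixtures and compounds."),
    Candidate(paper: "IG Chemistry 1C", number: "2", text: "This question is about rusting."),
    Candidate(paper: "IG Chemistry 1C", number: "3", text: "This question is about states of matter."),
]
let best = closest(to: "question about rusting of iron", in: candidates, distance: wordDistance)
print(best?.candidate.number ?? "none")
```

In the app, the closure passed as `distance` would wrap embedding.distance(between:and:). Returning an optional also makes the "nothing to search" case explicit instead of relying on placeholder values.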

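Displaying the result

Here I built a simple UI that shows the photographed question, the matching answer, and some information about the question.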

var body: some View {
    VStack {
        // The photographed question, or a placeholder before any photo is taken
        if let image = image {
            Image(uiImage: image)
                .resizable()
                .scaledToFit()
                .frame(width: 300, height: 300)
        } else {
            Image(systemName: "photo")
                .resizable()
                .scaledToFit()
                .frame(width: 300, height: 300)
                .foregroundColor(.gray)
        }
        // The answer images, named "<ms>_<index>" in Assets.xcassets
        if let questionMs = questionMs {
            VStack {
                // Clamp the count to at least 1 so the range is never invalid
                ForEach(0..<max(Int(numImage) ?? 1, 1), id: \.self) { index in
                    let imageName = "\(questionMs)_\(index)"
                    Image(imageName)
                        .resizable()
                        .scaledToFit()
                        .frame(width: 300, height: 300)
                }
            }
        } else {
            Image(systemName: "photo")
                .resizable()
                .scaledToFit()
                .frame(width: 300, height: 300)
                .foregroundColor(.gray)
        }
        Text("Name: \(paperName)")
        Text("Number: \(questionNumber)")
        Text("Content: \(questionText)")
        // Unwrap the optional so the label does not show "Optional(...)"
        Text("Markscheme: \(questionMs ?? "-")")
        Button("Take Photo") {
            isShowingImagePicker = true
        }
        .padding()
    }
    // Run text recognition as soon as the camera sheet is dismissed
    .fullScreenCover(isPresented: $isShowingImagePicker, onDismiss: recognizeText) {
        ImagePicker(image: $image)
            .edgesIgnoringSafeArea(.all)
    }
}

Results