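Over the past two years, Python has consistently ranked among the five most popular programming languages. It has a very active community and a rich ecosystem of third-party libraries, including web, crawler, data analysis and machine learning frameworks, so developers do not have to reinvent the wheel. Python is used for web programming, network programming, multimedia applications, data analysis and image recognition. Image recognition is one of the most popular of these applications, and also one of the best fits for real-time audio and video.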

Agora now supports Python. You can get the Agora Python SDK by clicking “Read the original”. We have also written a Python demo and shared it on GitHub. This article starts with image recognition in TensorFlow, then combines TensorFlow with the Agora Python SDK to implement image recognition in real-time audio and video. The result is shown in the figure below.

A person, a chair and a monitor in the left image are successfully recognized during a real-time call

Object recognition in images with TensorFlow

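TensorFlow is Google's open-source deep learning library. With it and Python you can build a wide range of machine-learning applications, and many people have published applications built with TensorFlow, as well as other frameworks, as open source on GitHub. In this article we therefore use TensorFlow to explore object recognition.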

TensorFlow provides an API for detecting the objects contained in images or videos. For details, see:

　　https://github.com/tensorflow/models/tree/master/research/object_detection

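Object detection finds all objects that appear in an image and marks each one with a rectangle (a bounding box). Objects can belong to many categories, such as people, cars, animals and road signs. To illustrate how the TensorFlow Object Detection API is used, we take the pre-trained ssd_mobilenet_v1_coco model (SSD: Single Shot MultiBox Detector). More available detection models are listed here: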

　　https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models-coco-models
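The detection API reports each bounding box as normalized `[ymin, xmin, ymax, xmax]` coordinates between 0 and 1, not as pixels. The minimal sketch below converts such boxes into pixel coordinates for a given image size; `boxes_to_pixels` is a hypothetical helper, not part of the API.

```python
import numpy as np


def boxes_to_pixels(boxes, width, height):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to integer
    pixel coordinates (left, top, right, bottom).

    boxes: array-like of shape (N, 4) with values in [0, 1].
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    ymin, xmin, ymax, xmax = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    # x values scale with the image width, y values with the image height.
    left = np.rint(xmin * width).astype(int)
    top = np.rint(ymin * height).astype(int)
    right = np.rint(xmax * width).astype(int)
    bottom = np.rint(ymax * height).astype(int)
    return np.stack([left, top, right, bottom], axis=1)


# One box covering the central quarter of a 640x480 frame.
print(boxes_to_pixels([[0.25, 0.25, 0.75, 0.75]], 640, 480).tolist())
# → [[160, 120, 480, 360]]
```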

Load libraries

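```python
# -*- coding: utf-8 -*-
import numpy as np
import tensorflow as tf  # the code below uses the TensorFlow 1.x API
import matplotlib.pyplot as plt
from PIL import Image
# utils is part of the object_detection package in tensorflow/models
from utils import label_map_util
from utils import visualization_utils as vis_util
```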

Define some constants

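```python
# Frozen detection graph and label map of the pre-trained model
PATH_TO_CKPT = 'ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb'
PATH_TO_LABELS = 'ssd_mobilenet_v1_coco_2017_11_17/mscoco_label_map.pbtxt'
NUM_CLASSES = 90  # the COCO label map uses class ids 1 to 90
```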

Load the pre-trained model

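```python
# Load the frozen graph into its own tf.Graph (TensorFlow 1.x API;
# under TensorFlow 2.x the equivalents are in tf.compat.v1)
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')
```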

Load the label map

```python
# Map numeric class ids to human-readable class names
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
```
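`category_index` is a dictionary keyed by class id, where each value holds the id and the class name. To show where that structure comes from, here is a minimal regex-based parser for a simplified label map, assuming each `item` block has `id` and `display_name` fields as in `mscoco_label_map.pbtxt`. `parse_label_map` is hypothetical and only illustrates the data; the real loader parses the file as a protobuf.

```python
import re

SAMPLE_LABEL_MAP = """
item {
  name: "/m/01g317"
  id: 1
  display_name: "person"
}
item {
  name: "/m/0199g"
  id: 2
  display_name: "bicycle"
}
"""


def parse_label_map(text):
    """Return {id: {'id': id, 'name': display_name}} for each item block."""
    category_index = {}
    # Each item is a brace-delimited block without nested braces.
    for block in re.findall(r"item\s*\{(.*?)\}", text, flags=re.S):
        item_id = int(re.search(r"\bid:\s*(\d+)", block).group(1))
        display = re.search(r'display_name:\s*"([^"]*)"', block).group(1)
        # use_display_name=True means the display name becomes 'name'.
        category_index[item_id] = {"id": item_id, "name": display}
    return category_index


category_index = parse_label_map(SAMPLE_LABEL_MAP)
print(category_index[1]["name"])  # → person
```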

A helper function that converts an image into a NumPy array, and the test image paths

```python
def load_image_into_numpy_array(image):
    # Convert a PIL RGB image into a (height, width, 3) uint8 array
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)


TEST_IMAGE_PATHS = [
    'test_images/image1.jpg',
    'test_images/image2.jpg',
]
```
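`load_image_into_numpy_array` copies pixels one by one through `getdata()`, which is slow for large images, and it assumes a 3-channel image. A possible alternative, assuming Pillow and NumPy, converts to RGB first (so grayscale or RGBA inputs also yield three channels) and lets NumPy read the pixel buffer directly. `pil_to_rgb_array` is a hypothetical name.

```python
import numpy as np
from PIL import Image


def pil_to_rgb_array(image):
    """Return a (height, width, 3) uint8 array for any PIL image mode."""
    # convert("RGB") drops alpha and expands grayscale to three channels.
    return np.asarray(image.convert("RGB"), dtype=np.uint8)


# A 4x2 RGBA image (width 4, height 2) becomes a (2, 4, 3) array.
rgba = Image.new("RGBA", (4, 2), (10, 20, 30, 255))
arr = pil_to_rgb_array(rgba)
print(arr.shape)  # → (2, 4, 3)
```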

Run object detection with the model

```python
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:  # TensorFlow 1.x API
        # Input and output tensors of the frozen detection graph
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        for image_path in TEST_IMAGE_PATHS:
            image = Image.open(image_path)
            image_np = load_image_into_numpy_array(image)
            # The model expects a batch: shape [1, height, width, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Draw boxes and labels onto image_np in place (batch dimension squeezed out)
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            plt.figure(figsize=[12, 8])
            plt.imshow(image_np)
            plt.show()
```

The detection results are shown below. Two dogs were detected in the first image.

TensorFlow object recognition in real-time audio and video

Since TensorFlow object recognition on still images is relatively mature, how can we recognize objects in real-world scenarios, many of which involve real-time audio and video interaction? Below we explain how to do it with Agora's real-time video SDK.

First, note that a video is simply a sequence of image frames, so detecting objects in a video amounts to detecting objects in each frame; in that sense the two tasks are essentially the same. With this in mind, object recognition with TensorFlow in real-time audio and video becomes relatively simple.
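Detecting objects in a frame takes far longer than the interval between frames, so the demo does not queue every frame: it captures a new frame only after detection of the previous one has finished, using a shared flag (`isImageDetect`). The same handoff can be sketched with the standard library as a single-slot gate that drops frames while the detector is busy. `FrameGate` and its methods are hypothetical and not part of the Agora SDK or the demo.

```python
import threading


class FrameGate:
    """Single-slot handoff between a video callback and a detector thread.

    While a frame is being detected, new frames are dropped instead of
    queued, so latency does not accumulate.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._busy = False  # True from capture until detection finishes

    def offer(self, frame):
        """Called for every decoded frame; returns True if it was accepted."""
        with self._cond:
            if self._busy:
                return False  # detector still working: drop this frame
            self._frame = frame
            self._busy = True
            self._cond.notify()
            return True

    def take(self, timeout=None):
        """Block until a frame is available (or timeout), then return it."""
        with self._cond:
            self._cond.wait_for(lambda: self._frame is not None, timeout)
            frame, self._frame = self._frame, None
            return frame

    def done(self):
        """Detector finished the current frame; accept the next one."""
        with self._cond:
            self._busy = False


gate = FrameGate()
print(gate.offer("frame-1"), gate.offer("frame-2"))  # → True False
print(gate.take())  # → frame-1
gate.done()
print(gate.offer("frame-3"))  # → True
```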

(1) Read Agora real-time video and capture frames from the remote video stream

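```python
import ctypes

import cv2
import numpy as np
from PIL import Image

# EventHandlerData is the demo's shared state object (see the demo repository).


def onRenderVideoFrame(uid, width, height, yStride, uStride, vStride,
                       yBuffer, uBuffer, vBuffer, rotation, renderTimeMs, avsync_type):
    # isImageDetect tells whether recognition of the previous frame has finished.
    # Only when it is True is this frame captured; the flag is then set to False.
    if not EventHandlerData.isImageDetect:
        return
    # Wrap the raw I420 planes. Rows are read with their stride (which may be
    # larger than the visible width) and then cropped to the visible width.
    ch = (height + 1) // 2  # chroma plane height
    cw = (width + 1) // 2   # chroma plane width
    y_array = (ctypes.c_uint8 * (yStride * height)).from_address(yBuffer)
    u_array = (ctypes.c_uint8 * (uStride * ch)).from_address(uBuffer)
    v_array = (ctypes.c_uint8 * (vStride * ch)).from_address(vBuffer)
    Y = np.frombuffer(y_array, dtype=np.uint8).reshape(height, yStride)[:, :width]
    # Upsample U and V to full resolution by repeating each sample 2x2
    U = (np.frombuffer(u_array, dtype=np.uint8).reshape(ch, uStride)[:, :cw]
         .repeat(2, axis=0).repeat(2, axis=1)[:height, :width])
    V = (np.frombuffer(v_array, dtype=np.uint8).reshape(ch, vStride)[:, :cw]
         .repeat(2, axis=0).repeat(2, axis=1)[:height, :width])
    YUV = np.dstack((Y, U, V))
    # Most AI models are trained on RGB images, while Agora's video callback
    # delivers YUV data, so convert the color format
    RGB = cv2.cvtColor(YUV, cv2.COLOR_YUV2RGB)
    EventHandlerData.image = Image.fromarray(RGB)
    EventHandlerData.isImageDetect = False
```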
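If OpenCV is not available, the YUV-to-RGB step can also be written with NumPy alone. The sketch below assumes I420 planes without row padding and full-range BT.601 coefficients; if the stream uses limited-range luma (16 to 235), the result will have slightly reduced contrast unless the range is expanded first. `i420_to_rgb` is a hypothetical helper.

```python
import numpy as np


def i420_to_rgb(y, u, v):
    """Convert I420 planes to an (H, W, 3) uint8 RGB image.

    y: (H, W) uint8; u, v: (ceil(H/2), ceil(W/2)) uint8.
    Uses full-range BT.601 coefficients.
    """
    h, w = y.shape
    # Upsample chroma 2x in both directions, crop to the luma size, center on 0.
    u_full = u.repeat(2, axis=0).repeat(2, axis=1)[:h, :w].astype(np.float32) - 128.0
    v_full = v.repeat(2, axis=0).repeat(2, axis=1)[:h, :w].astype(np.float32) - 128.0
    y_f = y.astype(np.float32)
    r = y_f + 1.402 * v_full
    g = y_f - 0.344136 * u_full - 0.714136 * v_full
    b = y_f + 1.772 * u_full
    rgb = np.stack([r, g, b], axis=-1)
    # Round and clamp to the valid 8-bit range.
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)


# Neutral chroma (128) leaves gray levels unchanged.
y = np.full((4, 6), 100, dtype=np.uint8)
uv = np.full((2, 3), 128, dtype=np.uint8)
print(i420_to_rgb(y, uv, uv)[0, 0].tolist())  # → [100, 100, 100]
```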

(2) Run TensorFlow object detection on the captured frame

```python
import numpy as np
import tensorflow as tf
from PyQt5.QtCore import QThread, pyqtSignal  # assuming PyQt5

# EventHandlerData is the demo's shared state object (see the demo repository).


class objectDetectThread(QThread):
    # Emits a comma-separated list of recognized object names for the UI
    objectSignal = pyqtSignal(str)

    def __init__(self):
        super().__init__()

    def run(self):
        detection_graph = EventHandlerData.detection_graph
        with detection_graph.as_default():
            with tf.Session(graph=detection_graph) as sess:  # TensorFlow 1.x API
                (im_width, im_height) = EventHandlerData.image.size
                image_np = np.array(EventHandlerData.image.getdata()).reshape(
                    (im_height, im_width, 3)).astype(np.uint8)
                # Add a batch dimension: the model expects [1, height, width, 3]
                image_np_expanded = np.expand_dims(image_np, axis=0)
                image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
                boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
                scores = detection_graph.get_tensor_by_name('detection_scores:0')
                classes = detection_graph.get_tensor_by_name('detection_classes:0')
                num_detections = detection_graph.get_tensor_by_name('num_detections:0')
                (boxes, scores, classes, num_detections) = sess.run(
                    [boxes, scores, classes, num_detections],
                    feed_dict={image_tensor: image_np_expanded})
                objectText = []
                # If a score is above 40%, show the recognized object in the text box.
                # Detections come sorted by descending score, so stop at the first low one.
                for i, c in enumerate(classes[0]):
                    if scores[0][i] > 0.4:
                        name = EventHandlerData.category_index[int(c)]['name']
                        if name not in objectText:
                            objectText.append(name)
                    else:
                        break
                self.objectSignal.emit(', '.join(objectText))
                EventHandlerData.detectReady = True
                # This frame is done: set isImageDetect to True so that frames from
                # Agora's remote real-time video are read and converted again
                EventHandlerData.isImageDetect = True
```
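The filtering loop in `run()` can be factored into a pure function, which makes it testable without a model or a running call. This sketch assumes, as the demo does, that `scores` and `classes` carry a leading batch dimension and that detections are sorted by descending score (which is why the loop may stop at the first low score). `detected_names` and the small `index` dictionary are hypothetical.

```python
def detected_names(classes, scores, category_index, threshold=0.4):
    """Return unique class names whose score exceeds `threshold`.

    classes, scores: nested sequences shaped (1, N), as returned by sess.run;
    detections are assumed sorted by descending score.
    """
    names = []
    for class_id, score in zip(classes[0], scores[0]):
        if score <= threshold:
            break  # all remaining detections score even lower
        name = category_index[int(class_id)]["name"]
        if name not in names:  # keep each object type once, in score order
            names.append(name)
    return names


index = {1: {"id": 1, "name": "person"}, 62: {"id": 62, "name": "chair"}}
print(", ".join(detected_names([[1.0, 62.0, 1.0, 62.0]], [[0.9, 0.7, 0.5, 0.3]], index)))
# → person, chair
```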

We have uploaded this demo and the Agora Python SDK to GitHub, where you can download and use them directly.

Agora Python TensorFlow Demo:

https://github.com/AgoraIO-Community/Agora-Python-Tensorflow-Demo

Agora Python TensorFlow Demo build guide:

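1. Download the Agora Python SDK (click “Read the original”).
2. On Windows, copy the .pyd and .dll files into the root folder of this project; on macOS, copy the .so file into the project root.
3. Download the TensorFlow models repository and copy its object_detection folder into the project root.
4. Install Protobuf, then run: protoc object_detection/protos/*.proto --python_out=.
5. Download a pre-trained model from the model zoo (download link).
6. ssd_mobilenet_v1_coco and ssdlite_mobilenet_v2_coco are recommended because they run relatively fast.
7. Extract the frozen graph by running: python extractGraph.py --model_file='FILE_NAME_OF_YOUR_MODEL'
8. Finally, set the model name in callBack.py and your App ID in demo.py, then run the demo.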

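Please note that this demo is for demonstration purposes only. It takes 2 to 4 seconds from receiving the remote real-time video, through TensorFlow recognition, to displaying the result. Recognition speed varies with network conditions, device performance and the model used. Interested developers can try replacing the model with their own to reduce the latency.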
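Before swapping models, it can help to measure which stage of the pipeline dominates the delay. A minimal sketch using `time.perf_counter`; the `StageTimer` class and the stage names are hypothetical, and in the demo one would wrap the YUV conversion and the `sess.run` call with it. The `time.sleep` calls only stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def averages_ms(self):
        """Average milliseconds per call for each stage."""
        return {n: 1000.0 * self.totals[n] / self.counts[n] for n in self.totals}


timer = StageTimer()
for _ in range(3):
    with timer.stage("convert"):
        time.sleep(0.01)  # stand-in for the YUV-to-RGB conversion
    with timer.stage("detect"):
        time.sleep(0.02)  # stand-in for sess.run(...)
print({k: round(v) for k, v in timer.averages_ms().items()})
```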

If you run into problems with the demo, please share feedback in the RTC developer community or open an issue on GitHub.


Agora adds Python support: image recognition is now possible in video calls too
