9
Implementation and Evaluation of Computer Vision Prototype for Vehicle Detection

Gabrielle Bakker‐Reynolds, Emre Erturk, Istvan Lengyel, and Noor Alani

Eastern Institute of Technology, School of Computing, Hawke's Bay, New Zealand

9.1 Prototype Setup

9.1.1 Introduction

Vehicle detection is a core environment understanding task for describing two‐dimensional (2D) and three‐dimensional (3D) scenes in computer vision. It is frequently used in the development of computer vision modules for autonomous cars and robotics. Robust vehicle detection and scene understanding are key tasks for the visual sensors (cameras) of autonomous cars, enabling them to interpret and act within a dynamic environment [1, 2]. Cameras play a significant role in autonomous driving: they provide rich information, including detected objects and their distances, in traffic scenes. However, developing a prototype capable of handling large amounts of image data is challenging, especially when the prototype must be designed from locally available resources.

This chapter explores how vehicle detection can be applied to support traffic flow analysis, with the aim of designing and developing a computer vision prototype from resources available during 2020. The chapter follows a design‐based approach composed of two iteration‐based approaches to developing a vehicle detection prototype on a nano‐computer. The first phase covered the environment setup and the choice of methods common to most iterations; each iteration then proceeded through design and development, implementation, analysis, and redesign of the prototype. The approaches are analyzed according to accuracy, processing time, cost, and overall suitability.

9.1.2 Environment Setup

Before testing the different detection approaches, the vehicle detection environment was prepared for design and development. To achieve this, the following steps were completed.

  • Step 1. Writing the Jetson Nano OS image to the microSD card: The first step was to write the image to the microSD card, starting with downloading the Jetson Nano Developer Kit SD card image from the NVIDIA website [3]. Following the download, Etcher, a graphical program that allows users to flash operating system (OS) images to SD cards or USB drives, was used. Etcher was first downloaded and installed on a computer in the computer lab at EIT. Once installed, the Etcher application was launched, the option “select image” was chosen, and the zipped Jetson Nano Developer Kit SD card image file was selected. The microSD card was then inserted into a USB‐C micro/SD card reader, and the reader was connected to the computer. The Etcher application then prompted for the flash command, which was selected. Once the image was written, the SD card was ejected via the Files application, and the microSD card was removed from the computer.
  • Step 2. Setup: Next, the NVIDIA Jetson Nano Developer Kit was set up. The developer kit was unpackaged and placed on top of the developer kit box, and the microSD card with the newly written OS image was inserted into the slot located underneath the Jetson Nano module. A USB keyboard and mouse were connected, and the kit was connected to a computer monitor via an HDMI cable. For livestreaming capabilities, the Raspberry Pi V2 camera module was also connected to the Jetson Nano. Lastly, the Jetson Nano was connected to a 5 V 2 A Micro USB power supply, prompting it to power on and boot automatically. A green LED lit up on startup, indicating that it had powered on correctly. A photo of the complete Jetson Nano Developer Kit module is shown in Figure 9.1.
  • Step 3: First boot: As this was the first time that the Jetson Nano Kit was powered on, initial configuration was required. The Jetson Nano prompted for review and acceptance of the NVIDIA Jetson software end‐user license agreement (EULA) and asked the user to select a system language, keyboard layout, and time zone. Once these preferences were set, a username and password were created, and the final step was to log in to the system.
  • Step 4: Choice of analysis: Once the Jetson Nano Kit was set up, the method for analyzing the collected data was decided. After researching various methods of analysis, the formula below was used to calculate the averages for both the accuracy and the processing time of the images, the MP4 file, and the livestream for all iterations except the last iteration outlined later in this chapter. This formula was chosen because it reports the average of the model's results with a 95% confidence interval; a short code sketch of this calculation is given after this list. The formula used to calculate these averages and their confidence intervals is
    (9.1)  $\bar{x} \pm t \dfrac{s}{\sqrt{n}}$

    where $\bar{x}$ is the sample mean, n is the number of observations, s is the sample standard deviation, and t is the critical value of the t‐distribution at the 95% confidence level.
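Below is a minimal Python sketch of the calculation in Eq. (9.1), assuming the per-image confidence values (or processing times) have already been collected into a list; the sample values shown are simply the first few confidence values from Table 9.1 and serve only to make the snippet runnable. It is offered as an illustration rather than the exact spreadsheet procedure used in the study.

```python
# Sketch of the 95% confidence-interval calculation in Eq. (9.1).
import math
from statistics import mean, stdev
from scipy import stats

def ci_95(values):
    """Return (mean, half-width) of a 95% confidence interval: x-bar ± t*s/sqrt(n)."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)                      # sample standard deviation
    t = stats.t.ppf(0.975, df=n - 1)       # two-sided 95% critical value
    return x_bar, t * s / math.sqrt(n)

# Example with the first few confidence values (percent) from Table 9.1:
confidences = [74.9, 51.8, 80.3, 88.6, 85.6]
avg, half_width = ci_95(confidences)
print(f"{avg:.2f} ± {half_width:.2f}")
```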


Figure 9.1 The Jetson Nano Developer Kit after setup.

Source: NVIDIA.

9.2 Testing

9.2.1 Design and Development: The Default Model and the First Iteration

The decision was then made to clone and install the repository titled “jetson‐inference” located on GitHub [4]. For the first iteration of this approach, the object detection demo offered in the repository was tested. This demo object detector uses a deep neural network approach called DetectNet, and the default model is the single‐shot multibox detector (SSD) MobileNet v2 trained on the common objects in context (COCO) dataset. To use DetectNet, the project was built from source by following the instructions in the repository. First, both Git and CMake were installed on the Jetson Nano. Next, the repository was cloned, and the working directory was changed to the repository. A build directory was then created, and CMake was run within it to configure the build. Once these commands were entered into the terminal, a model downloader tool appeared within the terminal interface; only SSD MobileNet was downloaded from it. After the model was selected, a second downloader tool titled “PyTorch Installer” appeared, which can be used to install the PyTorch library onto the Jetson Nano. PyTorch's primary function within the jetson‐inference repository is to enable transfer learning to retrain neural networks. As transfer learning is examined in the subsequent iteration, PyTorch was installed; because the Jetson Nano was running JetPack 4.4, the Python 3.6 version of PyTorch 1.1.0 was installed. The final step was to compile the project build, for which it was critical to ensure that the terminal was still in the build directory. Once the project was cloned, installed, and configured, it was possible to commence testing of the sample object detection model.

9.2.2 Testing (Multiple Images)

The first test assessed the performance of the model on multiple images of vehicles. A sequence of vehicle images was processed by launching the DetectNet model with the path to the directory containing the vehicle image data. It was important to test on a significant number of images to assess whether the results were consistent, so the model was run on 100 images of vehicles taken from the Open Images V6 dataset (more information about this dataset is provided later in this chapter). To test the model on the vehicle data, three parameters were set within the command: the model, the image input, and where to save the image results.
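The following is a minimal sketch of this workflow using the Python bindings provided by the jetson-inference project; the directory path, the 0.5 detection threshold, and the printed output format are illustrative assumptions rather than the exact command used in the study.

```python
# Sketch: run SSD-MobileNet-v2 detection over a directory of vehicle images.
import glob
import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)

for path in sorted(glob.glob("data/vehicles/*.jpg")):   # hypothetical image directory
    img = jetson.utils.loadImage(path)
    detections = net.Detect(img)
    for d in detections:
        # Class description, confidence, and bounding-box coordinates,
        # mirroring the information the terminal reports for each image.
        print(path, net.GetClassDesc(d.ClassID), f"{d.Confidence * 100:.2f}%",
              d.Left, d.Top, d.Right, d.Bottom)
```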

9.2.3 Analysis (Multiple Images)

  • Accuracy: When the command was launched, the image results were saved to the specified directory. At the same time, additional information was output via the terminal, including the class identification number, bounding box coordinates, and confidence values. A check of the output directory showed that each vehicle image carried a colored bounding box overlay and an accompanying confidence value. To compute the average, the first 25 of the 100 images were entered into a table in Microsoft Excel. Some images produced more than one detection, making it difficult to calculate an average across the sample, so only images containing a single vehicle were counted; 12 additional vehicle images were included to maintain a sample size of 25, and any image containing more than one vehicle was skipped. Once the datapoints were input in Microsoft Excel, the average and a 95% confidence interval were calculated, giving an average of 84.65 ± 0.05%.
  • Timing: The model was also evaluated in relation to processing time by examining the timing report output via the terminal for each processed image. As the terminal outputs a timing report for each image, a sample of the first 25 of the 100 images was taken. The analysis took into account the total time it took the CPU and the compute unified device architecture (CUDA) to process each image. Once the datapoints were input in Microsoft Excel, the average and a 95% confidence interval were calculated for both CPU and CUDA processing. For the CPU, the average processing time was 49.33 ± 5.65 ms. For CUDA, the average processing time was 55.85 ± 5.28 ms. Table 9.1 lists the datapoints obtained, alongside the averages and the averages using the formula.

The costs of the current approaches were calculated by looking at the costs associated with traffic counters created by the business MetroCount, as these are the products being used currently within the Hawke's Bay region to conduct traffic flow analysis. Furthermore, to conduct traffic flow analysis using the pneumatic tubes, specifically for using the RoadPod VT4 product, MetroCount states that two road survey field kits are required [5]. Each road survey field kit includes road nails, road cleats, road rubber flaps, and rubber road tubes. Although the RoadPod VT4 is a standard product created by MetroCount for traffic analysis, there is no public information detailing its cost. As people are employed to count traffic within the Hawke's Bay region, this was also factored in under the current approaches taken to traffic flow analysis. It was estimated that individuals are paid the New Zealand minimum wage to carry out this task. A list of the components is shown in Table 9.2.

9.2.4 Testing (MP4 File)

An open‐source MP4 file of vehicles was also run through the model. The stock photo and video website Pexels was searched to find a suitable video file for this purpose. One video file, titled “daylight traffic on a camera angled tilt road,” was selected; it consists of 24 seconds of footage of different vehicles travelling along a road [6]. The video file was downloaded from the Pexels website and saved locally under the “jetson‐inference” directory. The terminal was then launched and the commands were entered.
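A minimal sketch of this test is shown below, again assuming the jetson-inference Python bindings; the file name traffic.mp4 is a placeholder for the downloaded Pexels clip rather than its actual name. The same pattern also applies to the livestream test described in Section 9.2.5, with a camera URI (for example, csi://0 for the Raspberry Pi V2 module) passed to videoSource in place of the file name.

```python
# Sketch: run the detector over a local MP4 file and render the results.
import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
source = jetson.utils.videoSource("traffic.mp4")    # placeholder MP4 in the working directory
output = jetson.utils.videoOutput("display://0")    # render detections to the attached monitor

while source.IsStreaming():
    frame = source.Capture()
    detections = net.Detect(frame)
    output.Render(frame)
    for d in detections:
        # Per-frame confidence values, as reported via the terminal in the chapter.
        print(net.GetClassDesc(d.ClassID), f"{d.Confidence * 100:.2f}%")
```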

Analysis (MP4 File)

  • Accuracy: This was assessed by evaluating the output confidence values associated with the vehicle detections. To calculate an average, eight datapoints were taken from the output generated via the terminal and input into Microsoft Excel. The first datapoint output by the model was considerably higher than the rest; as this was expected during model start‐up, it was treated as an outlier and excluded from the average. Once the datapoints were input in Microsoft Excel, the average and 95% confidence interval were calculated, giving an average of 84.79 ± 0.0821%.

    Table 9.1 Datapoints obtained alongside the average and average using the formula.

    Approach 1, Iteration 1: default model – images
    Image | Confidence value (%) | CPU processing time (ms) | CUDA processing time (ms)
    1 | 74.90 | 114.9 | 114.5
    2 | 51.80 | 49.57 | 64.5
    3 | 80.30 | 46.6 | 50.39
    4 | 88.60 | 46.5 | 51
    5 | 85.60 | 46.8 | 54.1
    6 | 89.50 | 47.2 | 54.6
    7 | 88.00 | 46.9 | 55.3
    8 | 94.40 | 47.9 | 55.3
    9 | 68.40 | 46.2 | 51.3
    10 | 98.60 | 45.7 | 56.2
    11 | 87.40 | 46 | 54.5
    12 | 98.70 | 46.3 | 52.8
    13 | 58.70 | 45.8 | 46.7
    14 | 91.10 | 46.4 | 55.2
    15 | 86.30 | 46.8 | 56
    16 | 62.00 | 46.1 | 46.2
    17 | 87.30 | 47.2 | 54.4
    18 | 96.60 | 46.5 | 53.2
    19 | 95.00 | 45.7 | 50.9
    20 | 97.30 | 46.6 | 51.2
    21 | 95.30 | 46.3 | 52.1
    22 | 80.50 | 47.1 | 60.3
    23 | 96.30 | 45.5 | 52.7
    24 | 92.90 | 46.5 | 52.6
    25 | 70.80 | 46.2 | 50.2
    Average: | 84.65 | 49.3308 | 55.8476
    Average using formula: | 84.65 ± 0.05 | 49.33 ± 5.65 | 55.85 ± 5.28

    Table 9.2 Comparison of the prototype and current approaches.

    The vehicle detector for traffic flow | Cost | The current traffic flow approach | Cost
    NVIDIA Jetson Nano Developer Kit | $169.75 | 12 × 70 mm road nails (×2) | See below
    Raspberry Pi V2 camera module | $42.60 | 10 × Fig. 8 road cleats (×2) |
    Ethernet cable | $15.00 | 2 × road rubber flaps (×2) |
    60 GB microSD card | $60.00 | 1 × 30 m natural rubber road tube (×2) | $324.50
    Micro USB power supply cable | $20.00 | Employees | $18.80 P/H
    Micro USB power supply wall charger | $20.00 | |
    HDMI cable | $20.00 | |
    Mouse | $10.00 | |
    Keyboard | $20.00 | |
    Monitor | $200.00 | |
    Total cost: | $327.25 | Total cost: | $649.00
  • Timing: The performance of the model was assessed by evaluating 14 datapoints output via the terminal: seven reporting CPU processing time and seven reporting CUDA processing time. To remain consistent with the accuracy assessment, the first datapoint was excluded when calculating the average timing, as it was higher than all of the following datapoints. Once the datapoints were input in Microsoft Excel, the average and 95% confidence interval were calculated for both CPU and CUDA processing. For the CPU, the average processing time was 56.3 ± 2.24 ms. For CUDA, the average processing time was 47.21 ± 2.36 ms. Table 9.3 lists the datapoints obtained, alongside the averages and the averages using the formula.

Table 9.3 Datapoints obtained alongside the average and average using the formula.

Approach 1, Iteration 1: default model – MP4 file
Image | Confidence value (%) | CPU processing time (ms) | CUDA processing time (ms)
1 | 88.50 | 54.9 | 46.4
2 | 68.20 | 56.2 | 45.7
3 | 84.70 | 54.3 | 45.4
4 | 76.80 | 55.2 | 46.2
5 | 96.60 | 54.1 | 45.1
6 | 86.30 | 59.9 | 51.5
7 | 92.40 | 59.5 | 50.2
Average | 84.79 | 56.3 | 47.21
Analysis using formula | 84.79 ± 0.0821 | 56.3 ± 2.24 | 47.21 ± 2.36

9.2.5 Testing (Livestream Camera)

The final test analyzed the performance of the model in relation to its livestream capabilities for real‐time vehicle detection. To achieve this, the Jetson Nano was set up in an outdoor environment. The setup was unchanged from the previous tests, except that a longer Ethernet cable and an extension lead were used. The Jetson Nano was then tested on a single car moving at 5 km/h. The Jetson Nano after setup is shown in Figure 9.2.

Analysis (Livestream Camera)

  • Accuracy: As the camera was pointed toward the vehicle, it started to pick up detections. Initially, the camera was not angled straight and sat on a slight slope, detecting the moving vehicle as miscellaneous incorrect objects. Once the camera position was corrected by pointing it directly at the vehicle, it registered the vehicle as a car, outputting this data via the terminal. The model output data every few frames, and as the vehicle moved closer to the camera, the confidence values increased. To calculate the average across the livestream of the vehicle, the first 10 datapoints were taken from the point at which the model began to register the object class correctly. Once the datapoints were input in Microsoft Excel, the average and 95% confidence interval were calculated, giving an average of 96.97 ± 0.0462%.
  • Timing: Each output was accompanied by a timing report giving the timing in milliseconds for CPU and CUDA real‐time processing. The timing datapoints, along with the accuracy datapoints, were input into a table in Microsoft Excel, and the average and 95% confidence interval were calculated for both CPU and CUDA processing. For the CPU, the average processing time was 52.29 ± 0.638 ms. For CUDA, the average processing time was 50.03 ± 1.14 ms. Table 9.4 lists the datapoints obtained, alongside the averages and the averages using the formula.
  • Redesign: Although this iteration yielded sufficient results for vehicle detection, it was considered whether applying transfer learning could further improve accuracy or time efficiency. To ensure a fair comparison, the following iteration would make use of the existing model, be tested on the same images and MP4 file, and include the same livestream setup.

Figure 9.2 Livestream setup of the Jetson Nano.

9.3 Iteration 2: Transfer Learning Model

9.3.1 Design and Development

The SSD MobileNet was then retrained with PyTorch via transfer learning, using vehicle images selected from the Open Images V6 database. This approach does not require the model to be trained from scratch, making it very time efficient [7]. As this section focuses on the design and development of the prototype, the SSD MobileNet and the Open Images V6 dataset are both examined briefly below.

Table 9.4 Datapoints obtained alongside the average and average using the formula.

Approach 1, Iteration 1: default model – livestream camera
Image | Confidence value (%) | CPU processing time (ms) | CUDA processing time (ms)
1 | 96 | 51.9 | 50.1
2 | 97.1 | 52.4 | 46.57
3 | 96.4 | 52.7 | 50.4
4 | 96.6 | 53.8 | 52.3
5 | 96.3 | 51 | 48.9
6 | 96.8 | 52.1 | 49.5
7 | 97.7 | 53.5 | 51.7
8 | 97.8 | 51.7 | 49.7
9 | 97.7 | 52.5 | 51
10 | 97.3 | 51.3 | 50.1
Average | 96.97 | 52.29 | 50.027
Analysis using formula | 96.97 ± 0.0462 | 52.29 ± 0.638 | 50.03 ± 1.14

Combining the SSD‐300 detector with a MobileNet backbone, the SSD MobileNet object detector is a standard neural network architecture designed for real‐time object detection [7]. First presented in a 2017 paper [8], MobileNets are convolutional neural networks specifically designed for use on mobile and embedded devices. They are based on a streamlined architecture that uses depthwise separable convolutions to create lightweight neural networks (see Figure 9.3). Furthermore, MobileNets employ two hyperparameters that trade off latency against accuracy [8]. The SSD‐MobileNet base model used for transfer learning was pretrained on the PASCAL Visual Object Classes (VOC) dataset, a large dataset used as a benchmark in object category recognition and detection.
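To illustrate the building block described above, the following is a minimal PyTorch sketch of a depthwise separable convolution (a depthwise convolution followed by a 1×1 pointwise convolution); the channel sizes and input resolution are illustrative only and are not taken from the chapter or the MobileNet paper.

```python
# Sketch of the depthwise separable convolution block used in MobileNets.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_channels)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        # Pointwise: 1x1 convolution that mixes channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

# Example: map a 32-channel feature map to 64 channels.
block = DepthwiseSeparableConv(32, 64)
print(block(torch.randn(1, 32, 112, 112)).shape)   # torch.Size([1, 64, 112, 112])
```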

The Open Images dataset is composed of over nine million images [9]. Each image has been annotated with image‐level labels and offers additional features such as bounding boxes and segmentation masks for each object, as well as annotated visual relationships. The Open Images V6 dataset contains 16 million bounding boxes for 600 object classes drawn on 1.9 million images, making it the largest present‐day dataset offering object location annotations [9]. The bounding boxes around each object have been manually drawn by professional annotators to ensure high levels of accuracy and reliability. The primary strength of the visual relationship annotations is that they support visual relationship detection, a developing task involving structured reasoning [10].


Figure 9.3 Architecture of the SSD MobileNet object detector model [7].

To retrain the SSD MobileNet, a few steps had to be completed first. In order, these steps included setting up the Jetson Nano, downloading the necessary data, limiting the amount of data used, training the SSD MobileNet model, and converting the SSD MobileNet model to open neural network exchange (ONNX) format. Below, each of these steps will be discussed.

First, to set up the Jetson Nano, the PyTorch training code available within the previously downloaded repository was used. Before the code could be used, however, the SSD MobileNet base model was obtained and additional Python packages were installed [4]. Next, data from the Open Images V6 dataset was downloaded. Since the Open Images V6 dataset contains a large quantity of data, it was important to limit the amount downloaded so that there was sufficient space on the Jetson Nano. Time efficiency was also taken into account, so this iteration used a small amount of data to retrain the model. A command‐line option titled “stats only” could be run with the required object classes specified prior to downloading the images; this option also showed the number of images available for download under each class before downloading them.

Prior to selecting the object classes, the list of classes available for download was viewed on the Open Images V6 dataset website. Five object classes were selected for training: “car,” “vehicle,” “vehicle registration plate,” “truck,” and “motorcycle.” Downloading the statistics for these object classes showed a total of 116 806 images available for download, with a total bounding box count of 317 790.

The total number of images was then divided into three categories: training image data, validation image data, and testing image data, with additional information given on the number of images of each object class available within these categories. After discovering that a total of 93 854 images were available to train the model, the number of images selected for download was limited to 2500. To download the specified number of images, the syntax of the command remained the same, but the “stats only” parameter was replaced with the “max images” parameter. This was imperative because, without specifying the maximum number of images required, all of the images would have been downloaded by default, which could overload the Jetson Nano. Once the specified number of images had been downloaded, training of the SSD MobileNet model could commence. To achieve this, the Python training script was run with the appropriate command argument options. The full list of options is given in Table 9.5.

The batch‐size hyperparameter specifies the number of samples that are processed through the neural network before the model is updated [11]. The epochs hyperparameter specifies the number of complete passes through the image training data [12]. For this iteration, the batch size was set to 4 and the number of epochs was set to 80 to ensure training did not use all available memory or take too long. Once the commands were entered, the Jetson Nano was left to complete training. The final step was to convert the trained model to open neural network exchange (ONNX) format; for this, it was important to ensure that the working directory was still the “SSD” subdirectory of the jetson‐inference repository. Once the model had been converted to ONNX format, the development phase of the model was complete.
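A minimal sketch of the ONNX conversion step is shown below, assuming the trained detector has been saved as a full PyTorch model object; the file names and the 300×300 input resolution are illustrative assumptions (the repository provides its own export script for this step).

```python
# Sketch: export a trained PyTorch detector to ONNX format.
import torch

# Assumes the checkpoint stores the full model object rather than only a state_dict.
model = torch.load("models/ssd-mobilenet-trained.pth", map_location="cpu")
model.eval()

# Dummy input matching the assumed network input: one RGB 300x300 image.
dummy_input = torch.randn(1, 3, 300, 300)
torch.onnx.export(model, dummy_input, "models/ssd-mobilenet.onnx",
                  input_names=["input_0"], output_names=["scores", "boxes"])
```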

Table 9.5 List of command options for training the SSD MobileNet model [4].

Argument | Default | Description
--data | data/ | The location of the dataset
--model-dir | models/ | Directory to output the trained model checkpoints
--resume | None | Path to an existing checkpoint to resume training from
--batch-size | 4 | Try increasing depending on available memory
--epochs | 30 | Up to 100 is desirable but will increase training time
--workers | 2 | Number of data loader threads (0 = disable multithreading)

9.3.2 Test (Multiple Images)

Once the design and development phase was completed, the model was tested via the terminal. When the commands were entered, a pop‐up window displayed the image results in consecutive order, alongside output via the terminal.

9.3.3 Analysis (Multiple Images)

  • Accuracy: A manual inspection of the image results showed that the model was able to detect the vehicles in the images. A total of 100 images were run through the model, and a sample size of 25 was taken to provide the average confidence value across the results. The rationale behind calculating an average confidence value was to understand and compare the performance of the model against the previously tested working default model. As in the previous iteration, the data were to be input into Microsoft Excel, summed, and then divided by the sample size. However, some images included more than one detection, making it difficult to calculate an average, so images with more than one vehicle confidence value were excluded. To make up for the excluded data, five additional vehicle images (counting on from image 25) were included to maintain the sample size; any image with more than one confidence value was skipped. Once the datapoints were input in Microsoft Excel, the average and 95% confidence interval were calculated, giving an average of 91.19 ± 0.043%.
  • Timing: As in the previous working iteration, the model timing was assessed by evaluating the timing report output via the terminal during image processing. A sample of the first 25 of the 100 images was taken, accounting for the total time it took the CPU and CUDA to process each image. Once the datapoints were input in Microsoft Excel, the average and 95% confidence interval were calculated for both CPU and CUDA processing. For the CPU, the average processing time was 30.608 ± 1.64 ms. For CUDA, the average processing time was 38.068 ± 2.02 ms. Table 9.6 lists the datapoints obtained, alongside the averages and the averages using the formula.

9.3.4 Test (MP4 File)

The same MP4 file from Pexels used in the previous iteration was used for testing. To commence testing, the terminal was launched, and the working directory was changed to the jetson‐inference directory. During testing, the MP4 file opened and started playing automatically; when vehicles appeared in the MP4 file, a bounding box and confidence value were displayed for each vehicle.

Table 9.6 Datapoints obtained alongside the average and average using the formula.

Approach 2, Iteration 1: transfer learning model – images
Image | Confidence value (%) | CPU processing time (ms) | CUDA processing time (ms)
1 | 98.50 | 46.9 | 54.7
2 | 93.80 | 35.9 | 48.6
3 | 65.10 | 33.5 | 38.6
4 | 99.70 | 34.6 | 41.2
5 | 96.90 | 33.4 | 40.2
6 | 90.00 | 28.7 | 35.4
7 | 90.80 | 29.1 | 39.6
8 | 83.40 | 28.8 | 35.4
9 | 99.30 | 28.7 | 35.3
10 | 98.60 | 29.1 | 35.7
11 | 86.30 | 29.2 | 37.5
12 | 68.80 | 29.2 | 36.2
13 | 95.00 | 28.4 | 29.2
14 | 82.60 | 29.5 | 39.3
15 | 97.70 | 28.7 | 36.1
16 | 93.30 | 30 | 37.8
17 | 87.60 | 28 | 36.6
18 | 65.80 | 29.9 | 37
19 | 97.10 | 29 | 34.3
20 | 98.80 | 30.3 | 36.5
21 | 98.70 | 28.7 | 40.6
22 | 99.30 | 29.2 | 37.1
23 | 93.50 | 29 | 36.1
24 | 99.60 | 28.4 | 33.4
25 | 99.50 | 29 | 39.3
Average | 91.19 | 30.608 | 38.068
Average using formula | 91.19 ± 0.043 | 30.608 ± 1.64 | 38.068 ± 2.02

9.3.5 Analysis (MP4 File)

  • Accuracy: The model was assessed by evaluating the output confidence values associated with the vehicle detections. Eight confidence values were taken from the output generated via the terminal and input into Microsoft Excel to calculate an average. As in the previous iteration, the first confidence value output by the model was considerably higher; since this was expected during model start‐up, the datapoint was treated as an outlier and excluded from the average. Once the datapoints were input in Microsoft Excel, the average and 95% confidence interval were calculated, giving an average of 88.91 ± 0.043%.
  • Timing: Fourteen datapoints were then assessed to calculate the average timing for the model: seven reporting CPU processing time and seven reporting CUDA processing time. The first datapoint was excluded when calculating the average timing, as it was higher than all of the following datapoints. Once the datapoints were input in Microsoft Excel, the average and 95% confidence interval were calculated for both CPU and CUDA processing. For the CPU, the average processing time was 44.6 ± 9.80 ms. For CUDA, the average processing time was 33.4 ± 7.88 ms. Table 9.7 lists the datapoints obtained, alongside the averages and the averages using the formula.

9.3.6 Test (Livestream Camera)

The final test evaluated the model within a real‐time context. To ensure lighting and weather conditions remained consistent, this test was recorded on the same day as the test carried out for the first iteration.

9.3.7 Analysis (Livestream Camera)

  • Accuracy: This test was a reproduction of the one carried out for the first iteration with the default model, with the only exception being that the code employed to test out the model within the terminal differed. Consequently, the vehicle was driven across the camera frame in the same manner, and the camera was angled in the same position. Once the datapoints were input within Microsoft Excel, the average and 95% confidence interval were then calculated, giving the average of 99.17 ± 0.243%.
  • Timing: The average time the model took to detect the vehicle was calculated by obtaining data from the timing report output via the terminal. Once the datapoints were input in Microsoft Excel, the average and 95% confidence interval were calculated for both CPU and CUDA processing. For the CPU, the average processing time was 35.278 ± 1.065 ms. For CUDA, the average processing time was 33.869 ± 1.041 ms. Table 9.8 lists the datapoints obtained, alongside the averages and the averages using the formula.

Table 9.7 Datapoints obtained alongside the average and average using the formula.

Approach 1, Iteration 1: transfer learning model – MP4 file
Image | Confidence value (%) | CPU processing time (ms) | CUDA processing time (ms)
1 | 89.00 | 68 | 52.2
2 | 86.00 | 45.6 | 34.5
3 | 92.40 | 38.6 | 27.9
4 | 89.00 | 40.5 | 30.4
5 | 88.70 | 39.2 | 29.3
6 | 88.80 | 40.5 | 30.4
7 | 88.50 | 39.2 | 29.3
Average | 88.91 | 44.51429 | 33.2857
Analysis using formula | 88.91 ± 0.043 | 44.6 ± 9.80 | 33.4 ± 7.88

9.3.8 Redesign

Upon assessing the development and analysis of the previous working iterations, the research identified two improvements for the following iteration. The first concerns the sample sizes used in previous iterations: for instance, in the first iteration only seven datapoints were sometimes used to calculate an average, which may limit the accuracy evaluation of the vehicle detections. An iteration utilizing a wider range of datapoints may therefore give a more accurate reading of the true detection performance of the models. The second improvement concerns how the accuracy of the prototype is analyzed. So far, accuracy has been analyzed by calculating an average using Microsoft Excel; however, this method does not account for variables such as true positive or false positive vehicle detections. Obtaining this information may offer additional insights into the performance of the models, providing a measure of accuracy based upon these additional variables.

Table 9.8 Datapoints obtained alongside the average and average using the formula.

Approach 1, Iteration 1: transfer learning model – livestream camera.
Image | Confidence value (%) | CPU processing time (ms) | CUDA processing time (ms)
1 | 99.3 | 35.8 | 34.8
2 | 99.4 | 37.38 | 35.59
3 | 99.5 | 34.3 | 34.1
4 | 99.1 | 33.3 | 34.1
5 | 99.2 | 37.3 | 35.1
6 | 99.2 | 36.7 | 35.5
7 | 98.3 | 33.6 | 31.4
8 | 99.3 | 34.6 | 32.3
9 | 99 | 35.5 | 33.4
10 | 99.4 | 34.3 | 32.4
Average | 99.17 | 35.278 | 33.869
Analysis using formula | 99.17 ± 0.243 | 35.278 ± 1.065 | 33.869 ± 1.041

9.4 Iteration 3: Increased Sample Size and Change of Accuracy Analysis (Images)

9.4.1 Design and Development

This iteration was concerned with implementing the improvements outlined above. As the first improvement, focus was placed on retrieving a sample size of 50 image datapoints for both the default model and the transfer learning model, doubling the sample sizes employed in the previous image‐based iterations. The second improvement implemented a confusion matrix on the collected datapoints. A confusion matrix is a table of the four combinations of true and predicted values; it was chosen to show the relationship between the observed and predicted values of the vehicle detections and to provide additional insights [13]. This iteration tested the same image data on both the default model and the transfer learning model.

9.4.2 Testing

The research obtained images from both the default model and the transfer learning model, following the same steps as in the first iteration with a few changes. First, a larger sample size was obtained. Second, because the testing image data remained the same as in the previous iterations, this iteration drew its sample from images numbered “50” onward within the output folders, in contrast to the previous iterations that counted from the first image. This ensured that the data samples for both the default model and the transfer learning model were still tested on the same images, while the samples themselves differed from the previous ones. Third, the way each of the 50 datapoints for the default and transfer learning models was input into Microsoft Excel changed. Instead of skipping images without a vehicle object class, such images were still counted to assess whether the model could correctly identify them, and if either model detected a vehicle in an image that did not contain one, this was also counted. Once the data was generated by the models, it was input into Microsoft Excel and arranged into two columns, one giving the true value and the other giving the model's predicted value.

9.4.3 Analysis

A confusion matrix was created from the generated output datapoints of both the default model and the transfer learning model. The confusion matrix for each model was built by manually assigning each obtained datapoint to one of four values: “true positive” (TP), “false positive” (FP), “false negative” (FN), and “true negative” (TN). TP was used when the predicted value was the same as the true value, FP when the model predicted a vehicle that was not present in the image, FN when the model did not predict a vehicle in an image that contained a vehicle, and TN when the model did not predict a vehicle in an image that did not contain one. Tables 9.9 and 9.10 present the confusion matrices for the default model and the transfer learning model, respectively.
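A minimal sketch of this assignment is shown below, assuming each image's true and predicted labels have been reduced to “vehicle” or “no vehicle”; the example lists are hypothetical and only illustrate the counting logic described above.

```python
# Sketch: assign each image to TP/FP/FN/TN from its true and predicted labels.
from collections import Counter

def confusion_counts(true_labels, predicted_labels):
    """Labels are 'vehicle' or 'no vehicle', one pair per image."""
    counts = Counter()
    for true, pred in zip(true_labels, predicted_labels):
        if pred == "vehicle" and true == "vehicle":
            counts["TP"] += 1
        elif pred == "vehicle" and true == "no vehicle":
            counts["FP"] += 1
        elif pred == "no vehicle" and true == "vehicle":
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts

# Hypothetical example with four images:
print(confusion_counts(["vehicle", "vehicle", "no vehicle", "vehicle"],
                       ["vehicle", "no vehicle", "vehicle", "vehicle"]))
```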

9.4.3.1 Confusion Matrices

Table 9.9 shows that for the default model, 39 datapoints were TP, 3 were FN, 0 were FP, and 8 were TN. Correspondingly, Table 9.10 shows that the transfer learning model yielded different results: 39 datapoints were TP, 3 were FN, 7 were FP, and 1 was TN.

Table 9.9 Confusion matrix for the output of the default model.

Predicted \ Actual | Positive (vehicle) | Negative (no vehicle)
Predicted positive | TP = 39 | FP = 0
Predicted negative | FN = 3 | TN = 8

Table 9.10 Confusion matrix for the output of the transfer learning model.

Predicted \ Actual | Positive (vehicle) | Negative (no vehicle)
Predicted positive | TP = 39 | FP = 7
Predicted negative | FN = 3 | TN = 1

9.4.3.2 Precision, Recall, and F‐score

Based on the information provided by the two confusion matrices, additional insights may be gained through metrics that determine the precision, recall, and F‐score of the two models. Precision can be understood as the proportion of a model's positive predictions that are truly correct [14]. As shown in the formula below, precision is calculated by dividing TP by the sum of TP and FP:

(9.2)  $\text{Precision} = \dfrac{TP}{TP + FP}$

Precision was calculated for both the default model and the transfer learning model using the above formula: the default model offered a precision of 1.0, and the transfer learning model offered a precision of 0.84. Next, the recall of the two models was calculated to determine the proportion of actual positives predicted correctly by each model [14]. As shown below, recall is calculated by dividing TP by the sum of TP and FN:

(9.3)  $\text{Recall} = \dfrac{TP}{TP + FN}$

Recall was calculated for both models using the above formula; both the default model and the transfer learning model offered a recall of 0.93. The last metric calculated was the F1‐score, the harmonic mean of the calculated precision and recall [15]. As shown below, the F1‐score is calculated using the following formula:

(9.4)  $F_1 = 2 \times \dfrac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

The F1‐score was calculated for both the default model and the transfer learning model using the above formula. The default model offered an F1‐score of 0.96, and the transfer learning model offered an F1‐score of 0.88.
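A minimal sketch applying Eqs. (9.2)–(9.4) to the confusion-matrix counts reported above is shown below; small differences from the values quoted in the text are due to rounding.

```python
# Sketch: precision, recall, and F1-score from the TP/FP/FN counts above.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

models = {"default": (39, 0, 3), "transfer learning": (39, 7, 3)}
for name, (tp, fp, fn) in models.items():
    p, r, f1 = precision_recall_f1(tp, fp, fn)
    # Values match Section 9.4.3.2 up to rounding.
    print(f"{name}: precision={p:.2f}, recall={r:.2f}, F1={f1:.2f}")
```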

9.5 Findings and Discussion

The general findings of the research are presented here, examining the results from testing the three media inputs. Overall, the performance of vehicle detection produced different results depending on the approach implemented. For the first approach, vehicle detection could not be performed, as it was found that object detection implementations required up‐to‐date software across all packages and dependencies to function effectively.

Following this line of rationale, the second approach reinforced that employing open‐source code available from GitHub enabled general object detection [16]; however, the accuracy and speed of the prototype were compromised because the software was not optimized specifically for vehicle detection. Taking this into account, the research found that applying transfer learning to an already pretrained model enabled the prototype to be trained on additional vehicle data for optimization.

The second iteration focused on retraining with the chosen batch size and number of epochs, yielding positive results when the model was implemented for testing after training. Overall, this approach increased the prototype's detection accuracy and decreased detection time across vehicle detections for all three media inputs tested. The final iteration increased the sample size and changed to different analysis methods, showing that the default model offered better precision and a better F1‐score.

Below, the technical specifications of the selected hardware and software used in the research are reiterated. The two working iterations, composed of the default model and the final transfer learning model, are then discussed in relation to their findings. Lastly, the findings are discussed in relation to the research questions (as stated in the previous chapter), the suitability of the prototype for traffic flow analysis is assessed, and possible improvements for future iterations are discussed.

9.5.1 Findings: Vehicle Detection Across Multiple Images

The default model yielded sufficient results. The data was input into Microsoft Excel, and the average function was used to calculate an average. This same process was completed for all subsequent calculations involving averages across all media inputs in the following tests.

Findings showed that, on average, the prototype using the default model offered an accuracy of 84.65 ± 0.05% across testing on images of vehicles. The average processing time using the quad‐core ARM A57 CPU was 49.33 ± 5.65 ms, and the average processing time offered by CUDA, using the 128‐core Maxwell GPU, was 55.85 ± 5.28 ms. It should be noted that all subsequent iterations discussed below used the same technical specifications for the approaches as outlined in Chapter 8.

During the second iteration, the transfer learning model offered better results, demonstrating an increase in average accuracy to 91.19 ± 0.043% across images during testing. The average time for processing images also decreased: for the CPU, the average processing time was 30.608 ± 1.64 ms, and for CUDA, 38.068 ± 2.02 ms.

Taking the above findings into account, the data was subjected to three paired sample t‐tests performed using Microsoft Excel to determine whether there was a significant statistical difference between the two approaches. The first t‐test offered a p‐value of 0.093. As the p‐value was not below 0.05, this demonstrated that there was no significant statistical difference in accuracy detection between the two models. A second t‐test was performed to address whether there was a significant difference in timing processing for the CPU, offering a p‐value of 3.87, and a third t‐test was performed for CUDA, offering a p‐value of 1.55. Overall, the t‐tests showed that there was no statistical difference in timing processing between the two approaches.
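A minimal sketch of one such paired t-test is shown below, using SciPy rather than Microsoft Excel; the five sample values are simply the first CPU timings from Tables 9.1 and 9.6 and are included only to make the snippet runnable, not to reproduce the reported p-values.

```python
# Sketch: paired-sample t-test comparing per-image CPU timings of the two models.
from scipy import stats

default_cpu_ms = [114.9, 49.57, 46.6, 46.5, 46.8]   # first values from Table 9.1
transfer_cpu_ms = [46.9, 35.9, 33.5, 34.6, 33.4]    # first values from Table 9.6

t_stat, p_value = stats.ttest_rel(default_cpu_ms, transfer_cpu_ms)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# A p-value below 0.05 would indicate a statistically significant difference.
```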

Although there was no significant statistical difference between the two models in accuracy and timing, the transfer learning model offered an increase of 6.54% in average accuracy when detecting vehicles, as well as a decrease in both CPU and CUDA processing time.

To measure the relative timing performance of the default model and the transfer learning model, the speedup ratio was calculated by dividing the average processing time of the default model by that of the transfer learning model for both CPU and CUDA [17]. For the CPU, the 49.33 ± 5.65 ms produced by the default model was divided by the 30.608 ± 1.64 ms produced by the transfer learning model, showing that the transfer learning model provided a 160% speedup over the default model on the CPU. For CUDA, the 55.85 ± 5.28 ms produced by the default model was divided by the 38.068 ± 2.02 ms produced by the transfer learning model, a 147% speed improvement over the default model.
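The following is a minimal sketch of the speedup-ratio calculation, using the average CPU and CUDA times reported above for the image tests.

```python
# Sketch: speedup ratio = default model's average time / transfer learning model's.
def speedup(default_ms, transfer_ms):
    return default_ms / transfer_ms

print(f"CPU speedup:  {speedup(49.33, 30.608):.2f}x")   # about 1.61, reported as 160%
print(f"CUDA speedup: {speedup(55.85, 38.068):.2f}x")   # about 1.47, reported as 147%
```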

9.5.2 Findings: Vehicle Detection Performance on an MP4 File

As demonstrated earlier in this chapter, the default model prototype offered an average accuracy of 84.79 ± 0.0821% across testing on the MP4 file of vehicles. The average processing time using the CPU was 56.3 ± 2.24 ms, and the average processing time using CUDA was 47.21 ± 2.36 ms.

As evidenced above in the second iteration, the transfer learning model offered improved results when testing on the MP4 file, with an average accuracy of 88.91 ± 0.043%. The average processing time also decreased: for the CPU, the average processing time was 44.6 ± 9.80 ms, and for CUDA, 33.4 ± 7.88 ms.

Taking the above findings into account, the data was subjected to three paired sample t‐tests performed using Microsoft Excel to determine whether there was a significant difference between the two approaches. The first t‐test offered a p‐value of 0.027, showing no statistical difference in accuracy between the two models. A second t‐test was performed to address whether there was a significant difference in timing processing for the CPU, offering a p‐value of 0.034, and a third t‐test was performed to assess CUDA, offering a p‐value of 0.007. These two t‐tests illustrate that there is a significant difference in processing timing between the two models. Furthermore, given that the averages of the CPU and CUDA offered faster processing time for the transfer learning model, the t‐tests reinforce that the transfer learning model performed faster than the default model when processing vehicle detections on an MP4 file.

To measure the relative timing performance of the default model and transfer learning model, the speedup ratio was calculated by dividing the average processing time for the default model by the transfer learning model across both CPU and CUDA [17]. For CPU, 56.3 ± 2.24 ms produced by the default model was divided by the 44.6 ± 9.80 ms produced by the CPU for the transfer learning model. Overall, results showed that the CPU of the transfer learning model provided a 126% speedup over the default model. For CUDA, 47.21 ± 2.36 ms produced by the default model was divided by the 33.4 ± 7.88 ms produced by CUDA for the transfer learning model. Overall, results showed that CUDA of the transfer learning model provided a 140% speedup over the default model.

9.5.3 Findings: Vehicle Detection on Livestream Camera

As demonstrated earlier in this chapter, the default model prototype offered an average accuracy of 96.97 ± 0.0462% across testing of the camera livestreaming in real time. The average processing time for the CPU was 52.29 ± 0.638 ms, and for CUDA, 50.03 ± 1.14 ms.

In the second iteration, the transfer learning model offered better results on the livestream camera, with an average accuracy of 99.17 ± 0.243% across the livestream test. Timing also decreased: for the CPU, the average processing time was 35.278 ± 1.065 ms, and for CUDA, 33.869 ± 1.041 ms.

Taking the above findings into account, the data was subjected to three paired sample t‐tests performed using Microsoft Excel to determine whether there was a significant difference between the two approaches. The first t‐test offered a p‐value of 1.86, showing no statistical difference in accuracy between the two models. A second t‐test was performed to address whether there was a significant difference in timing processing for the CPU, offering a p‐value of 1.14, and a third t‐test was performed for CUDA, offering a p‐value of 1.42.

Although there was no significant statistical difference between the models in accuracy and timing, the transfer learning model still offered a 2.2% increase in average accuracy when detecting the vehicle, as well as a decrease in both CPU and CUDA processing time. To measure the relative timing performance of the two models, the speedup ratio was again calculated by dividing the average processing time of the default model by that of the transfer learning model for both CPU and CUDA [17]. For the CPU, the 52.29 ± 0.638 ms produced by the default model was divided by the 35.278 ± 1.065 ms produced by the transfer learning model, a speedup of 140%. For CUDA, the 50.03 ± 1.14 ms produced by the default model was divided by the 33.869 ± 1.041 ms produced by the transfer learning model, also a speedup of 140% over the default model.

9.5.4 Findings: Iteration 3

As demonstrated above, the final iteration assessed whether additional changes, such as an increase in sample size and different analysis methods, would offer further insights. To achieve this, the default model and the transfer learning model were tested on 50 images, doubling the sample sizes obtained in the previous image‐processing iterations. The two models were then assessed in relation to precision, recall, and F1‐scores. The default model offered a precision of 1.0, a recall of 0.93, and an F1‐score of 0.96; the transfer learning model offered a precision of 0.84, a recall of 0.93, and an F1‐score of 0.88. Overall, this shows that the default model performed better than the transfer learning model in terms of precision and F1‐score, although the transfer learning model performed better on the average confidence values calculated in previous iterations.

Upon assessing the results of the images within the “trafficmodel2” folder, a manual inspection showed that the model was able to detect the vehicles in the images, as shown in Figure 9.4.



Figure 9.4 Quantitative result analysis using the Open Images dataset (V6). The green overlay bounding boxes represent the vehicle detection accuracy using the proposed prototype.

Source: Used with the permission of Microsoft.

9.5.5 Addressing the Research Questions

The research questions were as follows:

  1. How can computer vision be used to support traffic flow analysis?
  2. What is required within the planning, development, and implementation phases of a prototype that performs accurate vehicle detection?
  3. How will the performance of a vehicle‐detection prototype be measured?
  4. What are the barriers to the development of a vehicle‐detection prototype?

Addressing the second question, one of the most critical requirements for effective prototype creation was the employment of a research methodology aligned with the research questions and objectives. The adoption of the design‐based research methodology served as a suitable approach, as it underlined the importance of iteration‐based research and offered a phase‐based framework that could be referred to during the planning, development, implementation, and redesign lifecycle. Adherence to the framework provided the research with structure and ensured that appropriate steps were taken in each phase. Furthermore, performing iterations allowed the research to learn from previous iterations and, in doing so, improve the prototype.

Additionally, the ability to troubleshoot issues or errors during the testing phases of the prototype proved important. It was found that, to troubleshoot issues effectively, reliance was often placed on websites such as GitHub and Stack Overflow, as they are community focused and offer platforms that enable developers to share information about software‐related issues.

Another important requirement was conducting sufficient research into the selection of hardware and software used to develop the prototype. The hardware for the prototype was selected prior to the choice of the software stack, which was a straightforward process because, once the nano‐computer model was selected, the instructions and accompanying tools were available from the manufacturer's website.

To address the third research question, evaluation criteria were posited to measure the performance of the prototype and were applied in the analysis section of each iteration. As the prototype focuses on vehicle detection, the most critical evaluation benchmark was ensuring a high level of accuracy when detecting vehicles. To understand whether the prototype was capable of real‐time vehicle detection, the speed of processing was another assessment marker, and a general assessment of the prototype's suitability for traffic flow analysis is discussed further below. Finally, addressing the fourth question, there were a few barriers encountered when developing the vehicle detection prototype. The two most prevalent challenges concerned cost and time, and these factors may have had an impact on the final results of the research. For example, to keep costs down, free and open‐source tools were used wherever possible.

A final barrier was the number of incompatibilities faced when opting to use the Jetson Nano for prototype development. The Jetson Nano is a relatively new embedded device, and the research often faced restrictions during development as a result. Some of the most prevalent restrictions included the lack of working open‐source code and significant conflicts with software popular on similar embedded devices. However, as development on the Jetson Nano continues, it is expected that these issues will improve in the future.

9.5.6 Assessment of Suitability

High accuracy was the most important efficacy marker of the prototype. It was found to be crucial that the computer vision model be optimized for vehicles to minimize the chance of false positives. This line of rationale endorses the transfer learning iteration outlined in the section on design and development, as it offered the highest average accuracy. However, the final iteration analyzed both models in relation to precision, recall, and the corresponding F1‐scores to generate further insights, and these metrics showed that the default model performed better in this regard.

The prototype did not appear to be hindered by processing time when tested on vehicle images, and the MP4 file and real‐time livestreaming both appear to be potentially suitable modes of vehicle detection for traffic flow analysis. Because the prototype was optimized to detect only vehicles across six object classes, processing time was reduced across all three media inputs. However, although the timing of vehicle detection does not appear to pose an issue, the terminal does not automatically total the vehicles detected; rather, it displays detections one after another in sequence for any media input, which is not user‐friendly. Future directions for this are discussed in the section on future improvements.

Furthermore, it was found that an intermediate level of knowledge was required for successful implementation during the design and development phase of the research. This included knowledge of setting up the device and of sourcing and implementing compatible software and open‐source code to run it. However, once the prototype was set up, running it on the different media inputs was a straightforward process. Due to this, it may serve as a suitable option for capturing traffic flow once the initial setup of the prototype has been completed.

The prototype also proved to be a cost‐effective option and appeared to rival the costs associated with current approaches to traffic flow analysis. The use of the prototype may extend beyond local governmental use, providing an accessible way for enterprises to gain insights into traffic flow in their business context. The prototype may also be suitable from a commercial standpoint because it upholds the privacy of vehicles and vehicle owners.

The prototype may also be viewed as beneficial from a sustainability standpoint due to its low power consumption. This is significant because the prototype uses the Jetson Nano as its embedded device for running the models; despite its small size, its technical specifications boast 128 CUDA cores, 4 GB of RAM, and a quad‐core ARM Cortex‐A57 processor [18]. When it comes to the energy required to run, there is no trade‐off, as the Jetson Nano requires only 5–10 W of power [19].

Another affordance of the prototype is the flexibility it offers users with respect to media inputs. A typical use case would employ its livestreaming camera capabilities to capture real‐time traffic flow; however, a secondary option is to process MP4 recordings. In this way, the prototype may capture traffic flow by recording the vehicles via a camera and then running the recording through the prototype at a later time.

9.5.7 Future Improvements

Based on the assessment of suitability, several improvements may be put forward for future iterations. The first recommendation concerns the choice of hardware, as some of the hardware tools were chosen primarily for cost‐effectiveness. Future improvements may therefore employ hardware that offers better results. One example is the camera: although the Raspberry Pi V2 module served as an effective instrument for testing real‐time livestreaming capabilities, there are cameras compatible with the Jetson Nano that offer a higher picture resolution.

Another consideration for future iterations is wireless (WiFi) connectivity. In this research, Ethernet cables were employed; however, they may not always be the most suitable choice when deploying the prototype in a commercial context. Future improvements may therefore focus on tools that add wireless capabilities to the nano‐computer. Wireless connectivity brings additional benefits, such as greater flexibility in where the devices can be placed and a more secure overall setup, and it would also allow the monitor attached to the device to be located remotely.

A final hardware addition may be protective gear for the devices, especially when they are exposed to weather conditions. This research did not take the physical durability of the prototype into account; however, if the prototype is to be implemented in a real‐life traffic flow context, it is imperative that it is properly shielded from the weather. Employing some form of protective enclosure would aid the prototype in this respect.

Future improvements may also look at incorporating a graphical user interface (GUI) in place of the terminal. Implementing a GUI may not only increase user‐friendliness but also encourage the adoption and exploration of the prototype within a local government or enterprise context. Moreover, a GUI tailored to the prototype could automate the counting of vehicle detections, exposed through an application programming interface (API). Overall, these options hold the potential to remove the need to scroll through the output data reporting vehicle detections, saving time and increasing overall usability. A first step toward such automation is sketched below.
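As a first step toward automated counting, the sketch below extends the earlier detection loop with a running tally per class and prints a single summary instead of one line per detection. It reuses the hypothetical jetson‐inference calls from the previous sketch; GetClassDesc and the class labels are assumptions based on the dusty‐nv documentation [4]. Note that, without object tracking, this tallies detections per frame rather than unique vehicles, so tracking would be a natural further improvement.

```python
# Sketch: tallying detections per class instead of printing each one.
# Assumes the jetson-inference Python bindings [4]; names may vary by release.
from collections import Counter

import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
source = jetson.utils.videoSource("traffic.mp4")   # hypothetical recording

totals = Counter()
while source.IsStreaming():
    img = source.Capture()
    if img is None:                                # end of stream or timeout
        break
    for det in net.Detect(img):
        # Map the numeric class ID back to its label, e.g. "car", "bus", "truck".
        totals[net.GetClassDesc(det.ClassID)] += 1

# One summary line per class rather than a scrolling list of detections.
for class_name, count in totals.items():
    print(f"{class_name}: {count} detections (per-frame, not unique vehicles)")
```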

9.6 Conclusion

Overall, the findings of the research illustrate that the prototype may serve as a future tool to aid traffic analysis. Once running, the prototype has demonstrated the capability to detect vehicles automatically. One current mode of traffic counting is to employ people to count vehicles manually; implementing a tool such as the prototype would enable a higher level of automation and reduce reliance on people to carry out these tasks. In addition, the technology used for the prototype is more cost‐effective than the tools currently used within the local region. Overall, greater cloud and IoT adoption regionally [20] is important in order for smart transportation to be implemented on a large scale.

Within this study, a vehicle detection prototype was developed, tested, and evaluated, with a focus on the general functionality of a computer vision prototype built from locally available resources. The final iteration utilized a transfer learning‐based computer vision model, which shows great potential for the locally developed prototype. Further improvements in hardware are recommended before it can be implemented in a real context: camera upgrades to offer better resolution for livestream capture, a wireless connection to give more flexibility, and a protective apparatus to shield the prototype from weather conditions. Alternatively, as demonstrated by Loresco [21] for intelligent traffic light operation using computer vision with an Android system, traffic data can be transmitted via CCTV to a Raspberry Pi 3.

Another direction for the future may be updating the code used to run the prototype. The Jetson Nano is a relatively new embedded device, and much of the open‐source code for it will continue to be updated and improved. One possible tool is NVIDIA Metropolis, an edge‐to‐cloud platform supported by SDKs such as JetPack, which can be used to maintain and improve smart cities through the use of artificial intelligence (AI) [22]. Overall, the vehicle detection prototype holds much potential for assisting with traffic flow analysis in the future. Consequently, the findings of this research may inform future investigations into similar domains using embedded devices.

References

  1. Ai, C. and Tsai, Y.J. (2016). An automated sign retroreflectivity condition evaluation methodology using mobile LIDAR and computer vision. Transportation Research Part C: Emerging Technologies 63: 96–113. https://doi.org/10.1016/j.trc.2015.12.002.
  2. Alani, N.H.S. (2019). Improved stixels towards efficient traffic‐scene representations. Doctoral thesis. Auckland, New Zealand: Auckland University of Technology. https://openrepository.aut.ac.nz/handle/10292/12545.
  3. Nvidia Developer (2019). Jetson nano developer kit. https://developer.nvidia.com/embedded/jetson-nano-developer-kit (accessed 24 July 2021).
  4. Franklin, D. (2020). Dusty‐nv/jetson‐inference [C++]. https://github.com/dusty-nv/jetson-inference (accessed 24 July 2021).
  5. MetroCount (n.d.). RoadPod. https://metrocount.com/products/roadpod-vehicle-tube-classifier/ (accessed 24 July 2021).
  6. Pexels (n.d.). Daylight traffic on a camera angled tilt road: free stock video. https://www.pexels.com/search/videos/cars%20driving/ (accessed 24 July 2021).
  7. Franklin, D. (2020). jetson‐inference. GitHub. https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-ssd.md.
  8. Howard, A.G., Zhu, M., Chen, B. et al. (2017). MobileNets: efficient convolutional neural networks for mobile vision applications. ArXiv:1704.04861 [Cs]. http://arxiv.org/abs/1704.04861.
  9. Google (2020). Open images V6. https://storage.googleapis.com/openimages/web/factsfigures.html (accessed 24 July 2021).
  10. Kuznetsova, A., Rom, H., Alldrin, N. et al. (2020). The open images dataset V4: unified image classification, object detection, and visual relationship detection at scale. International Journal of Computer Vision 128 (7): 1956–1981. https://doi.org/10.1007/s11263-020-01316-z.
  11. Kandel, I. and Castelli, M. (2020). The effect of batch size on the generalisability of the convolutional neural networks on a histopathology dataset. ICT Express 6: 312–315. https://doi.org/10.1016/j.icte.2020.04.010.
  12. Georgevici, A.I. and Terblanche, M. (2019). Neural networks and deep learning: a brief introduction. Intensive Care Medicine 45 (5): 712–714. https://doi.org/10.1007/s00134-019-05537-w.
  13. Nisbet, R., Miner, G., and Yale, K. (2018). Model evaluation and enhancement. In: Handbook of Statistical Analysis and Data Mining Applications, 2e (eds. R. Nisbet, G. Miner and K. Yale), 215–233. Academic Press. https://doi.org/10.1016/B978-0-12-416632-5.00011-6.
  14. Google Developers (n.d.). Classification: precision and recall. https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall (accessed 24 July 2021).
  15. Wood, T. (n.d.). F‐Score. DeepAI. https://deepai.org/machine-learning-glossary-and-terms/f-score.
  16. Bakker, G. (2020). Gbakke1/jetson‐inference. https://github.com/gbakke1/jetson-inference (accessed 24 July 2021).
  17. Gabb, H., Corden, M., Rosenquist, T. et al. (2012). Predicting and measuring parallel performance. Intel. https://software.intel.com/content/www/us/en/develop/articles/predicting-and-measuring-parallel-performance.html?language=en.
  18. Nvidia Developer (n.d.). Hardware for every situation. https://developer.nvidia.com/embedded/develop/hardware (accessed 24 July 2021).
  19. E Linux (n.d.). Jetson nano. https://elinux.org/Jetson_Nano (accessed 24 July 2021).
  20. Erturk, E. (2017). An incremental model for cloud adoption: based on a study of regional organizations. TEM Journal 6 (4): 868–876.
  21. Loresco, P. (2018). Intelligent traffic light system using computer vision with android monitoring and control. TENCON 2018‐2018 IEEE Region 10 Conference, pp. 2461–2466. IEEE. https://doi.org/10.1109/TENCON.2018.8650084.
  22. Nvidia (n.d.). Build smarter cities through AI. https://www.nvidia.com/en-us/industries/smart-cities/ (accessed 24 July 2021).