What it does:
Looks for the shape of a vehicle in video feeds. It was trained on a variety of vehicle body styles and shapes.
It was trained on vehicles facing the camera, facing away from the camera, facing sideways, and in a 3/4 view. It is not trained to detect a vehicle when a camera is looking directly downward, such as a camera mounted directly above an automotive assembly line.
Allows you to create alerts, filter event feeds and 24/7 views based on the presence of a vehicle. Allows you to run chained classification and recognition models on vehicle detections.
Consensus-Based Architecture (In Development, Expected in Q4)
The vehicle detection model currently runs locally and was created by NVIDIA. By the end of Q4, a person double check in the cloud, using a model created by Intel, should be implemented.
Interaction with Person Detections
Whenever a car is moving, it is highly likely that a person is in it. In most situations, however, the camera cannot see that person, and receiving both a person alert and a vehicle alert for every vehicle detection would not be very useful. So a vehicle detection or wheeled object detection typically will not also create a person detection alert. It can still happen, most commonly with a side view of a delivery vehicle, because of the lack of a driver / passenger door.
Interaction with Non-Vehicle Wheeled Object Detections
The Non-Vehicle Wheeled Object Detection Model covers anything with wheels that does not appear to be a street-legal, mass-production car, truck, or van with 4 or more wheels. In general, this means that the Vehicle Detection Model and the Non-Vehicle Wheeled Object Detection Model almost cleanly separate street-legal vehicles from other things with wheels. The single exception is motorcycles: the Vehicle Detection Model is NOT trained on them, so they are not detected by this model but are detected by the Non-Vehicle Wheeled Object Detection Model. Visually, it is rather hard for a computer to tell the difference between a motorcycle and a bicycle, so they are currently lumped together. We hope that the Chained Classification Model project will make it easier to differentiate bicycles from motorcycles.
To detect a vehicle in a video frame, we recommend choosing a camera that shows the vehicle at 40x40 pixels or larger. Our team can help you identify which cameras can do this at which distances.
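As a rough illustration of the 40x40 pixel recommendation, the sketch below estimates how many horizontal pixels a vehicle covers at a given distance using simple pinhole-camera geometry. It is not a survail tool: the function names, the 1.8 m vehicle width, the 90 degree field of view, and the 1920 pixel image width are all assumptions chosen for the example. It only checks the horizontal dimension, and real lenses (distortion, varifocal settings) will differ.

```python
import math

def pixels_on_target(object_width, distance, hfov_deg, image_width_px):
    """Approximate horizontal pixels covered by an object facing the camera.

    object_width and distance must use the same unit (for example meters).
    """
    # Width of the scene that the full image covers at this distance.
    scene_width = 2.0 * distance * math.tan(math.radians(hfov_deg) / 2.0)
    return object_width * image_width_px / scene_width

def max_distance(object_width, hfov_deg, image_width_px, min_px=40):
    """Farthest distance at which the object still spans min_px pixels."""
    half_fov_tan = math.tan(math.radians(hfov_deg) / 2.0)
    return object_width * image_width_px / (2.0 * min_px * half_fov_tan)

# Hypothetical example: a 1.8 m wide car, 90 degree lens, 1920 px wide image.
print(round(pixels_on_target(1.8, 20.0, 90.0, 1920), 1))  # pixels at 20 m
print(round(max_distance(1.8, 90.0, 1920), 1))            # distance for 40 px
```

A narrower field of view or a higher-resolution sensor pushes the usable distance out, which is why the right camera depends on where the vehicles will be.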
A computer vision model will only work as expected when used in the situation that it was developed around. Models only work on their trained use case.
Camera Mounting Height
We recommend no higher than 8 or 9 feet for fixed lens cameras. Varifocal cameras or PTZ cameras will depend on the vertical angle and distance from the camera.
You will want the camera low enough to see the front, back, and sides of a vehicle rather than just the top. This ensures more accurate detections. When a camera is mounted too high and the subject is too close to the camera, only part of a person is visible, and survail can interpret that person as a vehicle.
With all video analytics, subject distance plays an important role. If the vehicle is too far away, it is difficult to differentiate it from the background or other objects. Detections are significantly less accurate with less than 80% of a vehicle in the camera's view and almost impossible with less than 40% of the vehicle in view.
With survail you can detect an object as small as 20x20 pixels, but accuracy falls off if the object is not at least 40x40 pixels, so 40x40 is the default minimum object evaluation size. Accuracy increases as you increase the minimum object size. You can set the minimum object size that survail evaluates either globally (for all cameras) or with specific per-camera overrides.
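The relationship between the global minimum size and per-camera overrides can be pictured with a small sketch. This is not survail's configuration API: the function names, the camera IDs, and the choice to clamp overrides to the 20x20 floor are assumptions made for illustration only.

```python
FLOOR_PX = 20        # smallest detectable object size mentioned above
DEFAULT_MIN_PX = 40  # default minimum object evaluation size mentioned above

def min_size_for(camera_id, global_min=DEFAULT_MIN_PX, overrides=None):
    """Return the minimum side length (pixels) to evaluate for a camera.

    A per-camera override wins over the global value. Clamping to FLOOR_PX
    is an assumption of this sketch, not documented behavior.
    """
    overrides = overrides or {}
    return max(FLOOR_PX, overrides.get(camera_id, global_min))

def should_evaluate(width_px, height_px, camera_id,
                    global_min=DEFAULT_MIN_PX, overrides=None):
    """True if the object is at least the minimum size in both dimensions."""
    minimum = min_size_for(camera_id, global_min, overrides)
    return width_px >= minimum and height_px >= minimum

# Hypothetical cameras: "gate" accepts smaller objects than the default.
overrides = {"gate": 20}
print(should_evaluate(30, 30, "lot", overrides=overrides))   # uses global 40
print(should_evaluate(30, 30, "gate", overrides=overrides))  # uses override 20
```

Lowering the minimum on a single camera trades accuracy for range on that camera only, while the other cameras keep the more accurate default.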
Objects must be Mostly in View
Detections are significantly less accurate with less than 80% of an object or person in the camera's view and almost impossible with less than 40% of the object or person in view. When a car or person is mostly hidden behind a wall or only partially visible in the camera view, there won’t always be enough of the object’s outline visible to be able to know what the object is.
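The 80% and 40% figures can be read as three rough tiers. The sketch below is only an illustration of that reading: the tier names and both helper functions are hypothetical, and the box passed to `fraction_in_frame` is assumed to be an estimate of the object's full extent, which may extend past the frame edges. It does not account for occlusion by walls or other objects inside the frame.

```python
def fraction_in_frame(box, frame_width, frame_height):
    """Fraction of a box (x1, y1, x2, y2) that lies inside the frame."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    if area <= 0:
        raise ValueError("box must have positive area")
    # Clip the box to the frame and measure what is left.
    inner_w = max(0, min(x2, frame_width) - max(x1, 0))
    inner_h = max(0, min(y2, frame_height) - max(y1, 0))
    return (inner_w * inner_h) / area

def visibility_tier(visible_fraction):
    """Map a visible fraction (0.0 to 1.0) to a rough expectation."""
    if not 0.0 <= visible_fraction <= 1.0:
        raise ValueError("visible_fraction must be between 0 and 1")
    if visible_fraction >= 0.8:
        return "reliable"   # 80% or more in view
    if visible_fraction >= 0.4:
        return "degraded"   # significantly less accurate
    return "unlikely"       # almost impossible to detect

# A 100x100 px object with its left half outside a 200x200 frame.
half_visible = fraction_in_frame((-50, 0, 50, 100), 200, 200)
print(half_visible, visibility_tier(half_visible))
```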
Limitation: Objects within Objects
When objects overlap, it’s difficult to discern where one object starts and the other ends. Machine learning works by learning the background and then determining what an object is by looking at the outline of its shape.
Although you can create exclusion zones, you still want to position the camera to only see the areas you wish to monitor, without trees or other obstructions in between. This will greatly reduce false detections and notifications.
For example, if you want to be notified when people and vehicles are in your parking lot, position the camera to see only the entrance / exit of the lot. Do not position the camera so that other cars are in the background of the area you want to monitor: the system will see the cars in the background and send alerts that may not be of high value to you. Place the camera that watches for cars at the entrance, looking away from, not at, the cars in the lot.
Limitation: 100 Objects Evaluated per Frame
Computer vision models for real-time video need to move fast. If you need to analyze 15 frames per second, then the analysis of each frame can’t take longer than 1/15 of a second. This is why the most popular computer vision frameworks limit the number of objects that can be evaluated, usually to around 100 objects per frame.
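The arithmetic behind this limit is easy to check. The sketch below computes the per-frame time budget and shows one plausible way to cap candidates at 100. The function names are hypothetical, and keeping the highest-scoring candidates is an assumption of the sketch, not a description of how any particular framework chooses which objects to keep. The per-object figure is only a rough upper bound, since real pipelines also spend time on decoding and batching.

```python
def frame_budget_ms(fps):
    """Time available to analyze one frame, in milliseconds."""
    return 1000.0 / fps

def per_object_budget_ms(fps, max_objects=100):
    """Upper bound on time per object if the whole budget is split evenly."""
    return frame_budget_ms(fps) / max_objects

def cap_candidates(candidates, max_objects=100):
    """Keep at most max_objects candidates, highest score first.

    Each candidate is a (label, score) tuple.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[:max_objects]

print(round(frame_budget_ms(15), 1))       # about 66.7 ms per frame
print(round(per_object_budget_ms(15), 2))  # about 0.67 ms per object

# 150 hypothetical candidates are trimmed to the 100 best-scoring ones.
many = [(f"obj{i}", i / 150) for i in range(150)]
print(len(cap_candidates(many)))
```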
Training Data Limitation
Machine learning models work based on the data you give them, and only that data. The data collected to create the car detector consists entirely of outdoor images and videos. Because of this, the detector can mistake objects it has never seen before, such as carts with wheels or chairs / tables with wheels, for cars. We do not recommend running the car detector indoors.
Chained Classification and Recognition Modules (In Development, Expected in Q3/Q4)
Vehicle detections are used to select candidates for further detections, such as license plate detection, license plate recognition, vehicle re-identification, and make/model detection.
Ignoring the requirements listed above will result in many of these chained detections having bad data or not being able to run at all. Some chained detections, like determining the make of the vehicle, have additional requirements (in this case, being able to see the logo at 20x20 pixels or larger).
This model was made by NVIDIA.