All Questions
Tagged with neural-network computer-vision
77 questions
0 votes, 1 answer, 19 views
What is the "fast version" of ZFNet referenced in SPPNet and Faster R-CNN papers?
I'm reading old papers:
SPPNet: Link
Faster R-CNN: Link
In both cases, the authors refer to a "fast version of Zeiler and Fergus (ZF) Net"; specifically:
In SPPNet:
ZF-5: this ...
0 votes, 0 answers, 22 views
Losing Information while resizing the image in Segmentation task using U-net
I'm using a U-Net architecture for an image segmentation task. During training, my images are of size 256*256. The model works very well on images of the same size 256*256 or near to size 256*...
0 votes, 1 answer, 26 views
How do I ensure final output shape matches input shape for a semantic segmentation task?
I am trying to replicate the semantic segmentation example
https://keras.io/examples/vision/oxford_pets_image_segmentation/
but train on my own data. I have 8 labels (7 features + background). My images ...
1 vote, 1 answer, 300 views
Is vision transformer (ViT) always better than CNN?
The paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" proposed the vision transformer, which outperformed CNN-based models in many cases.
When it comes to sequential data, we ...
0 votes, 0 answers, 23 views
Loss MAE when estimating the angle of rotation of an object in an image is stuck at about 90
I am dealing with the problem of estimating the angle of rotation of objects in images. The problem is that the network gets stuck when training at a loss level of about 90.
Below is the code for my ...
0 votes, 1 answer, 56 views
Classification: ClassA vs. "everything else"
I am trying to create a neural network for recognizing a particular object. Maybe I am approaching this task from the wrong side, but, in my mind, this task boils down to teaching the network to do a ...
1 vote, 0 answers, 24 views
Suggestions for labeling regression data to improve model accuracy
I'm working on a convolutional neural network that should predict up to 3 (x,y) coordinate pairs representing the waypoints of a concrete path, given an input image. This network will be used to help ...
0 votes, 1 answer, 74 views
How to handle the case of multiple ground truth boxes having high IOU with the same predicted box?
In the single shot detector (SSD), the matching strategy between ground-truth and predicted boxes starts with the following step:
For each ground truth box we are selecting from default boxes that vary over ...
0 votes, 1 answer, 1k views
What is `Multi-scale` in Multiscale Convolutional Network?
I was reading an article on deep learning and came across the term Multi-scale Neural Network. I fully understand the concepts of convolutional neural networks, but it is a bit difficult to ...
1 vote, 0 answers, 22 views
Attention mechanism: Why apply multiple different transformations to obtain query, key, value
I have two questions about the structure of attention modules:
Since I work with imagery I will be talking about using convolutions on feature maps in order to obtain attention maps.
If we have a set ...
3 votes, 2 answers, 1k views
Less parameters - in general within ResNets
My question is about the parameters of the ResNet.
Why does the network tend to have fewer parameters than VGG? At least, that is how I understood the paper and the summary from
Yannic Kilcher ...
0 votes, 1 answer, 190 views
Training the network with some batch size - code
My "training" code is below; I wrote it based on a YouTube tutorial. There is one part I don't understand: `batch_X = train_X[i:i+BATCH_SIZE]`, `batch_y = train_y[i:i+BATCH_SIZE]`. How ...
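For readers puzzling over the same slicing, here is a minimal sketch of what those two lines do. It is not the asker's code (which is cut off above): the toy arrays, the value of `BATCH_SIZE`, and the loop stepping `i` by `BATCH_SIZE` are all assumptions made for illustration.

```python
import numpy as np

BATCH_SIZE = 4  # hypothetical value, chosen only for this sketch

# Toy stand-ins for the training data: 10 samples with 2 features each.
train_X = np.arange(20).reshape(10, 2)
train_y = np.arange(10)

batches = []
# Assumed loop: i takes the values 0, 4, 8, so each slice starts
# where the previous one ended.
for i in range(0, len(train_X), BATCH_SIZE):
    batch_X = train_X[i:i+BATCH_SIZE]  # rows i .. i+BATCH_SIZE-1
    batch_y = train_y[i:i+BATCH_SIZE]  # labels for the same rows
    batches.append((batch_X, batch_y))

# Slicing past the end of an array is safe in Python/NumPy,
# so the final batch is simply shorter.
batch_sizes = [len(bx) for bx, _ in batches]
print(batch_sizes)
```

Because the same index range is used for both arrays, each `batch_X` stays aligned with its `batch_y`.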
1 vote, 1 answer, 346 views
Feature extraction from sequence of images with Siamese Neural Network
I am trying to train a neural network to recognize certain actions in short movies.
Each movie consists of a fixed number of frames, and every frame is of course an image of the same size, after ...
1 vote, 0 answers, 23 views
Machine learning model (neural network or SVM) for unequal feature matrices size
I have feature matrices obtained from a visual bag-of-words model for various dictionary sizes, for example Nx5, Nx10, …, Nx15000, where N is the number of samples and 5, 10, …, 15000 are the visual ...
1 vote, 1 answer, 337 views
Why are axes-aligned bounding boxes used in object detection
I understand (I think) why, in object detection, the result is a rectangle:
it is a simple shape that can be defined by 4 variables (2 pairs of coordinates of opposite corners, or 1 pair of coordinates + width and ...
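As a small illustration of the two 4-variable encodings mentioned in the excerpt, the sketch below converts between them. The function names and the convention that `(x, y)` is the top-left corner are assumptions for this example, not something stated in the question.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]

def corners_to_xywh(x1: float, y1: float, x2: float, y2: float) -> Box:
    """Opposite corners (x1, y1), (x2, y2) -> top-left corner plus width and height."""
    return (x1, y1, x2 - x1, y2 - y1)

def xywh_to_corners(x: float, y: float, w: float, h: float) -> Box:
    """Top-left corner plus width and height -> opposite corners."""
    return (x, y, x + w, y + h)

box_xywh = corners_to_xywh(10, 20, 50, 80)
box_corners = xywh_to_corners(*box_xywh)
```

Both forms carry the same information, so a detector can regress either one.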