All Questions

0 votes
1 answer
19 views

What is the "fast version" of ZFNet referenced in SPPNet and Faster R-CNN papers?

I'm reading old papers: SPPNet (Link) and Faster R-CNN (Link). In both cases, the authors refer to a "fast version of Zeiler and Fergus (ZF) Net"; specifically: In SPPNet: ZF-5: this ...
Papemax89
0 votes
0 answers
22 views

Losing Information while resizing the image in Segmentation task using U-net

I'm using the U-Net architecture for an image segmentation task. During training, my images are of size 256*256. It works very well on segmentation of images of the same size, 256*256, or near to size 256*...
Akshit Dhillon
0 votes
1 answer
26 views

How do I ensure final output shape matches input shape for a semantic segmentation task?

I am trying to replicate the semantic segmentation example https://keras.io/examples/vision/oxford_pets_image_segmentation/ but train on my own data. I have 8 labels (7 features + background). My images ...
utx7563yu
1 vote
1 answer
300 views

Is vision transformer (ViT) always better than CNN?

The paper - AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE proposed vision transformer and outperformed CNN-based models in many cases. When it comes to sequential data, we ...
Chuck Liu
0 votes
0 answers
23 views

Loss MAE when estimating the angle of rotation of an object in an image is stuck at about 90

I am dealing with the problem of estimating the angle of rotation of objects in images. The problem is that the network gets stuck when training at a loss level of about 90. Below is the code for my ...
DamianSz
0 votes
1 answer
56 views

Classification: ClassA vs. "everything else"

I am trying to create a neural network for recognizing a particular object. Maybe I am approaching this task from the wrong side, but, in my mind, this task boils down to teaching the network to do a ...
Dmytro Titov
1 vote
0 answers
24 views

Suggestions for labeling regression data to improve model accuracy

I'm working on a convolutional neural network that should predict up to 3 (x,y) coordinate pairs representing the waypoints of a concrete path, given an input image. This network will be used to help ...
pmitch
  • 11
0 votes
1 answer
74 views

How to handle the case of multiple ground truth boxes having high IOU with the same predicted box?

In the single shot detector, the matching strategy between ground truth and predicted boxes starts with the following step: for each ground truth box, we select from default boxes that vary over ...
Yandle
  • 231
0 votes
1 answer
1k views

What is `Multi-scale` in Multiscale Convolutional Network?

I was reading an article on Deep Learning and came across the term Multi-scale Neural Network. I fully understand the concepts of convolutional neural networks, but it is a bit difficult to ...
Aashish Chaubey
1 vote
0 answers
22 views

Attention mechanism: Why apply multiple different transformations to obtain query, key, value

I have two questions about the structure of attention modules: Since I work with imagery I will be talking about using convolutions on feature maps in order to obtain attention maps. If we have a set ...
Steve Ahlswede
3 votes
2 answers
1k views

Less parameters - in general within ResNets

My question is about the parameters of the ResNet. Why does the network tend to have fewer parameters than the VGG? This would be the case if I got the paper and the summary from Yannic Kilcher ...
bohniti
  • 31
0 votes
1 answer
190 views

Training the network with some batch size - code

My "training" code is below; I wrote it based on a YouTube tutorial. There is one part I don't actually understand: `batch_X = train_X[i:i+BATCH_SIZE]`, `batch_y = train_y[i:i+BATCH_SIZE]`. How ...
Adolf Miszka
1 vote
1 answer
346 views

Feature extraction from sequence of images with Siamese Neural Network

I am trying to train a neural network to recognize certain actions in short movies. Each such movie consists of a fixed number of frames, and each frame (an image) is of course the same size, after ...
JohnyBe
  • 113
1 vote
0 answers
23 views

Machine learning model (neural network or SVM) for unequal feature matrices size

I have feature matrices obtained from a visual bag-of-words model for various dictionary sizes, for example Nx5, Nx10, …, Nx15000, where N is the number of samples and 5, 10, …, 15000 are the visual ...
PManjunatha
1 vote
1 answer
337 views

Why are axes-aligned bounding boxes used in object detection

I understand (I think) why, in object detection, the result is a rectangle: it is a simple shape that can be defined by 4 variables (2 pairs of coords for opposite corners, or 1 pair of coords + width and ...
Jan Pisl
  • 195
