Questions tagged [neural-network]
Artificial neural networks (ANNs) are composed of 'neurons': programming constructs that mimic the properties of biological neurons. A set of weighted connections between the neurons allows information to propagate through the network to solve artificial intelligence problems without the network designer needing a model of the real system.
4,383
questions
0
votes
1
answer
28
views
hacky backprop outperforms clean backprop - Why?
I implemented a basic NN for MNIST in NumPy and started with a hacky implementation of backprop (just randomly multiplying gradients together), but somehow that one works better than my cleaned-up ...
-1
votes
1
answer
33
views
How can I select subsets of features using a neural network?
This listing selects the best features from the 1000 available columns in a given dataset.
The first three columns are dropped because they are useless data.
The dataset is huge. So, they were read in ...
0
votes
0
answers
45
views
How does the weight vector behave when we initialize the weights to 0 in the case of a perceptron?
While reading a book, I encountered this statement:
Now, the reason we don't initialize the weights to zero is that the learning rate (eta) only has an effect on the classification outcome if the ...
1
vote
1
answer
24
views
Everything is classified as background by segmentation model
I am training a U-Net model for medical image segmentation. The problem is that the binary masks that I'm using to train the model mostly consist of background pixels and a very small region of the whole ...
0
votes
0
answers
18
views
Is it common for an LM (hundreds of millions of parameters) to beat an LLM (billions of parameters) on a binary classification task?
Preface
I am trying to fine-tune transformer-based models (an LM and an LLM). The LM that I used is DeBERTa, and the LLM is LLaMA 3. The task is to classify whether a text contains condescending language ...
0
votes
0
answers
14
views
How to increase the optimal cutoff point (Youden index) after training a model?
So I trained a model on a medical dataset and got an AUROC of about 0.96 for detecting cancer in brain images. I noticed that the Youden index is 0.1, but I want to increase it to 0.5, ...
-1
votes
1
answer
8
views
WGAN generating images from the training data
Is it possible for a GAN to somehow remember the training data distribution?
Or maybe something leaks out when I calculate gradients?
...
0
votes
0
answers
25
views
Is it legit to normalize time series with respect to the x-axis?
I have a data set consisting of multivariate time series, e.g. a batch of my data has the shape (batch_size, timesteps, number_input_features) and I want to train a neural network on it to predict ...
1
vote
1
answer
38
views
How does seeing training batches only once influence the generalization of a neural network?
I am referring to this question/scenario, "Train neural network with unlimited training data", but unfortunately I cannot comment.
As I am not seeing any training batch multiple times I would guess that ...
0
votes
0
answers
9
views
How to handle sequences with CrossEntropyLoss
First of all, I am new to the whole thing, so sorry if this is super dumb.
I'm currently training a Transformer model for a sequence classification task using CrossEntropyLoss. My input tensor has the ...
0
votes
0
answers
22
views
What is the most accurate way of computing the evaluation time of a neural network model?
I am training some neural networks in PyTorch to use as an embedded surrogate model. Since I am testing various architectures, I want to compare the accuracy of each one, but I am also interested in ...
0
votes
0
answers
9
views
MobileNet vs ResNet
Q1: Why don't we remove the ReLU after the addition of the skip connection in ResNet50, as is done in MobileNetV2, for better performance?
Q2: And why don't we have a convolution layer in the skip connection for dimension ...
2
votes
1
answer
22
views
Benchmark Neural Networks on High-Dimensional Functions
For a personal project, I am interested in benchmarking certain neural network architectures in the context of high-dimensional function approximation. Specifically, I am interested in continuous, ...
0
votes
1
answer
19
views
What is the "fast version" of ZFNet referenced in SPPNet and Faster R-CNN papers?
I'm reading old papers:
SPPNet: Link
Faster R-CNN: Link
In both cases, the authors refer to a "fast version of Zeiler and Fergus (ZF) Net"; specifically:
In SPPNet:
ZF-5: this ...
1
vote
0
answers
46
views
Why can't I replicate the results from this paper?
I'm trying to train a model to evaluate chess positions, following the methodology from this paper (note that the author presents several different architectures, but I'm only looking at the ANN with ...
1
vote
1
answer
55
views
Weird neural network approach
I'm working on a problem where I need to create a neural network to optimize the seating arrangement for 24 unique individuals in a 6x4 grid, minimizing conflicts between adjacent (up,down,left,right) ...
2
votes
0
answers
13
views
What's the best way to incorporate momentum and regularization when training a neural network?
I want to implement the momentum algorithm to train a neural network, but I'm uncertain about where the regularization term should be incorporated. For ridge regularization, one option is to have:
$$
...
1
vote
0
answers
9
views
Residual Network Skip Connection Clarification
In ResNets, do skip connections get utilised at every step? If not, what causes a layer to be skipped vs not skipped?
Thank you,
1
vote
1
answer
34
views
Predicted output is only 0s
I am developing a neural network using the Home Credit Default Risk dataset.
The prediction should be between 0.0 and 1.0 but my algorithm's outcome is just 0.0 for every row.
My Code
...
0
votes
0
answers
14
views
Semantics Building in LSTM-Based Models - How is an LSTM able to extract and represent long data using just one value (long-memory)?
How is an LSTM able to extract and represent long sequences of data while using just one value (long-memory / LM) to maintain all this information?
If multiple values were used, it could be ...
0
votes
0
answers
14
views
Impact of Adding Imbalanced Data on Model Performance for Different Groups
Suppose I initially have a dataset with 50 samples of type A and 50 samples of type B, each with several features. I built a neural network model using this data and recorded the prediction accuracy ...
3
votes
1
answer
233
views
What ML model for regression given tabular AND image data?
I'd like to predict the power production of a windfarm given the wind speed, its direction and other variables related to the specific wind turbines. However, due to wake effects (wind speed decreases ...
1
vote
0
answers
38
views
Class imbalance for binary classification tasks
I am looking to train a binary classifier. Most of my experience so far has been with generative models, not classifiers, so I am wondering with respect to training data, what is a good ratio of 0 and ...
0
votes
1
answer
27
views
How to update first layer weights?
I’m trying to make a neural network, without using any deep learning library, that recognizes numbers in the MNIST database. Its structure is: 784 input neurons (for the 784 pixels in the number images),...
3
votes
1
answer
45
views
Is it legal to use a model found on GitHub for a personal project and upload the personal project to GitHub? [closed]
I found a great model I would like to use and make improvements upon for a personal project. It doesn't contain any licenses, nor does it mention anything about restrictions on use.
Are AI models like ...
3
votes
1
answer
29
views
Outputting handwritten digits with a Neural Network
I know that you can use a neural network to recognize handwritten digits. How would you then use that same neural network to output handwritten digits in the unique style of that network? In other ...
0
votes
0
answers
23
views
Theoretical Limitations of Achieving 100% Accuracy in Modeling Non-linear Relationships with Neural Networks
I am working on a project where I need to model a specific non-linear relationship using a neural network. The relationship is given by $y = 3x_1^2 x_2^3$. The approach involves:
Preprocessing the ...
6
votes
1
answer
180
views
Changing output size from a model
So I am currently training some deep learning models for some basic classification problems, and I am trying to figure out if it is possible to change the output size of the model in case I want to ...
0
votes
1
answer
30
views
How to explain missing dates to a model?
I have this dataset that I'm trying to train a neural network on.
The problem is that since weekend dates are not available, I am not confident in whether the model is able to account for that. ...
1
vote
1
answer
62
views
Improving GPU Utilization in LLM Inference System
I'm trying to build a distributed LLM inference platform with Huggingface support. The implementation involves utilizing Python for model processing and Java for interfacing with external systems. ...
0
votes
0
answers
61
views
diffusion model: can't overfit on single batch
I am training the diffusion model from diffusion policy, specifically their vision notebook, on a custom dataset. As always, I try to make a sanity check of the pipeline, by overfitting on a single ...
0
votes
1
answer
25
views
Accuracy and test_accuracy both give a result of 1
I've developed a code for classifying hyperspectral images using three different convolutional neural network (CNN) architectures: 1D, 2D, and 3D. The code has two main parts:
Preprocessing and data ...
0
votes
1
answer
57
views
Is it possible to train a neural network to feed into a Random Forest Classifier or any other type of classifier like XGBoost or Decision Tree?
I want to create a model architecture to predict future stock price movement as follows:
The goal of this model is to predict whether the price will go UP or DOWN within the next 3 months.
I have tried a few ...
0
votes
3
answers
63
views
How do I force my NN to do nothing but memorize?
Consider a neural network with N layers of size $M_n$. I want this NN to do nothing but memorize. I want it to fail if it is asked to make a classification for an input it has never seen before; I want it ...
0
votes
1
answer
41
views
How good are LSTMs in generalizing when learning curves?
I'm interested in the following scenario: I want to learn a mapping that maps a function to another function, i.e. I want to approximate a functional operator. If one is unfamiliar with operators, one ...
0
votes
1
answer
28
views
CS undergrad query about DS
Why is learning DS so ambiguous? You don't truly know what you should learn to actually do DS. Web dev, say, has a clear path: learn HTML, CSS, and JS, and you can make something. I am a CS undergrad and just want ...
1
vote
1
answer
52
views
Is there a model that can predict continuous data while also providing a level of confidence in the prediction?
The problem with Bayesian neural networks seems to be that they primarily work for classification problems. Is it possible to adjust this kind of neural network, or even use a different model if one exists,...
0
votes
0
answers
21
views
TensorFlow optimization help - ANN unable to optimise seemingly simple time series prediction problem
A basic TensorFlow NN model is unable to optimise a simple synthetic time series prediction problem. I have tried various configurations and optimizers, but the model cannot beat a naive "flat...
0
votes
0
answers
9
views
PyTorch is_leaf problem
I have a problem with is_leaf of the rotation_matrix I defined below in picture 1. Picture 2 shows how I get rotation[i] by using getattr to get it from model_params. Picture 3 shows how I use ...
0
votes
0
answers
27
views
Converting multiple binomial logits to multinomial
I am faced with an image classification problem with 3 classes. My existing network consists of 3 'branches', each corresponding to one of the classes. Each of these branches outputs a binomial logit ...
0
votes
0
answers
18
views
Recommendation: matrix factorization vs neural network training
In the case of collaborative filtering, say we have a matrix of item-item (could also be user-item) interactions.
In the "matrix factorization" approach, we use algorithms such as SVD or ...
2
votes
1
answer
38
views
Practical Experiments on Self-Attention Mechanisms: QQ^T vs. QK^T
I'm currently exploring the self-attention mechanism used in models like Transformers, and I have a question about the necessity of using a separate key matrix (K) instead of just using the query ...
0
votes
0
answers
11
views
Deep neural network is plateauing on a regression task
I'm training a deep neural network on temporal graph data. Currently, I'm trying to get a feel for how large / complex of a model I should aim for, so I'm trying to overfit to my smallest dataset. ...
0
votes
0
answers
9
views
Positional Encoding for FFNN?
Here is my problem: I have input [x1,..,xt,n1,..,nt,1,2,...,t] where there is a missing timestep xi, and I use neighboring time series (found with KNN) n1,...,nt to add more features, as well as time ...
0
votes
1
answer
42
views
How do I give weight to recent time points when predicting another nearby time point?
I am building a normal feed-forward neural network to predict the value of a masked time point using regression, e.g. I have values for x at times 1, 2, and 4, and I want to predict its value at time ...
0
votes
0
answers
9
views
Overfitting - Imbalanced Classification using a Deep Feed-Forward Network
I have an unbalanced dataset, so I used SMOTEENN on the training set to resample. After training the DFF, I could see the model is overfitting. Could someone help me solve this?
Thank You.
...
1
vote
1
answer
34
views
Unordered Set Classification Problem
In my setup I have one feature which is a sparse list representing categories. For example, let's say that we have M categories in the interval ...
2
votes
1
answer
169
views
AutoDiff on different operations?
How is it possible to use automatic differentiation (computational graph) on operations like convolution?
I know that 2D convolution can be represented by matrix multiplication. But what about 3D ...
0
votes
0
answers
9
views
Patterns in weights of trained model?
Apologies for a naive question. Let's say I am training a simple feed-forward neural network using stochastic gradient descent with a fixed architecture, learning rate, number of training epochs, and ...
3
votes
1
answer
808
views
How does a Neural Net handle an unseen class for a Categorical Feature?
Let's say I train a Neural Net, and I have a Categorical Feature X.
During training, there are only 3 classes seen in feature X; A, B, C.
Now, let's say I want to make predictions from this trained ...