Wednesday, April 15, 2020
Why Test Data Should Not Be Normalized
In Part I of this article series we discussed the various types of standards that are added to a sample and used to take matrix eigenvalues into account. In this part we will look at the various types of matrix eigenvalues and at how they are used in a typical cross-validation. Before I begin, let me make clear what matrix eigenvalues are and how they can be used.

You might have heard the term eigenvalues before and be wondering what they are. An eigenvalue of a square matrix A is a number λ for which there is a nonzero vector v with A v = λ v. The vector v is called an eigenvector. Eigenvalues are therefore not square roots of a matrix, and eigenvectors are not square roots of anything: they are the directions that A only stretches or shrinks, and the eigenvalues are the stretch factors.

Most standard formulas for assessing the performance of a model just take the squared values of the eigenvalues of the training data. However, such a formula is not valid for matrix eigenvalues, even if you assume that the entire training set is normally distributed. This is because the eigenvalues themselves are not normally distributed, so you would need to normalize them first.

So the standard formula does not apply to matrix eigenvalues, but you might be wondering what happens if you compute the metrics for a single-instance test. There are actually two ways to do this.

The first way is the standard approach, where you run the standard test on a single instance. It is very important to understand that a single instance is not a normally distributed sample: one point has no spread from which to estimate a variance. If you calculate the standardized metrics for each data point separately, you will get very different results from those computed on the whole set. There is a simple explanation for this.

First, consider a data set with a large number of data points, say n points. You want to compute the metrics for all n points at once. In a single-instance test, however, there is only one data point available at a time.
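The difference between standardizing a whole set and standardizing one instance at a time can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `standardize` and the sample values are hypothetical, and "standardized metric" is taken to mean a plain z-score using the population standard deviation.

```python
import statistics


def standardize(values):
    """Z-score each value using the mean and population std of `values`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A single point (or a constant batch) has no spread, so the
        # z-score is undefined; return None to make that explicit.
        return [None for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical test measurements, chosen only for illustration
# (their mean is 5 and their population standard deviation is 2).
test_points = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# All n points at once: every point is scaled by the same mean and std.
batch_scores = standardize(test_points)

# One point at a time: each "data set" has a single element and no variance.
single_scores = [standardize([v])[0] for v in test_points]

print(batch_scores)
print(single_scores)
```

The batch call gives every point a meaningful score, while the one-at-a-time call cannot produce a score at all, which is the simplest version of the mismatch described above.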
If you compute the standardized metrics for all the data points at once, the result will be completely different from computing the metrics for each data point individually.

The second approach is to consider the case where you want to compute the metrics for the smallest number of data points. If you compute the metrics for all the data points individually, the only thing left to consider is the variance of the metrics. Once you have computed the variance of the metrics you will notice that it is the square root of all the factors multiplied together.

When a data set has a large number of data points, the data that has been collected but not yet analyzed has eigenvalues that are too small to be seen by the human eye. Therefore, a metric that relies on such small eigenvalues is inappropriate. The best way to compute the metrics for data points with small eigenvalues is to create a uniform subspace.
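To make "small eigenvalues" a bit more concrete, here is a rough sketch that computes the two eigenvalues of a 2x2 covariance matrix in closed form. Everything in it is an assumption on my part: the helper names `covariance_2d` and `eigenvalues_sym_2x2` are hypothetical, the data are made up, and the post does not say which matrix its eigenvalues come from, so I use the covariance matrix of the data as one plausible reading.

```python
import math


def covariance_2d(xs, ys):
    """Population covariance matrix [[sxx, sxy], [sxy, syy]] of two lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return [[sxx, sxy], [sxy, syy]]


def eigenvalues_sym_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix, largest first.

    For [[a, b], [b, d]] they are (a + d)/2 +/- sqrt(((a - d)/2)^2 + b^2).
    """
    a, b, d = m[0][0], m[0][1], m[1][1]
    half_trace = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return half_trace + radius, half_trace - radius


# Hypothetical, nearly collinear data: y is roughly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.0, 8.1, 9.9]

cov = covariance_2d(xs, ys)
large, small = eigenvalues_sym_2x2(cov)

print("covariance matrix:", cov)
print("eigenvalues:", large, small)
print("ratio small/large:", small / large)
```

With nearly collinear data like this, one eigenvalue carries almost all of the variance and the other is tiny by comparison, so any metric that divides by or otherwise leans on the small one will be dominated by noise.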