Feature selection
Procedure in machine learning and statistics / From Wikipedia, the free encyclopedia
Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Stylometry and DNA microarray analysis are two cases where feature selection is used. It should be distinguished from feature extraction.[1]
Feature selection techniques are used for several reasons:
- simplification of models to make them easier to interpret by researchers/users,[2]
- shorter training times,[3]
- avoidance of the curse of dimensionality,[4]
- improved compatibility of the data with a learning model class,[5]
- encoding of inherent symmetries present in the input space.[6][7][8][9]
The central premise when using a feature selection technique is that the data contains some features that are either redundant or irrelevant, and can thus be removed without incurring much loss of information.[10] Redundant and irrelevant are two distinct notions, since one relevant feature may be redundant in the presence of another relevant feature with which it is strongly correlated.[11]
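The redundancy notion above can be illustrated with a minimal greedy filter that drops any feature strongly correlated with one already kept. This is an illustrative sketch of correlation-based redundancy removal, not any specific published algorithm; the function name, threshold, and data are assumptions for the example.

```python
import numpy as np

def drop_redundant(X, threshold=0.95):
    """Greedy filter: keep a feature only if its absolute Pearson
    correlation with every already-kept feature is below `threshold`.
    A toy illustration of redundancy-based selection."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 1e-3 * rng.normal(size=200)   # nearly a copy of `a`: redundant
c = rng.normal(size=200)              # uncorrelated with `a` and `b`
X = np.column_stack([a, b, c])
print(drop_redundant(X))  # → [0, 2]: `b` is dropped as redundant with `a`
```

Note that `b` here is individually just as informative as `a`; it is removed only because `a` is already present, which is exactly the sense in which a relevant feature can still be redundant.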
Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features. Feature selection techniques are often used in domains where there are many features and comparatively few samples (or data points).
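The contrast between the two approaches can be sketched as follows: selection keeps a subset of the original columns, while extraction derives new columns as functions of all of them. The variance criterion and the SVD-based projection below are arbitrary choices for illustration, not a recommended pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
# Four features with very different scales, so variances differ sharply.
X = rng.normal(size=(100, 4)) * np.array([0.1, 5.0, 1.0, 0.01])

# Feature selection: return a subset of the ORIGINAL columns,
# here the two with the highest variance (a simple filter criterion).
variances = X.var(axis=0)
selected = np.argsort(variances)[-2:]   # indices into the original features
X_selected = X[:, selected]

# Feature extraction: build NEW features as functions of all columns,
# here the first two principal components computed via SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_extracted = Xc @ Vt[:2].T             # columns are derived, not original

print(sorted(selected.tolist()))  # → [1, 2]: original column indices survive
print(X_extracted.shape)          # → (100, 2): same width, but new columns
```

The selected matrix can be traced back to named input variables, which is why the article lists interpretability among the motivations for selection; the extracted components generally cannot.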