Saturday, December 31, 2022

Dimension Reduction in Machine Learning. Why PCA?

Background on Dimension Reduction

Dimensionality Reduction in Machine Learning is the process of reducing the number of dimensions in the data by excluding less useful features (Feature Selection) or transforming the data into lower dimensions (Feature Extraction). Putting it simply, Dimension Reduction refers to the process of reducing the number of attributes in a dataset while keeping as much of the variation in the original dataset as possible.

When we reduce the dimensionality of a dataset, we lose some of the variability in the original data (often around 1%-15%, depending on the number of components or features we keep). In return, it offers the following advantages.

  • It can help reduce overfitting. Overfitting is a phenomenon in which the model fits the training dataset too closely and fails to generalize to unseen real-world data.
  • Fewer dimensions in the data mean less training time and fewer computational resources, which can improve the overall performance of machine learning algorithms.
  • Dimensionality reduction is extremely useful for data visualization. Data in 2 or 3 dimensions is easier to visualize.
  • Dimensionality reduction can remove some of the noise in the data.

 



As mentioned above, Dimension Reduction methods can be classified into two categories.

1.      Feature Selection

a.      Variance Seeking

The variance method of dimension reduction is a technique that aims to reduce the number of dimensions in a dataset by selecting a subset of the most important features that capture the most variance in the data. The goal is to reduce the dimensionality of the data while retaining as much information as possible.

 

There are a number of ways to select the most important features using the variance method. One common approach is to calculate the variance of each feature and select the features with the highest variance. Because variance depends on the units of each feature, the features should be on comparable scales first. Another approach is to rank features with a univariate score, such as mutual information or the ANOVA F-test, which measures how strongly each feature relates to the target rather than how much it varies.
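The highest-variance approach can be sketched in a few lines. This is a minimal illustration using NumPy, not a definitive implementation: the helper name `select_by_variance` and the toy data are assumptions made for the example, and the features are assumed to be on comparable scales already.

```python
import numpy as np

def select_by_variance(X, k):
    """Return the column indices of the k highest-variance features, in original order."""
    variances = X.var(axis=0)
    top = np.argsort(variances)[::-1][:k]  # indices sorted by descending variance
    return np.sort(top)

rng = np.random.default_rng(0)
# Toy data (an assumption for this sketch): three features with
# standard deviations 5.0, 0.1 and 2.0.
X = np.column_stack([
    rng.normal(0.0, 5.0, 200),
    rng.normal(0.0, 0.1, 200),
    rng.normal(0.0, 2.0, 200),
])

kept = select_by_variance(X, k=2)
X_reduced = X[:, kept]  # keeps the first and third columns, drops the low-variance one
print(kept, X_reduced.shape)
```

If the features are measured in different units, the ranking mostly reflects those units, so some rescaling is usually needed before comparing variances.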

 

The variance method of dimension reduction is often used in combination with other dimension reduction techniques, such as Principal Component Analysis (PCA) or Independent Component Analysis (ICA), to further reduce the dimensionality of the data.

b.      Backward Elimination

This method eliminates (removes) features from a dataset step by step, in a process closely related to recursive feature elimination (RFE). The algorithm first trains the model on the initial set of features in the dataset and calculates the performance of the model (usually the accuracy score for a classification model and RMSE for a regression model). Then the algorithm drops one feature (variable) at a time, trains the model on the remaining features, and calculates the performance scores. It keeps eliminating features as long as the performance score changes little (or not at all), and stops once removing another feature would noticeably hurt the model.
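The loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the model is ordinary least squares fitted with NumPy, performance is RMSE on a held-out split, and the names `holdout_rmse`, `backward_elimination` and the tolerance `tol` are hypothetical choices for this example.

```python
import numpy as np

def holdout_rmse(X_tr, y_tr, X_te, y_te, cols):
    """Fit least squares (with intercept) on the given columns and return held-out RMSE."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr[:, cols]])
    A_te = np.column_stack([np.ones(len(X_te)), X_te[:, cols]])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.sqrt(np.mean((A_te @ coef - y_te) ** 2)))

def backward_elimination(X_tr, y_tr, X_te, y_te, tol=0.01):
    """Drop one feature at a time while RMSE gets worse by no more than tol."""
    cols = list(range(X_tr.shape[1]))
    best = holdout_rmse(X_tr, y_tr, X_te, y_te, cols)
    while len(cols) > 1:
        # Score every feature set obtained by removing a single feature.
        scores = [
            (holdout_rmse(X_tr, y_tr, X_te, y_te, [c for c in cols if c != d]), d)
            for d in cols
        ]
        score, drop = min(scores)
        if score > best + tol:
            break  # removing any remaining feature hurts noticeably
        cols.remove(drop)
        best = min(best, score)
    return cols

rng = np.random.default_rng(0)
# Toy data (an assumption): only features 0 and 2 influence the target.
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=300)
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

selected = backward_elimination(X_tr, y_tr, X_te, y_te)
print(selected)
```

On this toy data the two pure-noise features are dropped first, and the loop stops when only the two informative features remain.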

c.      Forward Selection

This method can be considered the opposite of backward elimination. Instead of eliminating features one by one, the algorithm starts by training the model on a single feature in the dataset and calculates the performance of the model (usually the accuracy score for a classification model and RMSE for a regression model). Then the algorithm adds (selects) one feature (variable) at a time, trains the model on the selected features, and calculates the performance scores. It keeps adding features until the performance score changes little (or not at all), and stops there.
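A forward-selection loop might look like the sketch below. As with any such illustration, the details are assumptions: the model is ordinary least squares fitted with NumPy, the score is RMSE on a held-out split, and `holdout_rmse`, `forward_selection` and `tol` are hypothetical names chosen for this example.

```python
import numpy as np

def holdout_rmse(X_tr, y_tr, X_te, y_te, cols):
    """Fit least squares (with intercept) on the given columns and return held-out RMSE."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr[:, cols]])
    A_te = np.column_stack([np.ones(len(X_te)), X_te[:, cols]])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.sqrt(np.mean((A_te @ coef - y_te) ** 2)))

def forward_selection(X_tr, y_tr, X_te, y_te, tol=0.01):
    """Add the single most helpful feature until RMSE improves by less than tol."""
    remaining = list(range(X_tr.shape[1]))
    chosen = []
    best = np.inf
    while remaining:
        # Score every feature set obtained by adding one more feature.
        scores = [
            (holdout_rmse(X_tr, y_tr, X_te, y_te, chosen + [c]), c)
            for c in remaining
        ]
        score, add = min(scores)
        if best - score < tol:
            break  # the best candidate no longer improves the score enough
        chosen.append(add)
        remaining.remove(add)
        best = score
    return chosen

rng = np.random.default_rng(0)
# Toy data (an assumption): only features 0 and 2 influence the target.
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=300)
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

chosen = forward_selection(X_tr, y_tr, X_te, y_te)
print(chosen)
```

The features come back in the order they were added, so the strongest single predictor appears first.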

d.      Feature Importances from Decision Trees or Random Forests

Random forest is a tree-based model which is widely used for regression and classification tasks on non-linear data. It can also be used for feature selection: in scikit-learn, for example, its built-in feature_importances_ attribute gives each feature an importance score based on how much the splits on that feature reduce impurity (such as the 'gini' criterion, a measure of the quality of a split of internal nodes) while training the model.

 

2.      Feature Extraction


Feature extraction techniques aim to transform the data from a high-dimensional space into a lower-dimensional space while preserving as much information as possible. These techniques can be either linear or non-linear, and they often involve creating new features from the original features using a mathematical transformation. It is best to visualize the data before this step to observe its shape (linear or non-linear). It is important to realize, though, that we can only visualize data in 2 or 3 dimensions. To get there, we can first reduce the number of dimensions and visualize the most important ones, using any of the feature selection methods above.

Linear Algorithms

a.       Principal component analysis (PCA): This method projects the data onto a lower-dimensional space by identifying the directions of maximum variance in the data.
 
b.      Linear discriminant analysis (LDA): This method projects the data onto a lower-dimensional space while maximizing the separation between different classes in the data.
 
c.      Singular value decomposition (SVD): This method decomposes the data matrix into the product of three matrices, which can be used to identify the principal components of the data.

d.      Independent component analysis (ICA): This method seeks to separate the data into statistically independent latent components.
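The link between SVD and PCA can be checked numerically. The sketch below, which uses NumPy and made-up toy data (both assumptions of this example), decomposes a centered data matrix into three factors and shows that the rows of the last factor act as principal directions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (an assumption): 100 samples of 4 correlated features.
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)  # PCA works on centered data

# Xc = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
reconstructed = U @ np.diag(S) @ Vt

# The rows of Vt are the principal directions, and S**2 / (n - 1)
# is the variance of the data along each of them.
component_variances = S**2 / (len(Xc) - 1)

# Project onto the first two principal directions.
X_2d = Xc @ Vt[:2].T
print(X_2d.shape)
```

The values in `component_variances` match the eigenvalues of the covariance matrix of the data, which is why SVD is a common way to compute PCA in practice.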

Non-Linear Algorithms

e.      Autoencoders: These are neural network architectures that are trained to reconstruct the input data from a lower-dimensional representation, effectively learning a compressed representation of the data.
 
f.       Kernel PCA: This method extends PCA to non-linear data by using a kernel function to implicitly map the data into a higher-dimensional space before performing PCA.
 
g.      t-distributed stochastic neighbor embedding (t-SNE): This method projects the data onto a lower-dimensional space while preserving the local structure of the data.

 


These are just a few of the many methods available for dimension reduction in machine learning. The appropriate method to use will depend on the specific characteristics of the data and the goals of the analysis.


 

Why PCA?

With everything said and done, the simple choice for people aiming at Feature Extraction is Principal Component Analysis (PCA). Why? Because it is simple and runs without much hyperparameter tuning. One reason for its popularity is that it is relatively simple to implement and understand, as it involves finding the eigenvectors and eigenvalues of the covariance matrix of the data. Another reason is that it has been well studied and has a solid theoretical foundation.
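To show how little is involved, here is a minimal sketch of PCA through the eigenvectors and eigenvalues of the covariance matrix. It uses NumPy; the function name `pca` and the toy data are assumptions made for this example, and production libraries typically use an SVD-based routine instead.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components.

    Returns the projected data and the explained-variance ratio of every component.
    """
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    return Xc @ eigvecs[:, :n_components], ratio

rng = np.random.default_rng(0)
# Toy data (an assumption): the second feature is almost a multiple of the first,
# and the third is low-variance noise.
t = rng.normal(size=300)
X = np.column_stack([
    t,
    2 * t + rng.normal(scale=0.1, size=300),
    rng.normal(scale=0.1, size=300),
])

Z, ratio = pca(X, n_components=1)
print(Z.shape, ratio.round(3))
```

The explained-variance ratios sum to 1, so one minus the sum of the ratios of the kept components is the share of variability given up by the reduction.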

PCA is also computationally efficient and can handle large datasets, making it suitable for use in many practical applications. It works best on continuous numeric data; binary or categorical features have to be encoded numerically first, and the resulting components tend to be harder to interpret for them.

Another reason that PCA is popular is that its results are easy to interpret, as the principal components are ranked by their explained variance. This lets users see how much of the variance each component captures and, through the component loadings, which original features contribute most to it.

Overall, the simplicity, efficiency, and versatility of PCA make it a popular choice for dimension-reduction tasks. However, it's important to keep in mind that other dimension-reduction techniques may be more suitable for certain tasks, depending on the characteristics of the data and the requirements of the application. For example, Linear Discriminant Analysis (LDA) is a very solid technique that often performs better than PCA on classification tasks. Both PCA and LDA reduce the number of dimensions in a dataset while retaining as much of the information as possible. Unlike PCA, though, the main goal of LDA is to maximize the separation between classes in the data while minimizing the variance within each class. The extra input of the target variable gives LDA more information to work with, which shows in its improved performance, particularly on classification datasets.

Below are some areas to consider when choosing dimension-reduction techniques.

 

1.    Ensemble dimension reduction: Applying feature extraction on top of feature selection could further improve the performance of machine learning algorithms.

2.    The type of data: Different dimension reduction techniques are better suited to different types of data. For example, Principal Component Analysis (PCA) is a good choice for continuous data without labels, while Linear Discriminant Analysis (LDA) is better suited to data with a categorical target (class labels).

3.  The goal of the analysis: Different dimension reduction techniques have different goals. Some techniques, such as PCA, aim to maximize the variance in the data, while others, such as LDA, aim to maximize the separation between different classes. It's important to choose a technique that aligns with the goals of the analysis.

4.  The number of dimensions: Some dimension reduction techniques are better suited to high-dimensional data, while others are more effective for low-dimensional data. For example, t-SNE is a good choice for visualizing high-dimensional data, while PCA is more suitable for reducing the dimensionality of large datasets.

5.   The complexity of the data: Some dimension reduction techniques, such as PCA, are relatively simple to implement and understand, while others, such as Independent Component Analysis (ICA), are more complex and may require more expertise to use effectively. It's important to choose a technique that is appropriate for the level of complexity of the data.

 

Overall, it's important to carefully consider the characteristics of the data and the goals of the analysis when choosing a dimension reduction technique. It may be necessary to try multiple techniques and compare the results to determine the most suitable technique for the task at hand. 

Saturday, October 24, 2015

Why every other Super Power is offering Pakistan Fighter Jets and what’s in it for Pakistan?

A couple of years back, Pakistan was thirstily looking to acquire fighter jets, and no country besides China was offering any. Now the US, Russia and China have all offered fighter jets to Pakistan. Good times to be in the top command of the Pakistan Armed Forces.

How the deal will go down is a question for the future, but things are starting to get interesting. I don't have all the details, but these acquisitions are not possible by paying in cash; it will certainly be a deal on credit. The real question is what has prompted these countries to offer one of their leading jets to Pakistan, a country which is at odds with a so-called Super Power.

Let's first take a look at what China is offering: the FC-20, more popularly known as the J10B. Pakistani engineers and pilots have been hooked on the jet for a long time, since the time the J10 was on trial. Pakistanis were so interested that they requested several modifications and upgrades to the jet, which later led to the creation of the J10B. Now that the plane is ready, Pakistan is having trouble financing the acquisition, even with the chance of credit terms (they still have to pay back the loan). Then there is also the question of maintaining the plane. A new kind of jet means huge maintenance costs, as they have to develop a whole infrastructure for it. The news is that one of the trial planes is flying with a Chinese engine. Regular planes have a Russian engine, which is a cause for concern in case the relationship with Russia deteriorates.

Second in the line of offerings is the Russian SU35 fighter jet. The fighter has developed its identity as one of the most agile modern fighter jets, ideal for dogfights, which Pakistan is famous for. There is a school of thought which believes that the age of dogfights is over, but not the Pakistanis. Because sanctions denied Pakistan the latest fighter jets capable of firing beyond-visual-range (BVR) missiles for most of the previous decade, Pakistan has acquired the art of getting closer to the enemy and then popping up for a dogfight. The SU35 is not just about dogfights; it is one of the most potent fighters in the industry and the most advanced product on offer to Pakistan. The reasons Russia has offered it to Pakistan can be associated with:
  1. India's inclination towards the United States has prompted Russia to take sides.
  2. They are just finding new customers for their product.
  3. Pakistan's inclusion in the SCO. An SCO member should be elevated to a superior standard.

And then there is the ever so popular F-16, which Pakistanis are ever so hungry for; they would even jump into a dumpster on the chance of finding some part of the jet. Pakistanis prefer the jet mainly for two reasons. First, they already have a well-established infrastructure for the plane and could maintain the jets even through the sanctions. Second, their pilots are very much accustomed to the plane, which reduces training time, and they know the plane's capabilities to the furthest extent. Whichever fighter they choose, Pakistan would definitely buy some of these fighters. The reasons the United States has offered the plane can be associated with:
  1. Countering the Russian offering, prompting Pakistan to side with the US.
  2. Depleting the Pakistan Air Force budget so they can't focus on developing the JF-17, a strategy the West has used repeatedly all over the world.
  3. They have to utilize the grant given to Pakistan inside the US only. I hate it when they do that.

With the above offerings, Pakistan is really in a fix. Firstly, they have to side with someone. They also want to keep China happy, as China is the only one that would offer them 5th Generation fighters when they are ready. Whatever the outcome, Long Live Pakistan.



The writer is not an expert in the field. Any observations above are personal and should not be taken into consideration for any decision. The information summarized above was gathered from news and open forums.

Sunday, March 29, 2015

Committee or a Team


Having experience of working with a multinational and a government organization, I am well placed to discuss the difference between a Committee and a Team. The mentality of a committee is to spread responsibility while making a decision, so the whole idea is that the members are there to save themselves; the decision may even carry a qualification that some members are not with it. A team may be violent inside, but to the outside the decisions they take are unified; they own the decision whether they like it or not.

In a committee you may work against the decision, as you want to make sure that your opposition to it gets glorified. In a team, no matter how much you are against the decision, you would work your ass off to make it a successful one, because if the decision turns bad, the whole team gets burned.

In a committee you are just there to make decisions, while in a team you are made responsible to act together in achieving the objective.
 
There could be hundreds of other points, but the above are the crux of it all. A committee is just there to divide the concentration of fire, so that a single participant doesn't get third-degree burns.

Monday, March 23, 2015

A successful but hateful Strategy to re-energize Large Pakistani Organizations


Let's be clear on one thing: great leaders don't have time to give a single organization their whole life. They tend to have varied objectives in life; a single organization, whether it be PIA, the Steel Mill, banks or even the State Bank of Pakistan, has only a small footprint in the overall scheme of things. If you are a great leader, you tend to move from place to place, making revolutionary changes wherever you go.

At least, this has been the story of Pakistani organizations. A leader comes in, with great power given to them, tasked to change the whole organizational culture. And there have been more than a couple of successful instances where the same strategy has worked.

The strategy is simple. It acknowledges that there is no time to improve the current organization, so a parallel organization is created within the organization; it has more power, resources and flexibility, and it cuts ties with the old organization. The people hired into the new organization are comparatively fresh, so they can be molded to modern needs. What we have now are two completely different sets of people: one is (comparatively) old, slow, bureaucratic and somewhat loyal; the second is better educated, young and fast paced, has no red-tape culture, would work their ass off to complete the project on time, and would move to a different organization without a second thought. There is a hiring freeze in the old organization, so it doesn't grow and is slowly washed out; 'golden handshakes' also come into the picture if management wants to get rid of the old staff early.

So far the strategy has worked at State Bank, HBL (and other banks), Lucky group, K Electric and some other organizations. The results are somewhat mixed but a lot better than what they would have been if nothing was done. One thing is for sure, though: the organizational output increases drastically.

How can you identify whether this strategy has been implemented in an organization? On one occasion you will see a really frustrated employee who always has bad words for management and sees it as hell-bent on destroying the organization, and on other occasions employees who are specific, young, able to make decisions, and working late hours.

Whether the strategy is good or not would have different answers depending on the angle you see it from. But one thing is for sure: the performance and quality of output increase twofold.

Saturday, March 21, 2015

How to Improve Pakistani Cricket



Finally, we are out of the World Cup race, a race we weren't prepared for, a race we have been preparing for over the last four years. What went wrong and what the findings were are discussed widely by cricket commentators and lovers alike. Muhammad Yousuf even suggested firing the whole batting line-up and never bringing them back into the Pakistani team.

We are bursting out at people who have little control over the issue. The players did their best; that's all they were capable of. The selection committee, in the same way, did their best. The cycle won't improve unless there is a serious effort to revamp the management process of the Pakistan Cricket Board. Pakistani cricketers are 'thinking' of playing with a strategy that is two decades old. Times have changed and they need to change the way they play their cricket. But then again, a strategy is when someone actually has a plan; according to Amir Sohail, a veteran cricketer, Pakistani cricketers never played with a strategy. They just count on some players to perform outstandingly, and if that clicks they think their strategy worked. These are all important issues, but the main issue is not one of operations; rather, a strategic change is needed.

Things won't improve unless politics is tied up with the cricket. When Zulfiqar Ali Bhutto nationalized the industries in Pakistan, he was not expecting that the ministries would be handed control of their operations. As in most of the civilized world, there are proper systems to govern nationalized industries. One of the systems in practice is to have a board comprising a mixture of public and private representatives from different walks of life, be it philanthropists, industry practitioners, ministry representatives, judges, entrepreneurs and even people from the organization itself; they make all the major decisions, including who will lead the organization. This system is in practice at many successful government organizations of Pakistan, one of them being the leading educational institution of the country, IBA Karachi.

The people of IBA are not hired from outside the country; they are ordinary Pakistanis. What is different is the governance structure. The same kind of system should be set up for other organizations of Pakistan, and the PCB should be the first to make such a transition. Hopefully, we won't see Ijaz Bhatt and Najam Sethi type of people, who are just aristocrats and hugely flawed in the way they lead their organization.

Thursday, March 19, 2015

Waiting Lines - Fazaia Housing Scheme



I was quite interested when I heard about the PAF Housing complex; as a person who missed out on the Bahria and DHA bonanza, this was the time to cash in on the project. The whole enthusiasm went in vain when I saw some 200 people lined up at an HBL branch for the Rs.1,000 forms. I don't have that much time and energy to stand in line all day just for the form. I initially thought of dropping the idea altogether, but the very next day someone called me for the forms. Why me?

Oh Shit! I have a close relative at the HBL head office, so I am 'THE' person to contact for forms. I called up my relative to request some forms. In return he told me that if I got hold of some, I should get him a couple. What! The story was that PAF Fazaia Housing Scheme had picked up the unsold forms from HBL, as there was news of mismanagement and even rioting, and also that forms were being sold in black, fetching 10k-15k per form. PAF intended to sell the forms at its own booths. The next day someone informed me that the booths at the PAF museum were torn down and the PAF staff fled the scene (with the forms :( ), but not before a baton charge on the people queuing for the forms.

Do these people know anything about queue management? It's a whole science. But as before, in Pakistan people feel proud when they see hundreds of people queuing up for something that they have. A very simple approach would have been to upload the form to a website and ask applicants to deposit the Rs.1,000 fee, along with any other amount, when submitting the form. In fact, it would have been simpler still if the amount was also deposited electronically; there are tens of methods available these days.

As for me, this time again I will be left at the mercy of brokers, who will sell either the file or eventually the plots at a premium. There is one important finding from the project: Karachiites are hungry for good investments and they are not comfortable with financial investments, i.e. stocks or bonds. Trust has shifted to property developers.



Sunday, June 2, 2013

Corporate Attire

I may be stepping on some nice toes here, but I felt like talking about the issue. Having experienced seven years of corporate life in Karachi, I have always seen men dressing in an uptight suit, or at least dress pants and a properly tucked-in shirt; shorts are not acceptable even on casual Fridays. Women, meanwhile, can dress as they like; skirts will also do, and exposing some extra skin is very acceptable, even preferred by many. While I believe in personal space and freedom, this does not fall in the personal domain: it creates conflict in admirers' minds, distracts him or her from the goals of the business, and society blames them for what follows.

This does not stop here. Pretty newscasters are equally to blame for inciting viewers; yes, I notice them, I am a guy. Their job is to deliver news, not to model; we have models and actors for that. Many organizations prefer keeping such beauty in house, as it helps in accomplishing stretch goals, especially in the sales function. On the other side, many companies are just too scared to touch this sensitive topic, as it could label them as extremist, fundamentalist or even terrorist, a word coined to undermine others. Many organizations have gone out of their way to write instructions into their manuals that limit an individual's freedom in dressing. Implementing that policy, however, is a difficult task; you cannot ask a beautiful person wearing a low-cut blouse to go cover up. Just too rude!

The advent of sexual harassment laws has made this issue urgent. They push victims of sexual harassment to report incidents to a special committee, which may result in prosecution; whether the committee realizes that the victim was also at fault is a question we would like answers for. But the issue is not one of harassment, as offenders will be active even if a proper dress code is followed. The implementers of the law are asking organizations to train their employees in sexual harassment laws and how to report incidents; this presents a good opportunity to inform the participants of their own responsibility in controlling incidents: to dress decently. While my experience is limited to Karachi, the matter is global in nature (I am talking about accomplishing corporate goals here). You can find several articles on corporate dress codes in the US, where it seems they have risen to make clear policy guidelines and recommendations on corporate attire.