
The probability space and general theory

When probability is discussed, it is usually in terms of the probability of a certain event happening. Is it going to rain? Will the price of apples go up or down? In the context of machine learning, probabilities tell us the likelihood of events such as a comment being classified as positive versus negative, or a fraudulent transaction occurring on a credit card. We measure probability by defining what we refer to as the probability space. A probability space is a formal description of which events can occur and how likely each of them is. Probability spaces are defined by three characteristics:

  1. The sample space, which tells us the possible outcomes of a situation
  2. A defined set of events, such as two fraudulent credit card transactions
  3. The measure of probability of each of these events

While probability spaces are a subject worthy of studying in their own right, for our own understanding, we'll stick to this basic definition. 
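To make the three characteristics concrete, here is a minimal sketch of a probability space for rolling two fair six-sided dice. The dice scenario and the names `sample_space`, `probability`, and `sum_is_seven` are illustrative assumptions, not something taken from the text, and the measure assumes every outcome is equally likely.

```python
from fractions import Fraction
from itertools import product

# 1. Sample space: every possible outcome of rolling two six-sided dice.
sample_space = set(product(range(1, 7), repeat=2))

# 3. Probability measure: assumes all 36 outcomes are equally likely.
def probability(event):
    """Return the probability of an event (a subset of the sample space)."""
    return Fraction(len(event & sample_space), len(sample_space))

# 2. An event: the set of outcomes where the two dice sum to seven.
sum_is_seven = {outcome for outcome in sample_space if sum(outcome) == 7}

print(probability(sum_is_seven))  # 1/6
print(probability(sample_space))  # 1
```

Exact fractions are used here so that the results are not blurred by floating-point rounding. The whole sample space has probability 1, as any probability measure requires.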

In probability theory, the idea of independence is essential. Two random variables are independent when the value of one tells us nothing about the value of the other. This is an important assumption in deep learning, as non-independent features can often intertwine and affect the predictive power of our models.
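For events, independence can be checked with the standard product rule: A and B are independent exactly when P(A and B) = P(A) · P(B). The sketch below applies that rule to two fair dice. The dice setup and the helper names `probability` and `independent` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
sample_space = set(product(range(1, 7), repeat=2))

def probability(event):
    return Fraction(len(event), len(sample_space))

def independent(a, b):
    """Product rule: A and B are independent iff P(A and B) == P(A) * P(B)."""
    return probability(a & b) == probability(a) * probability(b)

first_is_even = {o for o in sample_space if o[0] % 2 == 0}
second_above_four = {o for o in sample_space if o[1] > 4}
sum_at_least_ten = {o for o in sample_space if sum(o) >= 10}

# The two dice do not influence each other.
print(independent(first_is_even, second_above_four))  # True
# The sum depends on the first die, so these events are not independent.
print(independent(first_is_even, sum_at_least_ten))   # False
```

In the second check, P(first even) · P(sum ≥ 10) = 1/2 · 1/6 = 1/12, while the probability of both happening together is 4/36 = 1/9, so the product rule fails.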

In statistical terms, a collection of data about an event is a sample, which is drawn from a theoretical superset of data called a population that represents everything that is known about a grouping or event. For instance, if we were to poll people on the street about whether they believe in Political View A or Political View B, we would be generating a random sample from the population, which would be the entire population of the city, state, or country where we are polling.

Now let's say we wanted to use this sample to predict the likelihood of a person having one of the two political views, but we mostly polled people who were at an event supporting Political View A. In this case, we may have a biased sample. When sampling, it is important to take a random sample to decrease bias; otherwise, any statistical analysis or modeling that we do with the sample will be biased as well.
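The effect of a biased sample can be illustrated with a small simulation. Everything in this sketch is an assumption made up for the example: the population of 10,000 people, the 40/60 split between the two views, and the 9-to-1 weighting that stands in for polling at an event supporting View A.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 40% hold View A, 60% hold View B.
population = ["A"] * 4000 + ["B"] * 6000

def share_of_a(sample):
    """Fraction of a sample that holds View A."""
    return sample.count("A") / len(sample)

# Simple random sample: every person is equally likely to be polled.
random_sample = rng.sample(population, 500)

# Biased sample: supporters of A are assumed to be nine times as likely
# to be encountered as supporters of B. Note that choices() draws with
# replacement, which is adequate for this illustration.
weights = [9 if view == "A" else 1 for view in population]
biased_sample = rng.choices(population, weights=weights, k=500)

print(f"true share of A: {share_of_a(population):.2f}")
print(f"random sample:   {share_of_a(random_sample):.2f}")
print(f"biased sample:   {share_of_a(biased_sample):.2f}")
```

Under these assumptions, the random sample should land close to the true share of 0.40, while the biased sample should sit far above it. Any model fitted to the biased sample would inherit that overestimate.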
