The probability space and general theory

When probability is discussed, it is usually in terms of the probability of a certain event happening. Is it going to rain? Will the price of apples go up or down? In the context of machine learning, probabilities tell us the likelihood of events such as a comment being classified as positive versus negative, or a fraudulent transaction occurring on a credit card. We measure probability by defining what we refer to as the probability space. A probability space is a formal description of which events can happen and how likely each of them is. Probability spaces are defined by three characteristics: 

  1. The sample space, which tells us the possible outcomes of a situation
  2. A defined set of events, such as two fraudulent credit card transactions
  3. The measure of probability of each of these events

While probability spaces are a subject worthy of studying in their own right, for our own understanding, we'll stick to this basic definition. 
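
To make the three characteristics concrete, here is a minimal sketch of a finite probability space in Python. The fair six-sided die, the names `sample_space`, `measure`, and `probability`, and the assumption that every outcome is equally likely are all illustrative choices, not something taken from the text above.

```python
from fractions import Fraction

# 1. Sample space: every possible outcome of one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# 3. Probability measure: we ASSUME a fair die, so each outcome gets 1/6.
measure = {outcome: Fraction(1, len(sample_space)) for outcome in sample_space}


def probability(event):
    """Return P(event), where an event is any subset of the sample space."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return sum((measure[outcome] for outcome in event), Fraction(0))


# 2. Events: sets of outcomes that we want to assign a probability to.
even = {2, 4, 6}
at_least_five = {5, 6}

p_even = probability(even)                    # P(roll is even)
p_total = probability(sample_space)           # the whole space always sums to 1
p_union = probability(even | at_least_five)   # P(even or at least five)

print(p_even, p_total, p_union)
```

Exact fractions are used instead of floats so that the probabilities of all outcomes add up to exactly 1, which is the defining property of a probability measure.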

In probability theory, the idea of independence is essential. Two random variables are independent when knowing the value of one tells us nothing about the value of the other. This is an important assumption in deep learning, as non-independent features can often intertwine and affect the predictive power of our models.
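
Independence can be checked directly on a small example. The sketch below works with events rather than random variables and uses the standard product rule, P(A and B) = P(A) · P(B), as the test. The two-dice setup and the helper names `prob` and `is_independent` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice (an assumed setup).
outcomes = list(product(range(1, 7), repeat=2))


def prob(predicate):
    """Probability that a randomly chosen outcome satisfies the predicate."""
    favorable = sum(1 for outcome in outcomes if predicate(outcome))
    return Fraction(favorable, len(outcomes))


def is_independent(a, b):
    """Events a and b are independent iff P(a and b) == P(a) * P(b)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)


def first_is_even(o):
    return o[0] % 2 == 0


def second_is_six(o):
    return o[1] == 6


def total_is_twelve(o):
    return o[0] + o[1] == 12


# The first die tells us nothing about the second die.
independent_pair = is_independent(first_is_even, second_is_six)

# The total depends on the second die, so these two events are not independent.
dependent_pair = is_independent(second_is_six, total_is_twelve)

print(independent_pair, dependent_pair)
```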

In statistical terms, a collection of data about an event is a sample, which is drawn from a theoretical superset of data called a population that represents everything that is known about a grouping or event. For instance, if we were to poll people on the street about whether they believe in Political View A or Political View B, we would be generating a random sample from the population, which would be the entire population of the city, state, or country where we are polling.

Now let's say we wanted to use this sample to predict the likelihood of a person having one of the two political views, but we mostly polled people who were at an event supporting Political View A. In this case, we may have a biased sample. When sampling, it is important to take a random sample to decrease bias; otherwise, any statistical analysis or modeling that we do with the sample will be biased as well. 
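
The effect of a biased sample can be simulated in a few lines. Everything in the sketch below is hypothetical: the population size, the even split between the two views, the number of rally attendees, and the names `population` and `share_of_a` are invented purely to illustrate the polling example.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 people as (view, attended_rally) pairs.
# Half hold View A, but rally attendees are overwhelmingly View A supporters.
population = (
    [("A", True)] * 1000 + [("A", False)] * 4000
    + [("B", True)] * 50 + [("B", False)] * 4950
)


def share_of_a(people):
    """Fraction of the given people who hold Political View A."""
    return sum(1 for view, _ in people if view == "A") / len(people)


true_share = share_of_a(population)

# Random sample: every person in the population is equally likely to be polled.
random_sample = rng.sample(population, 500)

# Biased sample: we only poll people who attended the View A rally.
rally_attendees = [person for person in population if person[1]]
biased_sample = rng.sample(rally_attendees, 500)

random_estimate = share_of_a(random_sample)
biased_estimate = share_of_a(biased_sample)

print(f"true share of View A:   {true_share:.2f}")
print(f"random sample estimate: {random_estimate:.2f}")
print(f"rally sample estimate:  {biased_estimate:.2f}")
```

With these made-up numbers, the random sample should land close to the true share, while the rally sample greatly overstates support for View A, and any model trained on it would inherit that bias.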
