Following in the steps of Big Brother? How Brazil’s rightward shift is similar and dissimilar to that of the U.S. in 2016 by Flavio Azevedo

Flávio Azevedo, University of Cologne, Germany,
Daniel Mucciolo, Universidade do Contestado, Brazil,
Da'Quallon D. Smith, Columbia University, USA

Brazil is a country of superlatives. It is the world’s 5th largest country and one of its most populous, extremely rich in natural resources, and Latin America’s most powerful economy. Brazil is also the primary home of the Amazon Forest, the earth's lungs, which absorbs ¼ of the world’s carbon dioxide. In an ever-interconnected global economy and environment, Brazil’s election results matter well beyond its borders. On the last Sunday of October, as the country’s constitution mandates, a run-off election pitted Fernando Haddad, who took over Luiz Inácio Lula da Silva’s Workers’ Party, against a candidate pundits have been calling the “Trump of the Tropics.” But appearances can be deceiving. While parallels can indeed be drawn between Donald Trump and Brazil's President-elect Jair Bolsonaro, differences between the two countries' circumstances and the candidates' ideological characteristics are palpable. 

Consider the disparate backgrounds against which the two contests took place. Unlike the U.S. in 2016, Brazil's latest election took place while the country was reeling from the worst economic crisis in its history, with sky-high unemployment and record-shattering murder rates. As if things were not bad enough, Brazil is still embroiled in an ongoing 4-year-long corruption investigation that has shaken the nation to its core, incriminating politicians across all parties and ideological proclivities. Dubbed Operation Car Wash, it is said to have uncovered the largest corruption scheme in the world, and it has sent Brazil's favourite son, Lula, to prison. 

Against the backdrop of a seemingly never-ending political corruption scandal, it is no surprise that only 13% of Brazilians are satisfied with democracy, only 11% think the country is going in the right direction, and the legislative and executive branches are among the country’s most distrusted institutions. In a country in which voting is mandatory, 42.1 million people chose not to select a candidate in the runoff – that is one in every three Brazilian adults. What these numbers show is a generalized disillusionment – or political alienation, if you will – with the political establishment. But perhaps the most telling contextual factor that differentiates Brazil’s shift to the (far) right from that in other countries is that the Workers’ Party (PT) had won the last 4 Presidential elections. And while PT oversaw the most prosperous times Brazilians have ever seen, recent political, public security, and economic crises dissolved popular support for PT as well as trust in the entirety of Brazil’s political class. Not only will the lower house have 30 different parties (a record), but almost all of the major political parties had their representation in Congress severely reduced. Indeed, the erosion of traditional parties – particularly on the mainstream right – created a vacuum so large that many Brazilians thought of this election as a referendum on the status quo: a choice between “more of the same” and anything else. Enter Jair Bolsonaro, a retired Army captain and longtime congressman from Rio de Janeiro who – prior to Operation Car Wash and Rousseff’s impeachment – gained some notoriety for rhetoric that was once seen as inconceivable. For example, during Rousseff’s impeachment proceedings, Bolsonaro drew national condemnation for dedicating his vote in favor to the memory of Colonel Ustra, a convicted human rights violator and torturer of the military dictatorship who personally tortured Rousseff in 1970.

As far as the political campaign goes, there are some striking similarities between Trump and his Brazilian counterpart. Both were widely discredited by political and cultural elites; both uttered racist, sexist, and homophobic slurs without serious consequences (electoral or otherwise); both lauded themselves as incorruptible, promised to drain the swamp, invested heavily in social media, and circulated misinformation on too-big-to-notice-until-it's-too-late platforms such as Facebook, Twitter, and WhatsApp; and both, despite calling the news media “fake” – or perhaps because of it – dominated national media coverage.
In addition, Trump and Bolsonaro share two ideological appendices to their conservative politics: populism and authoritarianism. In 2004, Mudde synthesized the core elements of populist political actors and defined populism as “a thin-centered ideology that considers society to be ultimately separated into two homogeneous and antagonistic camps, ‘the pure people’ versus ‘the corrupt elite’, and which argues that politics should be an expression of the volonté générale (general will) of the people” (Mudde and Kaltwasser, 2017). Similarly, Hawkins (2009) argues that populism is “a Manichean discourse that identifies Good with a unified will of the people and the Evil with a conspiring elite.” Both scholars, and the literature in general, tend to agree that central to populism is (a) the framing of political divisions as the clash of two opposed and monolithic forces, and (b) the assignment of a moral dimension to that clash via the juxtaposition of totalizing qualifiers such as good vs. evil, us vs. them, and pure vs. corrupt (van Hauwaert, Schimpf, & Azevedo, 2018). It is this moralization and covert in-grouping and out-grouping that predisposes and links populist views to both authoritarian and conservative views. Moreover, it is why, in practice, this partnership takes hold almost exclusively on the social and cultural dimension of political orientation. In this sense, populist authoritarianism goes beyond the belief in an ordered society wherein transgressions should be punished severely – also known as law and order, a longstanding staple of authoritarian conservatism – to also include the identification, derogation, and targeting of “deviants.” Indeed, it has been shown that belligerence towards minority groups and the endorsement of establishing and maintaining group-based hierarchies are stable and robust predictors differentiating preference for mainstream conservative vs. populist authoritarian candidates (Womick, Rothmund, Azevedo, King, & Jost, 2018). 
Populist authoritarians seek to perpetuate societal inequalities – if not expand them – which they see as natural and legitimate (Azevedo, Jost, & Rothmund, 2017; Mudde, 2007; p. 23).

When fused, populism and authoritarianism are seen as a social pathology imbued with a paranoid style of politics which ultimately threatens liberal democratic values. The argument is that democratic rule is built upon the integration of pluralism in the political system, which is institutionalized by fair and free elections, separation of powers, the rule of law, and the equal protection of rights and liberties for all people. Indeed, democracies’ checks and balances exist to limit the power of the executive branch and protect citizens from abuse. Populists, however, while claiming to speak for the people, conceive of democratic procedure and its institutions as unnecessary obstacles to defending the Nation, as an impediment to their conception of popular will (Müller, 2017). Often through the exploitation of economic grievances, populists advocate for a return to nationalism, encourage prejudice, and foment distrust toward globalization, international alliances, and trade pacts. When in power, populists tend to reject pluralism and minority rights, clash with the free media, and decrease the extent to which civil liberties and political rights are upheld. Unsurprisingly, for most of the last decade, intellectuals, news media, and politicians have voiced concern about the rise of populism, which is now a major player in politics around the globe. 

However, while Bolsonaro and Trump share a populist authoritarian ideology, Bolsonaro’s rhetoric differs from Trump’s in at least two important ways: ambivalence towards democracy and overt, unapologetic generalized prejudice. We focus on the former first. Contrary to mainstream conservatives, who operate within the boundaries of democratic institutions, the members of the far-right display a varying degree of undemocratic proclivities (Golder, 2016). In a nutshell, the far-right is composed of two groups: the populist radical right and the extreme right. While the populist radical right is critical of democratic institutions – particularly those designed to preclude unchecked majority rule and ensure separation of powers – it is still supportive of elections and democratic rule. The discourse of members of the extreme right, on the other hand, not only shows contempt for democracy and its institutions but often encourages the transfer of governing power and legitimacy away from the Nation’s people. In a 1999 televised interview, Bolsonaro affirmed his support for military intervention, closing the Congress, and said these words about democracy: “You will never change anything in this country through voting. Nothing. Absolutely nothing. Unfortunately, things will only change when a civil war kicks off and we do the work the [military] regime did not. Killing some 30.000, killing them! If a couple of innocents die, that’s OK.” Bolsonaro is also a staunch defender of the murderous legacy of Brazil’s dictatorship. In 2016, while being interviewed on radio, Bolsonaro said this about the practice of torturing captured dissidents: “the mistake was not to torture, it was not to kill them.” Even during the Presidential campaign in 2018, Bolsonaro suggested to a crowd of supporters that they would shoot down PT supporters and send them to Venezuela where they would be forced to eat grass. 
So, in comparing Trump’s with Bolsonaro’s rhetoric, there is a qualitative difference in support for democracy. Even if the American President tries – and sometimes succeeds – to blur the lines separating the three branches of government, Trump never publicly suggested that autocratic forms of government were preferable to democracy. 

The second difference relates to the presence of covert vs. overt unapologetic prejudice. In all likelihood, Bolsonaro and Trump share the same prejudices (and levels thereof), but when it comes to public political discourse and decorum, there is a qualitative difference. In 2011, Bolsonaro outright stated that his children would never have relations with a person of color because they had been well educated and doing so would constitute promiscuity. He is also on record saying that he would be incapable of loving a homosexual son, and would prefer his death than have him “show up with some bloke with a mustache.” In 2017, Bolsonaro referred to the gender of his daughter – after four male sons – as “a moment of weakness,” not only implying his masculinity or effort affected the sex of his children, but also passing judgment on which sex is superior. We could go on. The take-home message is Bolsonaro’s rhetoric bears the hallmarks of the extreme right and thus conflating it with Trump’s is a grave solecism. 

But why should we care about the electoral consequences of a country thousands of miles away? 
First, Bolsonaro is against environmental regulations and plans to merge the ministries of Agriculture and Environment, in support of agroindustry, which effectively means the invasion of Indigenous people’s lands and unrestrained deforestation of the Amazon. Fewer trees will contribute to global warming, which affects us all. Second, Brazil is a regional geopolitical leader integrating all of its South American neighbors physically, economically, and politically. Its stability plays a strategic role in ensuring local shocks don’t travel across the region. Recently, Brazil’s democratic institutions have shown uncanny resilience in the face of three concomitant crises. The military never intervened, court decisions were respected (despite popular upheaval), and constitutional processes were followed. However, as Bolsonaro has promised to crack down on dissidents, the media, and even the electoral court, Brazil’s democracy could fall and cause a domino effect across the entire region. Third, Bolsonaro’s unapologetic prejudice against women, homosexuals, blacks, and natives promises to bolster fringe and extreme groups and to increase domestic violence and hate crimes – just as it has in the U.S., where Trump repeatedly fails to denounce far-right groups – leading to the death of innocent human beings and human rights violations. Fourth, Bolsonaro has promised to embolden police officers and promote shoot-to-kill public policies. In a country with already staggering amounts of police violence and extra-judicial killings, the institutional backing will only increase police impunity and violence – particularly against the poor and racial minorities. Innocent people will suffer for no other reason than authoritarian-fueled ideology. And the conditions of the incarcerated, who already live in subhuman conditions (Brazil has the third largest prison population in the world), have been predicted to deteriorate substantially. 
Fifth, and least important of all, we may be witnessing the death of conservatism as we know it. Despite a few overlaps between conservatism and populist authoritarian views, and the ease with which the two have constructed alliances in the northern and southern hemispheres, they differ considerably in terms of aspirations and modus operandi. Populist authoritarianism is brash and passionate while conservatism is modest and cautious. Conservatives tend to respect hierarchy, favor continuity, and revere traditional values (Freeden, 1996), while authoritarian populists embody anti-elitism, exacerbation of societal differences, and unmitigated prejudice. Yet conservatism’s worldwide drift into populist authoritarianism does not seem to set off alarm bells. Indeed, conservatives seem not to realize that their cultured and traditional precepts have been hijacked before their eyes.

Published at the ISPP Blog & Newsletter.


Azevedo, F., Jost, J. T., & Rothmund, T. (2017). “Making America great again”: System justification in the US presidential election of 2016. Translational Issues in Psychological Science, 3(3), 231.
Freeden, M. (1996). Ideologies and political theory: A conceptual approach. Oxford University Press on Demand.
Golder, M. (2016). Far right parties in Europe. Annual Review of Political Science, 19, 477-497.
Hawkins, K. A. (2009). Is Chávez populist? Measuring populist discourse in comparative perspective. Comparative Political Studies, 42(8), 1040-1067.
Levitsky, S., & Ziblatt, D. (2018). How democracies die. Crown.
Mudde, C. (2007). Populist radical right parties in Europe. Cambridge University Press.
Mudde, C., & Kaltwasser, C. R. (2017). Populism: A very short introduction. Oxford University Press.
Müller, J. W. (2017). What is populism?. Penguin UK.
Van Hauwaert, S., Schimpf, C. H., & Azevedo, F. (2018). The individual level measurement of populism in Europe and the Americas: Insights from IRT as a scale development technique. The Ideational Approach to Populism: Theory, Method & Analysis. Routledge. 
Womick, J., Rothmund, T., Azevedo, F., King, L. A., & Jost, J. T. (2018). Group-Based Dominance and Authoritarian Aggression Predict Support for Donald Trump in the 2016 US Presidential Election. Social Psychological and Personality Science, 1948550618778290.

Useful Resources for Social Science Research by Flavio Azevedo


The list of resources below is intended for (my) personal use. It contains primers, tutorials, and guides I have used, encountered, or want to learn more about. These relate to producing better science, open science, methods, statistics, visualization, R, and RStudio.






Statistical practice

Study Design

degrees of freedom

Power Analysis

Multilevel design

Robust methods


Linear & Generalized Linear Models

Time Series

Causal Inference in graphs with Animated (GIF) Plots





Scientific Debates

Bayesians vs. Frequentists [in social media]




Reproducible Research



R Resources



R-Series: Complete Courses

Data Science Methods for Psychology - University of Oregon (HardSci)

Data Science for Social Sciences

Nifty Tricks & Tips

Data Visualization



Publication geared

  • ggally: the ally of ggplot2 (display of multiple regression coefficients and its diagnostics, networks, time-series, distributions, etc.).

  • ggstatsplot: Plots with Statistical Details (most common types of statistical tests: parametric, non-parametric, and robust versions of t-tests, ANOVA, correlation, and contingency tables).

  • ggpubr: ggplot2-based publication-ready plots [tutorial at sthda]

  • cowplot: publication-ready theme for ggplot2 (e.g., easy add panels A, B, C). Here's code for adding panels to non-ggplot2 R-base graphs.


Colors and Palettes


Per type of data


  • ggpairs: Visualizing distributions with groupings [examples: 1]

Group differences

Time Series






[Scholarly] Social Media



Philosophy of Science

Primer on Philosophy of Science

On Karl Popper


  • Bias - list thereof with examples.


Everything else

Economics (Behavioral)



R Journalism - Reproducible Journalism by Washington Post Investigative Data Reporter

Political Science

P-values in science by Flavio Azevedo

What is it?

P-value is short for probability value. But probability of what? The p-value is the probability of obtaining an effect at least as extreme as the one you found in your sample, assuming the null hypothesis is true.

No, seriously, what is it?

If the definition were an accessible and intelligible explanation, statistics would not be considered a difficult topic, nor would the meaning and usefulness of p-values be a contentious issue in the scientific community.[1] While there have been many (many!) attempts to address this issue, what better way to start a blog on methods and statistics than by taking a shot at one of the cruxes of science-making? In any case, instead of writing about definitions, perhaps it is useful to illustrate the process in which p-values are important, so we are better placed to understand what they are and, most importantly, what they are not.

The probability of event x is the frequency of x divided by the sum of the frequencies of all possible outcomes, Nt, as the number of trials goes to infinity. 

P-values are meaningful under the frequentist approach to probability, which is just one perspective under the larger umbrella of probability theory (others include the Bayesian and Likelihood approaches). Simply put, frequentists view the probability P of an uncertain event x as the frequency of that event based on previous observations. For example, in a set of random experiments wherein the only possible outcomes are x, y, and z, the frequency of occurrence of x relative to all previous observations (the occurrences of x, y, and z) is a measure of the probability of the event x. If you run the experiment ad infinitum, that is. The rationale behind frequentist thinking is that as the number of trials approaches infinity, the relative frequency converges to the true probability.
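The long-run idea above can be illustrated with a small simulation. This is only a sketch: the "true" probability of 0.5, the seed, and the function name `relative_frequency` are assumptions made for illustration, not something taken from real data.

```python
import random


def relative_frequency(p_true, n_trials, seed=42):
    """Share of n_trials in which the event occurred, given a true probability p_true."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = sum(1 for _ in range(n_trials) if rng.random() < p_true)
    return hits / n_trials


P_TRUE = 0.5  # assumed true probability, chosen only for illustration

# The relative frequency should drift toward P_TRUE as the number of trials grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency(P_TRUE, n))
```

With few trials the relative frequency can sit far from the assumed probability; with many trials it should settle close to it.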


Let's proceed with an illustration. Say you are a researcher interested in the birth rate of baby girls. And you would like to know whether there are more baby girls being born than baby boys. Here, there are only two possible outcomes, to observe or not to observe the event of interest (i.e., baby girls being born). To investigate that research question, you start by logging the gender of every born baby in the nearest hospital for a full 24 hours. Then, as shown above, you estimate the probability of your event of interest, P(Girls), which is the ratio between the frequency of baby girls born, divided by the total number of observed births (Boys and Girls). You look at your records and see that you observed 88 births in total, 40 baby boys, and 48 baby girls. Then, the estimated probability of a baby girl being born, according to your data, is P(Girls) = 0.5455. These results could be interesting to policy makers, practitioners, and scholars because your data seems to suggest there are more baby girls being born than baby boys. 
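The arithmetic behind this estimate is short enough to write out. The variable names are just illustrative; the counts are the ones given above.

```python
girls, boys = 48, 40      # births logged during the 24 hours
total = girls + boys      # all observed births

p_girls = girls / total   # relative frequency of baby girls
print(f"total births = {total}")
print(f"P(Girls) = {p_girls:.4f}")
```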

But before we run to the world and tell it about this puzzling truth, perhaps we should consider the role of different philosophical approaches to the scientific method and how these translate into two ways of dealing with uncertainty and its statistical operationalization.

The role of Uncertainty

Now, you have to find a way to show the policy makers and the scientific community that your finding is "true". But in actuality, you can only show that your results are somewhat 'likely' to be true. Indeed, it is in estimating how likely it is that these results reflect the true probability that the scientific method and statistics intermingle. To start, note that you observed only 88 trials (births), not an infinite number of trials, or even a large-enough sample. This means that, statistically (from a frequentist perspective), your estimate did not have the necessary number of trials to converge to the true probability. Another way to look at this is to think about the lack of precision of your estimate. For example, you did not survey the whole population of hospitals and babies born therein, but just a sample of it. In fact, even more restrictively, you surveyed only 24 hours' worth of data out of the 8,760 hours in a year, in only one hospital. Given that the true probability is unknown, the limitations of your study could influence the accuracy of your estimate and bias it away from the true population parameter. So the best you can do is to estimate it and assess the degree to which the estimate is accurate.

The role of Probability distributions

So how do you quantify the strength of your findings? Scientists often resort to a specific method known as statistical inference, which is based on the idea that it is possible, with a degree of confidence, to generalize results from a sample to the population. To give you a more exact explanation, to infer statistically is to deduce properties of an underlying probability distribution via analysis of data. Note the term underlying probability distribution. Researchers using empirical and quantitative methods rely heavily on the assumption that the studied phenomenon follows the same pattern as a known probability distribution, whose characteristics and properties are known. Hence, the two are compared (the theoretical vs. the empirical distribution) in order to make inferences. In your case, you are studying the relative frequency of baby girls versus baby boys. You do your googling and find out that since your variable of interest is a dichotomous and largely randomly determined process (i.e., there are only two possible outcomes at each event/trial and we assume the outcome to be random), human births can be understood in Statistics as ensuing from a Binomial distribution. Another example of the same stochastic process is a coin toss (heads vs. tails), where each toss is known as a Bernoulli trial. In any case, random and dichotomous outcomes, in the long run, tend to follow a Binomial distribution. Now that the probability distribution you assume to underlie human births has been identified, you can compare it to your data.
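To make the Binomial distribution concrete, here is a minimal sketch of its probability mass function, using only the standard library. It assumes 88 births and a probability of 0.5 for a girl at each birth; the helper name `binomial_pmf` is hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N_BIRTHS = 88   # number of births in the example
P_GIRL = 0.5    # assumed probability of a girl at each birth

# Probability of every possible number of girls, from 0 to 88.
pmf = [binomial_pmf(k, N_BIRTHS, P_GIRL) for k in range(N_BIRTHS + 1)]

most_likely = max(range(N_BIRTHS + 1), key=pmf.__getitem__)
print(f"most likely number of girls: {most_likely}")
print(f"P(exactly 48 girls) = {pmf[48]:.4f}")
```

Under these assumptions the single most likely count is 44 girls, yet nearby counts such as 48 remain quite plausible.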

The role of Hypothesis testing

If you subscribe to the scientific method, in order to perform this comparison, you need a hypothesis: a testable and falsifiable hypothesis. In your case, you would like to test whether you have enough evidence to say 'there are more baby girls being born than baby boys'. Note that a proportion of 0.5 indicates that there are as many baby girls being born as baby boys. That being said, a hypothesis could be "the proportion of baby girls being born is larger than 0.5", or equivalently, "the probability of baby girls being born is larger than 0.5." Either way, you want to compare the estimate you drew from your data, P(Girls) = 0.5454..., to the value presumed for the population: P(Girls) = P(Boys) = 0.5. This is testable and falsifiable because we presume to know the properties of the stochastic process underlying our data.[2]

The role of Confidence level

The last thing you need before you can assess the degree to which your estimate is likely is to set your preferred level of confidence. One important reason why you need a level of confidence is sample variation, which affects your population estimate P(Girls). Suppose the hospital where you collected data found your idea very interesting and decided to continue logging the gender of every birth for the next 99 days. After this period, you are shown the results of the hospital's research below. The picture shows the fluctuation of daily estimates of the probability of baby girls being born. The x-axis shows the relative frequencies (or ratios or probabilities), and the y-axis shows the number of days on which a given ratio was found. As before, each count per bar represents an individual day's worth of collected data. For example, the first dark blue bar (on the extreme left) means that there was one day on which the proportion of baby girls was estimated to be 0.43. On that day, more boys than girls were born. The last bar (on the extreme right) was a day on which baby girls outnumbered baby boys by 2 to 1. What we learn from this graph is that, had you chosen another day to conduct your original survey, you would likely have found a different estimate of the population parameter of interest: P(Girls). Due to this fluctuation, it is good scientific practice to provide a confidence interval for your population estimate. This confidence interval, which is calculated from the sample, asserts that you could be X% sure that the true probability of baby girls being born, or the true population parameter P(Girls), would be contained in the estimated interval.

The red-dashed line depicts the mean ratio, 0.55, which means that it was estimated that, in the course of 100 days, on average, for every 100 babies (irrespective of gender), 55 are baby girls.
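The day-to-day fluctuation described above can be mimicked with simulated data. This sketch does not reproduce the hospital's records: the 88 births per day, the true probability of 0.5, and the seed are all assumptions chosen for illustration.

```python
import random
from collections import Counter

rng = random.Random(2018)  # fixed seed so the sketch is repeatable

BIRTHS_PER_DAY = 88  # assumed to equal the first day's count
P_TRUE = 0.5         # assumed true probability of a girl
N_DAYS = 100

# One estimate of P(Girls) per simulated day.
daily_estimates = []
for _ in range(N_DAYS):
    girls = sum(1 for _ in range(BIRTHS_PER_DAY) if rng.random() < P_TRUE)
    daily_estimates.append(girls / BIRTHS_PER_DAY)

# Text histogram: how many days produced each (rounded) proportion.
histogram = Counter(round(p, 2) for p in daily_estimates)
for proportion in sorted(histogram):
    print(f"{proportion:.2f} {'#' * histogram[proportion]}")

print("mean of daily estimates:", sum(daily_estimates) / N_DAYS)
```

Even though the same probability generates every simulated day, the daily estimates spread out around it, which is the sample variation a confidence interval is meant to express.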

Confidence Level | Z | Lower Bound | Upper Bound | Range

Obviously, you want to be as confident as possible, right? Yes! But when it comes to choosing a confidence level for your estimate, there are advantages and drawbacks at every level of confidence. In general, the more confident you want to be, the larger the confidence interval. And vice-versa. Talk about being caught between a rock and a hard place.

The table above shows exactly this for your P(Girls) estimate. If you choose a confidence level of 90%, this means that 90% (9 out of 10) of the times you run a random experiment, the true probability of baby girls being born would be contained in the estimated intervals. If you instead choose a confidence level of 95%, this means that 19 out of every 20 times you run your random experiment (95%), the true value of P(Girls) will be contained in the confidence interval. And so on.
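This long-run reading of a confidence level can be checked by simulation. The sketch below repeats a hypothetical experiment many times and counts how often the interval contains the true probability. It assumes the normal-approximation interval for a proportion, a true probability of 0.5, and 88 births per experiment; the function names are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist


def approx_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(p_true, n, confidence, n_experiments=2000, seed=7):
    """Share of simulated experiments whose interval contains p_true."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_experiments):
        successes = sum(1 for _ in range(n) if rng.random() < p_true)
        low, high = approx_interval(successes, n, confidence)
        covered += low <= p_true <= high
    return covered / n_experiments


for level in (0.90, 0.95, 0.99):
    print(f"nominal {level:.2f} -> simulated coverage {coverage(0.5, 88, level):.3f}")
```

Because the interval is only approximate and the sample is modest, the simulated coverage may land somewhat below the nominal level rather than exactly on it.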

Formula for confidence intervals of proportions, in the long run.

But how does one calculate the confidence intervals? Again, some googling or textbooks may be necessary. But since you are in a hurry, having lost so much time reading this post, you decide to go for the fastest option: Wikipedia. There, on the page for the Binomial distribution, you learn that the approximate [3] confidence interval for proportion estimates is a function of the estimated probability (p hat), the number of trials (n), and z. Two aspects are clear from the formula. The first is that the higher the n, the smaller the confidence interval (given that n is in the denominator). The second is that z is a multiplier, which is to say, the larger it is, the larger the confidence interval.

Z is a value from the standard Normal distribution (mean = 0, standard deviation = 1) for the desired confidence level (e.g., 90%, 95%, or 99%). In fact, the confidence level is the area under the standard normal curve delimited by -z and z. For example, for the standard normal distribution, the area between -1.96 and 1.96 is equal to 95% of the total area under the curve. Analogously, the area between -2.576 and 2.576 is equal to 99% of the total area. If you are in doubt as to why we use the normal distribution to calculate a z value for the confidence interval of outcomes ensuing from a binomial distribution, it is because for large enough n (in the long run), and p hat (the estimate of the proportion) not too close to zero or one, the distribution of p hat converges to the normal distribution. These confidence levels are intrinsically related to p-values.[5]
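Putting the pieces together, the interval can be computed for the running example of 48 girls in 88 births. This is a sketch that assumes the formula in question is the usual normal approximation, p hat ± z · sqrt(p hat · (1 − p hat) / n), and it uses three confidence levels (90%, 95%, 99%) picked for illustration.

```python
from math import sqrt
from statistics import NormalDist

girls, n = 48, 88
p_hat = girls / n  # estimated P(Girls)

rows = []
for level in (0.90, 0.95, 0.99):
    # z leaves (1 - level) / 2 of the area in each tail of the standard normal.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    rows.append((level, z, p_hat - margin, p_hat + margin, 2 * margin))

print("level  z      lower  upper  range")
for level, z, lower, upper, width in rows:
    contains_half = lower <= 0.5 <= upper
    print(f"{level:.2f}   {z:.3f}  {lower:.3f}  {upper:.3f}  {width:.3f}  includes 0.5: {contains_half}")
```

The higher the confidence level, the larger z and the wider the resulting interval, which is the trade-off described earlier.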

Are we there yet?

Yes! Let's put your hypothesis to the test. As mentioned above, your hypothesis is that the ratio of baby girls being born is larger than 0.5. One way to think about testing your hypothesis is to pit it against the confidence interval you estimated. The other way is explained below in detail. You are saying the ratio of baby girls is larger than 0.50, correct? The implication is that 0.50 is not one of the values within your confidence interval, in the long run. If 0.50 is contained in the confidence interval, it would mean the true probability of baby girls being born could very well be 0.5 at a given level of confidence (90%, 95%, 99%, etc.). Ergo, the data collected would not support your hypothesis that there are more baby girls being born than baby boys. Let's check back at the table above, where we estimated four confidence intervals, one for each confidence level. At every considered confidence level, the confidence intervals include 0.50. That is to say, while your estimate (the sample statistic) is 0.5454, we cannot discard the possibility that the true probability of baby girls being born might be 0.50. Note that we still don't know what the 'truth' is. And if we want to get closer to it, we should continue to repeat the experiment. 

However, there is another frequentist interpretation wherein the p-value - and not so much an estimate with a confidence interval - is the star of the show.

The role of philosophical approaches to statistics

So far we have interpreted Frequentist inference as a method with which one can achieve a point estimate (e.g., probability, mean differences, counts, etc.) with an accompanying measure of uncertainty (confidence interval based on the confidence level). And this is a proper way to think of statistical inference within the Frequentist paradigm. However, this approach lacks in terms of practicality to those researchers seeking *an objective answer*, a decision for a given problem. Think of a pharmaceutical company testing whether a drug helps decrease the mortality of a new disease, or, a materials company testing the grade of concrete and steel for constructions in Seismic zones. In these circumstances, null hypothesis significance testing (NHST), is a particularly useful method for statistical inference.

In the philosophical sphere, one important difference between the two frequentist camps (probabilistic vs. hypothesis-testing frequentists) ensues from the different answers given to the question: where should Statistics, or statistical inference, lead? Should Statistics lead researchers to a conclusion/decision (even if based on probabilities), or should Statistics lead to an estimation/probability statement with an associated confidence interval? Above, we explored the basis of the latter approach. Now we will focus on the former.

The null hypothesis significance testing (NHST)

NHST is the amalgamation of the work of Fisher and Neyman-Pearson. And while these viewpoints differ - by quite a bit - they are unified in providing a framework for hypothesis testing.[4] The underlying reason for 'testing' is that scientists want to generalize the results ensuing from a study with a sample to the population in a way that yields a yes/no decision-making process. As seen above, however, this approach may lead to the observation of biased estimates due to chance alone but also to a variety of other factors. NHST is a method designed to quantify the validity of a given generalization from sample to population. In other words, to infer statistically will always involve probabilities - not certainty - and thus, error is inevitable.

Due to this, a framework was developed to assess the extent to which a given decision abides by probabilistic principles and minimizes error. In the standard approach, the relation between two variables is examined (say X and Y). A hypothesis about the relation between the variables is proposed (say X > Y) and it is compared to an alternative proposing no relationship between the two variables (implying X = Y). This no-relation hypothesis is called the Null Hypothesis, or Ho, and it is on the basis of it being true or false that scientists consider the likelihood of their own hypothesis being true. The reasoning behind relying on the null hypothesis is best described by one of the most important 20th century philosophers of Science, Karl Popper, who argued that "all swans are white" cannot be proved true by any number of observations of white swans, as we may have failed to spot a black swan somewhere. However, it can be shown to be false by a single sighting of a black swan. Scientific theories of this universal form, therefore, can never be conclusively verified, though it may be possible to falsify them. Point being, it is easier to disprove a hypothesis than to prove it, because it is impossible to test every possible outcome. So, instead, Science advances only through disproof. Given that making decisions based on probabilities will always give rise to the possibility of an error, when comparing two hypotheses, four outcomes should be considered based on whether the null hypothesis is true or false, and on whether the decision to reject it (or fail to) was correct or wrong.

Similar thinking explains the legal principle of the presumption of innocence, in which you are innocent until proven guilty. Let's think of the possible scenarios in a criminal trial. If you are innocent, and the court/jury acquits you, then the decision and the truth match. This is the correct inference and it is called a True Negative. "True" refers to the match between the decision and the truth, while "negative" has to do with a failure to reject the null. Similarly, if you are indeed guilty and the court/jury decides on a conviction, the inference is again correct and it is termed a True Positive, where "positive" stands for rejecting the null. Statistics, and NHST, mostly concerns itself with the remaining options: when you are convicted but didn't commit the crime (False Positive), and when you are acquitted but did commit the crime (False Negative). These are termed Type I and Type II errors, respectively, and they are key concepts introduced by Neyman and Pearson to the Fisherian perspective.
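The four outcomes can be written down as a small lookup, which some readers may find easier to keep straight than prose. This is only an illustrative sketch: the function name `classify` and the courtroom mapping (null hypothesis = "the defendant is innocent", rejecting the null = "conviction") are assumptions made for the example.

```python
def classify(null_is_true: bool, null_rejected: bool) -> str:
    """Name the outcome of a test given the truth and the decision."""
    if null_rejected:
        # Rejecting a true null is a false alarm; rejecting a false null is a hit
        return "False Positive (Type I error)" if null_is_true else "True Positive"
    # Failing to reject a true null is correct; failing to reject a false null is a miss
    return "True Negative" if null_is_true else "False Negative (Type II error)"


# Courtroom reading: null = innocent, rejection = conviction
for innocent in (True, False):
    for convicted in (True, False):
        truth = "innocent" if innocent else "guilty"
        decision = "convicted" if convicted else "acquitted"
        print(f"{truth:8s} + {decision:9s} -> {classify(innocent, convicted)}")
```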

Null Hypothesis Significance Testing outcomes [or Table of error types]

The Type I error rate is also known as the significance level, or α level. It is the probability of rejecting the null hypothesis given that it is true. The α value is analogous to the confidence level explained above via the relation CL = 1 - α, meaning that if the researcher wants to adopt a confidence level of 95%, then she or he is automatically adopting an α level of 5%. As for the testing part, the way hypothesis testing is done is relatively simple: you have to assess the probability of the sample data as a realization of the null hypothesis. In other words, one compares probabilistically two sets of data: one ensuing from the null hypothesis and the other from the alternative - both assuming the phenomenon follows a given stochastic process (i.e., data points are observed according to a known distribution). This is done by calculating the probability of observing data like the sample if the null hypothesis were true. Then, given a significance level, a comparison is deemed 'statistically significant' if the sample data would be an *unlikely* realization of the null hypothesis. How unlikely depends on the chosen significance level. Makes sense?
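To make the role of α concrete, the following sketch finds, for 88 births and a null probability of 0.50, the smallest number of girls that would count as "unlikely" under a directional test at α = 0.05, and the Type I error rate actually attained at that cutoff. The sample size is taken from the running example; the exact binomial calculation and the helper name `upper_tail` are assumptions for illustration.

```python
from math import comb


def upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n, alpha = 88, 0.05
# Smallest count whose upper-tail probability under the null does not exceed alpha
critical = next(k for k in range(n + 1) if upper_tail(k, n) <= alpha)
actual_alpha = upper_tail(critical, n)

print(f"Reject the null when at least {critical} of {n} births are girls")
print(f"Attained Type I error rate: {actual_alpha:.4f} (nominal alpha = {alpha})")
```

Because counts are discrete, the attained error rate sits at or below the nominal α rather than exactly on it.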

Let's take your research example one last time. Since we have a binary variable, it is easy to demonstrate the calculation of the p-value. Your sample data consists of 88 observations where 48 baby girls and 40 baby boys were born. The estimated probability is P(Girls) = 0.5455. You are interested in showing that there are more baby girls being born than baby boys. When thinking about testable hypotheses, the null hypothesis would be a null difference between the proportions of baby girls and boys. This is mathematically represented by P(Girls) = 0.50. Then, the alternative would be P(Girls) > 0.50. On the philosophical side, we learned that you cannot show that P(Girls) > 0.50, but you could show that P(Girls) = 0.50 is so *unlikely*, given your data, that you would consider P(Girls) > 0.50 as being the more likely scenario, until there is further evidence. This is where the key to understanding inferences lies. And this is what p-values do: they inform you about how likely or unlikely your sample data would be as a realization of the null hypothesis. The p-value is then compared with the chosen significance level. So, a null hypothesis is rejected if the sample data would be an unlikely realization of the null hypothesis. This is why the definition of the p-value contains the phrase "assuming the null hypothesis is true." It is because the p-value is the probability of the sample data being an outcome of the data generated by the null.

Bear with me through the calculations. What is the probability of observing 48 baby girls in 88 trials, assuming the true probability is 0.5? To calculate this, we need to know two things. First, in how many ways can one observe 88 births yielding a total of 48 baby girls? The answer stems from the combinations of 88 Bernoulli trials whose sum totals 48. There are 1.831258e25 ways. Second, we need to know the probability of observing one specific sequence of exactly 48 baby girls AND 40 baby boys, in 88 births. P(48 Girls) x P(40 Boys) is very low, at 3.231174e-27. We multiply the number of ways by this probability to find the probability of 48 baby girls, in 88 births, assuming the probability is 0.5, which is almost 0.06.
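The two ingredients of this calculation can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from math import comb

n, k, p0 = 88, 48, 0.5  # births, girls observed, null probability

# Number of distinct birth sequences with exactly 48 girls among 88 births
ways = comb(n, k)

# Probability of any single such sequence under the null hypothesis
prob_one_sequence = p0**k * (1 - p0) ** (n - k)

# Probability of observing exactly 48 girls, in any order
prob_48 = ways * prob_one_sequence

print(f"ways = {ways:.6e}")
print(f"probability of one sequence = {prob_one_sequence:.6e}")
print(f"P(exactly 48 girls) = {prob_48:.4f}")
```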

That is to say, if you collect data again, assuming P(Girls) = P(Boys) = 0.50, which is the null hypothesis, there is a 6% chance that you will observe exactly 48 girls in 88 births, a ratio of 0.5455. By now, I hope we understand the meaning of the "assuming the null hypothesis is true" part of the p-value definition. It is because we calculate the likelihood of a given configuration of results while using the parameters set forth by the null hypothesis. That said, we can now break down the meaning of the "at least as extreme" part of the definition.

P-values give you the cumulative probability, rather than just a probability. That is to say, p-values assess the probability of observing a given value (in your case, 48 out of 88) and all other possible values that are more extreme. In your case, the p-value gives you the probability to observe 48 or more baby girls being born so we need to calculate the probability of observing 49, 50, 51 ... 86, 87, 88 baby girls, in 88 trials, assuming that P(Girls) = P(Boys) = 0.50. Just as an illustration we also perform the calculations for 49 births. 

As you can see from the table below, the probabilities will decrease, as it becomes less and less likely that you would observe an increasingly larger proportion of baby girls being born, if 0.50 is the true probability. So the cumulative probability of observing 48 births, or more, assuming P(Girls) = P(Boys) = 0.50, is 0.223. This is the big moment you were waiting for. 0.223 is the p-value when testing whether you can reject the null hypothesis (P(Girls) = 0.50) in favor of the alternative (P(Girls) > 0.50). It indicates that it is quite likely (almost 1/4 of the time) to observe a set of data showing 48 births or more assuming P(Girls) is indeed 0.50. As for the 'testing part', it requires nothing more than comparing the obtained p-value with the criterion for significance. Since most scientists use an α level of 0.05, and the p-value of 0.223 is larger than the α level, we do not have enough evidence to reject Ho. So, pending future studies, we can only consider that the probabilities of baby girls and boys being born are - roughly - the same.
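The cumulative sum described here can be sketched as follows, adding up the exact binomial probabilities for 48, 49, ..., 88 girls under the null and comparing the total with α. An exact sum of this kind should land close to the 0.223 quoted above; a small difference in the third decimal can arise from rounding or from the approximation used in the original calculation. The helper names are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k): the observed count plus everything more extreme."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


n, observed, p0, alpha = 88, 48, 0.5, 0.05
p_value = upper_tail(observed, n, p0)

print(f"one-sided p-value = {p_value:.3f}")
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```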

Other researchers, however, could criticize your alternative hypothesis as tendentious, in the sense that, with one day's worth of data, you should perhaps consider a broader alternative hypothesis: one which doesn't specify a direction, and which doesn't consider only half of the possible outcomes. Indeed, when setting the alternative as P(Girls) > 0.50, you are not considering P(Girls) < 0.50. And while your estimate indicates a larger proportion of baby girls, you shouldn't assume that your sample's suggestion indicates the truth. So, perhaps more appropriately, you would like to also consider P(Girls) < 0.50. In that case, we can say that your null hypothesis is P(Girls) = 0.50, and your alternative is P(Girls) ≠ 0.50, which covers both directions. What do we need to do to calculate the p-value for the non-directional hypothesis? We can either repeat the process for 40, 39, 38, ..., 1, 0 baby girls being born and add the result to the p-value above, or multiply the calculated p-value by 2. Both yield the same result, because the distribution is symmetric when P(Girls) = 0.50. Note that you should start with 40, not 48 or 44. This is because you observed 4 more than the expected count, which is 44 if P(Girls) = 0.50. Thus, you calculate the cumulative probability of observing the same difference of 4 from the expectation, but in the other direction: 44 - 4 = 40. The p-value is 0.223*2 = 0.446. The increase in the p-value shows that, by considering both directions, it is even more common (twice as common) to observe a difference of at least 4 births from the expected value of 44, assuming the probability P(Girls) = 0.50, in 88 trials.
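Both routes to the non-directional p-value can be checked side by side. The sketch below sums the lower tail (0 to 40 girls) and the upper tail (48 to 88 girls) exactly and compares the total with twice the upper tail; it relies on the symmetry of the binomial distribution at P(Girls) = 0.50, and the helper name `binom_pmf` is illustrative. As with any exact calculation, the third decimal may differ slightly from the rounded figures in the text.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p0 = 88, 0.5
expected = n * p0            # 44 girls expected under the null
observed = 48
distance = observed - expected  # 4 births away from the expectation

# Upper tail: 48 or more girls; lower tail: 40 or fewer girls
upper = sum(binom_pmf(i, n, p0) for i in range(observed, n + 1))
lower = sum(binom_pmf(i, n, p0) for i in range(0, int(expected - distance) + 1))

two_sided = lower + upper
print(f"lower tail = {lower:.3f}, upper tail = {upper:.3f}")
print(f"two-sided p-value = {two_sided:.3f} (2 x upper tail = {2 * upper:.3f})")
```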

The role of Power

Now you say "OK, great. But I was confident there were more baby girls than baby boys being born. Before I am completely convinced I was wrong, is there perhaps something I may have missed, precluding me from arriving at the correct conclusion?" And despite the fact that the answer to this sort of question in Science is always "yes" (i.e., in the best case scenario, there are always improvements to be made), there is one key aspect of hypothesis testing we have not yet addressed: Type II error. This occurs when there is in fact a true difference (in your case, in the ratio of births) but hypothesis testing fails to reject the null hypothesis. This is termed a false negative. It is when one is guilty of a crime, but the court/jury acquits the defendant. In many ways, Type II error is the other side of Type I error, where you falsely conclude there is a difference when there is none (i.e., a false positive). Both are said to be "False" because each is the wrong decision. Thus, ideally, we want to avoid both, and always find effects when they exist, and fail to find them when they don't. Curiously, the Type II error rate is deemed the lesser evil (in comparison to the Type I error rate) by the scientific community, perhaps because the harm done when incurring this type of error is the maintenance of the status quo. Personally, I have my doubts about this, and the reproducibility crisis in science can better showcase this point.

where n is sample size, p0 is the hypothesized population value, Φ is the standard Normal distribution function, and α is significance level

But more to the point, Type II error, or β, depends on three components: the magnitude of the effect, the sample size (or sampling error), and the significance level used. The rationale is the following. Assuming a constant significance level, if the magnitude of the effect is small, a larger sample size is necessary to detect "a signal" (or reject the null). If the magnitude of the effect is large, then sample sizes can be smaller and still not incur a Type II error. But if you are somewhat acquainted with the scientific method or hypothesis testing, you will know that β is hardly ever mentioned. Instead, researchers tend to speak of 1 - β, which is known as power. As in, power to detect a signal. Power is the probability that the test correctly rejects the null hypothesis when the alternative hypothesis is true. Importantly, power analysis can be used not only to calculate β (the probability of a false negative) and power (1 - β), but also the minimum sample size n required so that one can be reasonably likely to detect an effect. So, you can use this knowledge to better understand your study's results. By using the formulas above, you would find that the power (1 - β) of your study is 0.214 if you were only interested in showing P(Girls) > P(Boys) - or 0.137 for P(Girls) ≠ P(Boys). This means, at best, you had a 0.214 probability of rejecting the null when it is false. By contrast, the false negative rate (Type II error) is about 79% for the directional test and 86% for the non-directional one. From these calculations, it becomes clear the sample size (N=88) is too small to identify such a small difference in proportions - from P(Girls) = 0.50 to P(Girls) = 0.5455, whose nominal effect size is 0.091. Indeed, the last formula above can be used to calculate the minimal sample size necessary to detect a significant difference with the observed effect magnitude. Results show that a whopping 948 births are necessary to be 80% sure that you are able to detect a meaningful difference, when there is one.
80% is usually the power sought after in academia, but different fields use different power levels (think of medicine, where doctors may prefer a false positive that requires additional confirmatory tests over a false negative that sends a sick patient home). If you would like to be 90% sure, your study would need 1261 observed births, 1560 for a 95% true positive rate, and 2205 for 99%. In the long run, that is. Also bear in mind that these numbers relate to testing a non-directional hypothesis, that is, you are considering both P(Girls) > P(Boys) and P(Girls) < P(Boys). The required sample size would be slightly smaller if you only consider P(Girls) > P(Boys).
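For readers who want to retrace the power figures, here is a hedged sketch based on the normal approximation with an arcsine-transformed effect size (Cohen's h). This is an assumption on my part: it is one common way to do power analysis for a single proportion and it yields an effect size near 0.091, but it is not necessarily the exact formula behind every number reported above, so results for the larger sample sizes may differ somewhat. The function names are illustrative.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution


def cohens_h(p1, p0):
    """Arcsine-transformed difference between two proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p0))


def power(h, n, alpha=0.05, two_sided=True):
    """Approximate power of a one-sample proportion test (h assumed > 0)."""
    shift = h * sqrt(n)
    if two_sided:
        z = Z.inv_cdf(1 - alpha / 2)
        # Probability of landing in either rejection region
        return (1 - Z.cdf(z - shift)) + Z.cdf(-z - shift)
    z = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(z - shift)


def required_n(h, target_power=0.80, alpha=0.05):
    """Approximate sample size for a two-sided test, ignoring the far tail."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(target_power)
    return ceil(((z_alpha + z_power) / h) ** 2)


h = cohens_h(48 / 88, 0.5)
print(f"effect size h = {h:.3f}")
print(f"power, directional test     = {power(h, 88, two_sided=False):.3f}")
print(f"power, non-directional test = {power(h, 88):.3f}")
print(f"births needed for 80% power = {required_n(h)}")
```

The Type II error rate for either test is simply one minus the corresponding power.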

To think about it in another way, imagine 99 other independent researchers were to replicate your study with the same protocol while not pooling their data together. And let's assume you were right, and that the true probability of girls being born, P(Girls), is 0.55. Then, by cataloging all births for each day, we can plot a summary of these 100 studies as displayed on the right. Each point represents the ratio/proportion found for one specific day while the line represents its confidence interval. Colored in red are the days in which the researcher would find that P(Girls) > P(Boys), and in black those in which we do not have enough evidence to reject P(Girls) = P(Boys). There are 17 instances in which the confidence interval does not include 0.50, and 83 instances in which it does. Note that these numbers are slightly different from those reported above for power and Type II error rates. This is because these always refer to "in the long run". That is, if these 100 replications were to be repeated in time, then on average, researchers would identify a significant difference (a p-value lower than 0.05, or a confidence interval not including 0.5) about 14% of the time. And yes, p-values and confidence intervals are equivalent at the same significance level. I personally prefer the latter because it has one important advantage: in comparison to p-values, confidence intervals reflect the results at the level of data measurement. Another interesting implication of the above plot is that if you were to rely on one day's worth of data, most days you would reach the 'wrong' conclusion. For this reason, power analysis is an integral part of study design. Ideally, these analyses should be done prior to data collection. [6]
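The replication thought experiment is easy to simulate. The sketch below draws many one-day studies of 88 births with a true P(Girls) of 0.55 and counts how often a 95% confidence interval excludes 0.50. It assumes a normal-approximation (Wald) interval and a fixed random seed chosen only for reproducibility, so the counts will not match the plot described above exactly; the function name is illustrative.

```python
import random
from statistics import NormalDist


def count_significant(n_studies, n_births=88, true_p=0.55, level=0.95, seed=1):
    """Count simulated studies whose Wald interval excludes 0.50."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    excluded = 0
    for _ in range(n_studies):
        # One day's worth of births: each is a girl with probability true_p
        girls = sum(rng.random() < true_p for _ in range(n_births))
        p_hat = girls / n_births
        half_width = z * (p_hat * (1 - p_hat) / n_births) ** 0.5
        if p_hat - half_width > 0.5 or p_hat + half_width < 0.5:
            excluded += 1
    return excluded


# A single batch of 100 replications, as in the thought experiment
print("significant in 100 studies:", count_significant(100))

# Many more replications approximate the long-run rate
many = 5000
long_run_rate = count_significant(many) / many
print(f"long-run share of significant studies: {long_run_rate:.3f}")
```

Changing the seed shows how much a single batch of 100 studies can wander around the long-run rate.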

To sum up

P-values only make sense under the umbrella of frequentist statistics. This sub-field of statistics has two main camps: one which holds that statistics should be about probabilistic estimates accompanied by confidence intervals, and another which recognizes the importance and/or necessity of providing objective decisions based on data. NHST is used to determine whether there is enough evidence in a sample of data to infer that a certain condition is true for the entire population. In this process, p-values play an important role. However, it is also fundamental to conduct an a priori power analysis to ensure the study is properly designed for the investigation at hand. Science progresses through disproof. Einstein said ‘A thousand scientists can’t prove me right, but one can prove me wrong’. We can’t prove a hypothesis true, but we can prove its falsehood.[7] [8]




[1] Particularly in Social sciences.

[2] Bayesians would likely disagree this rationale is useful and argue that testing against Ho = 0 yields very little information. Instead, Bayesians use Bayes' theorem to update the probability for a hypothesis as more evidence or information becomes available.

[3] The approximate formula for the confidence interval of proportions should only be used when the population size is 20 times larger than the sample size.

[4] Confidence level:  γ = (1 − α). This is the probability of not rejecting the null hypothesis given that it is true. Confidence levels and confidence intervals were introduced by Neyman in 1937.

[5] In practice, this is closer to Fisher's ideas than Neyman-Pearson's.

[6] In the future, I will include a meta-analysis of these studies to showcase the importance of cumulative and open-science practices.

[7] In the future, I will include a thought experiment showing how p-values are counter-intuitive and completely backward from what a scientist generally wants.

[8] Soon I will include the R code and a link to an R Markdown document with all of these calculations on GitHub.

Learning Statistics on Youtube by Flavio Azevedo

Youtube is the second most accessed website in the world (surpassed only by its parent, Google). It has a whopping 1 billion unique views a month. [1, 2] It is a force to be reckoned with. On the video sharing platform, there are many brilliant and hard-working content creators producing high-quality and free educational videos that students and academics alike can enjoy. I made a survey of Youtube content that could be useful for those interested in learning Statistics, and I listed and categorized the channels below.

Truth be told, this post is a glorified Google search in many respects. In any case, I had intended for a long time to gather this information to facilitate the often laborious task of finding pertinent resources for learning statistical science in a non-static format (i.e., videos) that is easily accessible, high-quality, instructive and free.

Another motivation had to do with my teaching obligations. This fall, I will teach a graduate course in Stats with R. To this end, I considered becoming a content creator myself, to allow students to access the course's content from the convenience of their homes. In this process, I found some excellent statistical courses on Youtube. Some were really useful in terms of their organization, others in terms of content, interesting explanations, pedagogical skills, availability of materials, etc. Altogether, searching for resources was a very instructive experience, whose fruits should be shared.

Importantly, in this process, I learned that Youtube is not short of 'introductory courses on ___', at least not on Statistics, Probability or R, which is a good thing. Often, you even see these three together. Also in abundance are courses on the ABCs of probability theory, classical statistics (i.e., up to ANOVA, ANCOVA), and the basics of applied statistics (e.g., Econometrics, Biostatistics, and Machine Learning). Indeed, Machine Learning (mostly through Data Science) is really well represented on Youtube.

Due to the sheer amount of channels, I organized them into three broad categories: use of R as statistical software, use of other statistical software, and lecture format only. I also listed each channel's content/topic, whether authors provided slides, code, additional materials online (with links), and relevant remarks.

1. Learning Statistics with R

| Youtube channel | Content | Software | Online Materials? | Remarks |
| --- | --- | --- | --- | --- |
| Mike Marin | [Intro] Basic Stats in R | R | Yes, good materials | University of British Columbia |
| Michael Butler | [Intro] to R and Stats, Modern | R | Yes | Good intro to R + Exercises |
| EZLearn | [Intro] Basic Stats in R | R | Exercises w/ solutions | - |
| Renegade Thinking: Courtney Brown | [Intro] Undergraduate Stats | R | Yes | Good Lectures |
| Barton Poulson | [Intro] Classical Stats, Programming & Solved Exercises | R, Python, SPSS | Yes | Gives intro to Python, R, SPSS and launching an OLP |
| Ed Boone | [Intro] Basic R and SAS | R & SAS | Yes | - |
|  | [Intro] Basics of R and Descriptives | R | Yes | OLP |
| Bryan Craven | [Intro] Basic Stats in R | R | No | - |
| Laura Suttle (Revolution) | [Intro] R tour for Beginners | R | No | - |
| Phil Chan | [Intro] Classical and Bio-stats | R, SPSS, Eviews | No | - |
| Gordon Anthony Davis | [Intro] R Programming Tutorial | R | No | Thorough intro for beginners |
| Nathaniel Phillips | [Intro] R Programming Tutorial | R | Yes | Videos as a pedagogical tool for his R book |
| David Langer | [Intro] Basics of R | R | No | Excellent pedagogical skills |
| MrClean1796 | [Intro] Math, Physics and Statistics, lecture & R | R | No | - |
| Brian Caffo | Advanced & Bio-Stats, Reproducible Research | R | Yes, Coursera and GitHub | Professor of Bio-statistics, Johns Hopkins Univ. |
| Abbass Al Sharif | In-depth Machine Learning | R | Yes | Excellent lectures and resources |
| James Scott | Advanced Stats | R | Yes, and GitHub | Several Course Materials on GitHub |
| Derek Kane | Machine Learning | R | Yes | Excellent Videos, Fourier Analysis, Time series Forecasting |
| DataCamp | Programming, DataViz, R Markdown [free] | R | Yes, paid. 9$ for students | - |
| Maria Nattestad | DataViz | R | Personal Website | Plotting in R for Biologists |
| Christoph Scherber | Mixed, GLM, GLS, Contrasts | R | Yes | - |
| Librarian Womack | Time Series, DataViz, BigData | R | Yes, Course and .R | Materials online |
| Jarad Niemi | R Workflow, Bayesian, Statistical Inference | R | Yes | - |
| Justin Esarey | Bayesian, Categorical and Longitudinal Data, Machine Learning | R | Yes, lots and lots | Political Scientist |
| Jeromy Anglim | Research Methods | R | Blog: Psych & Stats, GitHub | + Rmeetups and Notes on Gelman, Carlin, Stern, and Rubin |
| Erin Buchanan | Under- & post-graduate Stats, SEM | R, G*Power, Excel | Yes | Excellent pedagogical strategies |
| Richard McElreath | From Basic to Advanced Bayesian Stats | R and Stan | Yes, lots | Book lectures |
| edureka | Data Science | R, Hadoop, Python | Yes, online learning platform | R Intro w/ Hadoop [free] |
| Learn R | R programming, stats on website | R, Python | Yes, and One R Tip A Day | On website, lots of starter's code |
| Data School | Machine Learning, Data Manipulation (dplyr) | Python, R | Yes, dplyr | Machine Learning with Hastie & Tibshirani |
| Econometrics Academy | Statistics (via Econometrics) | R, STATA, SPSS | Yes | OLP, Excellent Materials and Resources |
| Jalayer Academy | Basic Stats + Machine Learning | R, Excel | No | Also Lectures |
| Michael Levy | Authoring from R, Markdown, Shiny | R | No | - |
| Melvin L. | Machine Learning, R Programming, PCA, DataViz | R, Python, Gephi | No | Interesting Intro for Spark |
| OpenIntroOrg | Intro to Stats/R plus Inference, Linear Models, Bayesian | R | Yes, Coursera and OpenIntro | Coursera Courses, Resources in SAS |
| Mike Lawrence | Tidyverse, Wrangling & DataViz, plus Bayesian Inference | R, Stan, rstan | Yes, w/ Lit too | And relevant repos on GitHub |

2. Learning Statistics with other software

| Youtube channel | Content | Software | Online Materials? | Remarks |
| --- | --- | --- | --- | --- |
| Jonathan Tuke | Basic Stats | Matlab | No | - |
| Saiful Yusoff | PLS, Intro to MaxQDA | SmartPLS, MaxQDA | Yes | BYU |
| James Gaskin | SEM, PLS, Cluster | SPSS, AMOS, SmartPLS | Yes | BYU |
| Quantitative Specialists | Basic Stats | SPSS | No | Upbeat videos |
| RStatsInstitute | Basic Stats | SPSS | No | Instructor at Udemy |
| how2stats | Basic Stats, lecture and software demonstrations | SPSS | Yes | Complete Classical Stats |
| BrunelASK | Basic Stats | SPSS |  | - |
| The Doctoral Journey | Basic Stats | SPSS | Yes | - |
| StatisticsLectures | Basic Stats, lecture format | SPSS | Yes | discontinued, but thorough basic stats |
| Andy Field | Classical Stats, lecture and software demonstrations | SPSS | Yes, registration needed | Used heavily in Social Sciences |
| Quinnipiac University: Biostatistics | Classical Stats | SPSS | No | - |
| The RMUoHP Biostatistics | Basic and Bio-Stats | SPSS, Excel | No | - |
| PUB708 Team | Classical Statistics | SPSS, MiniTab | No | - |
| Professor Ami Gates | Classical Stats | SPSS, Excel, StatCrunch | Yes | - |
| H. Michael Crowson | Intro and Basic Stats in several Software | SPSS, STATA, AMOS, LISREL | Yes? | - |
| Math Guy Zero | Classical Stats + SEM | SPSS, Excel, PLS | No | Lots of materials |
| BayesianNetworks | Bayesian Statistics, SEM, Causality | BayesianLab | Yes | - |
| Khan Academy | Programming 101 | Python | Yes | - |
| Mike's SAS | Short intro to SAS, SPSS | SAS, SPSS | No | - |
| Christian A. Wandeler | Basic Stats | PSPP | No | - |

3. Lectures on statistics

| Youtube channel | Content | Software | Online Materials? | Remarks |
| --- | --- | --- | --- | --- |
| Stomp On Step 1 | [Intro] Bio-Stats, Basic | Lectures | Yes | USMLE |
| Khan Academy | [Intro] Basic Stats, lecture format | Lectures | Yes | - |
| Joseph Nystrom | [Intro] Basic Stats | Lectures | Yes | Active & unorthodox teaching |
| Statistics Learning Centre | [Intro] Basic Stats | Lectures | Yes | Register to access materials |
| Brandon Foltz | [Intro] Basic Stats | Lectures | soon | Excellent visuals |
| David Waldo | [Intro] Probability Theory | Lectures | No | - |
| Andrew Jahn | [Intro] Basic Stats | Lectures | No | FSL, AFNI and SPM [Neuro-imaging] |
| Professor Leonard | [Intro] Stats and Maths | Lectures | No | Excellent pedagogical skills |
| ProfessorSerna | [Intro] Basic Stats | Lectures | No | - |
| Harvard University | [Intro] Thorough Introduction to Probability Theory and Statistics | Lectures | No | In-depth |
| Victor Lavrenko | Machine Learning, Probabilistic, Cluster, PCA, Mixture Models | Lectures | Yes, very complete | Excellent Content, and lots of it |
| Jeremy Balka's Statistics | Graduate-level Classical Stats, Lecture | Lectures | Yes, very thorough | Excellent altogether, p-value vid great! |
| Methods Manchester Uni | Discussion on a wide variety of methods, SEM | Lectures | Yes | Methods Fair |
| Steve Grambow | Series on Inference | Lectures | Yes | Great Lectures on Inference [DUKE] |
| Statistics Corner: Terry Shaneyfelt | Statistical Inference | Lectures | Yes | from a clinical perspective |
| Michel van Biezen | Complete Course of Stats | Lectures | Yes, 1, 2, 3 | Thorough and complete, plus Physics and Maths |
| Oxford Education | Bayesian statistics: a comprehensive course | Lectures | Yes | - |
| Nando de Freitas | Machine Learning | Lectures | Yes, also here and here | - |
| Alex Smola | Machine Learning | Lectures | Yes, slides and code | - |
| Abu (Abulhair) Saparov | Machine Learning | Lectures | Yes | Taught by Tom Mitchell and Maria-Florina Balcan |
| Geoff Gordon | Machine Learning, Optimization | Lectures | Yes | - |
| MIT OpenCourseWare | Probability Theory, Stochastic Processes | Lectures | Yes, here, and here | - |
| Alexander Ihler | Machine Learning | Lectures | Yes, along w/ many other classes | - |
| Royal Statistical Society | Important Statistical issues | Lectures | Yes | Interesting topics |
| Ben Lambert | Graduate and Advanced Stats | Lectures | No | Asymptotic Behaviour of Estimators, SEM, EFA |
| DeepLearning TV | Machine (and Deep) Learning | Lectures | No | Excellent pedagogical skills |
| Mathematical Monk | Machine Learning, and Probability Theory | Lectures | No | - |

Final Remarks

The collection of channels listed here is not meant to be exhaustive. If I have neglected a Youtube channel that you think should figure in this list, please let me know via the contact form below and I shall include it. Thank you very much!

postscriptum  [21/09/2016]

I am delighted with the reaction this post received on social media, which is mainly due to its being published on R-bloggers (thank you, Tal Galili). As of today, it has received 450 shares and 500 likes [from both here and here].