Media Bias

Online penetration and the volume of online content are growing at a rapid rate: 46% penetration and 10% year-over-year growth as of 2016. However, as recent news has shown (e.g., the use of Facebook to interfere in US elections), users are being subjected to biased or non-neutral information. Only 50% of U.S. adults feel confident there are enough sources to allow people to cut through bias in the news, down from 66% a generation ago. It is therefore critical to give users and consumers of content access to neutral, unbiased information so that they can make decisions based on facts.

Every news story or article is colored by the bias of its author's experiences, judgements and predispositions. As humans, we communicate emotional statements as well as states of the world, and the way we choose to say things often influences how readers receive them. Since the 2016 presidential election, the number of searches for "Fake News" has increased roughly 25-fold. The US media ranks among the least trusted in the world, and half of Americans believe that online news websites regularly report fake news. For this reason, working closely with the AI for Good Foundation, we set out to identify and curb the spread of misinformation online by quantifying bias. We proposed a model that predicts an article's "bias score" from selected features. The score encourages readers to be more critical and attempts to mitigate the negative effects of bias.

ABOUT

Media Bias is part of the AI4Good project series, a collaboration between the AI for Good Foundation and the Applied Data Science with Venture Applications course at SCET, UC Berkeley.

PROBLEM

Media bias and lack of access to neutral information.

How might we leverage ML to reduce bias and more effectively present search results on specific topics?

MISSION

Make the world a more trustful place through AI.

APPROACH

Manual scoring

Manual evaluation of over 240 news articles from different sources, scored from 0 to 5 (5 = unbiased/factual, 0 = completely biased).

Visualisation strategy

Determine features of interest to forecast the score of any article, and conduct a case study on a corpus of articles.

Model

DATA MODEL ARCHITECTURE

Features

  • 3 manual scores

  • Alexa’s country ranking of the source

  • Subjectivity score of the title (NLTK; see the sketch after this list)

  • # of articles published by the source

  • # of other sources mentioned in the body
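
A minimal sketch of how two of these features might be computed. The write-up names NLTK; the sketch below uses TextBlob's pattern-based analyzer instead, because it exposes a subjectivity score directly, and the outlet list and function names are purely illustrative rather than the team's actual code.

    # Illustrative feature extraction for the title subjectivity and
    # "other sources mentioned" features; names and source list are assumptions.
    from textblob import TextBlob

    # Hypothetical list of outlets to scan for in the article body.
    KNOWN_SOURCES = ["Reuters", "Associated Press", "BBC", "CNN", "Fox News"]

    def title_subjectivity(title: str) -> float:
        """Subjectivity of the headline, 0.0 (objective) to 1.0 (subjective)."""
        return TextBlob(title).sentiment.subjectivity

    def count_source_mentions(body: str) -> int:
        """Number of other news sources mentioned in the article body."""
        lowered = body.lower()
        return sum(1 for source in KNOWN_SOURCES if source.lower() in lowered)

    print(title_subjectivity("Shocking truth the elites don't want you to know"))
    print(count_source_mentions("According to Reuters and a BBC report, ..."))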

Visualisation strategy

Data cleaning – NumPy and Pandas
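
A possible cleaning pass with NumPy and Pandas, assuming a CSV of scraped articles; the file name and column names (title, body, alexa_rank, manual_score_1..3) are assumptions, not the project's actual schema.

    # Illustrative cleaning pass; column names are assumptions.
    import numpy as np
    import pandas as pd

    df = pd.read_csv("articles.csv")

    # Drop rows missing the text we need and normalise obvious placeholders.
    df = df.dropna(subset=["title", "body"])
    df["alexa_rank"] = pd.to_numeric(df["alexa_rank"], errors="coerce")
    df["alexa_rank"] = df["alexa_rank"].replace(0, np.nan)
    df["alexa_rank"] = df["alexa_rank"].fillna(df["alexa_rank"].median())

    # Average the three manual scores into a single target.
    score_cols = ["manual_score_1", "manual_score_2", "manual_score_3"]
    df["bias_score"] = df[score_cols].mean(axis=1)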

Modelling
Training set: 80%, Random forest: 500 trees, Cross-validation: 4-fold
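
A sketch of this setup with scikit-learn, continuing from the cleaned DataFrame above; treating the 0-5 bias score as a regression target and the feature column names are assumptions.

    # 80/20 split, 500-tree random forest, 4-fold cross-validation as stated above.
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split, cross_val_score

    feature_cols = ["alexa_rank", "title_subjectivity",
                    "n_articles_by_source", "n_sources_mentioned"]
    X, y = df[feature_cols], df["bias_score"]

    # 80% training set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, random_state=42)

    model = RandomForestRegressor(n_estimators=500, random_state=42)

    # 4-fold cross-validation on the training data.
    cv_scores = cross_val_score(model, X_train, y_train, cv=4)
    print("CV R^2 scores:", cv_scores)

    model.fit(X_train, y_train)
    print("Held-out R^2:", model.score(X_test, y_test))

A random forest handles mixed numeric features without scaling, and the 4-fold cross-validation gives a rough check that the score is stable across subsets of the training data.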

Outputs

RESEARCH TEAM

DATA-X

ANA LUIZA FERRER

Researcher

ASEF ALI

Researcher

DAVID HUH

Researcher

HUGO ROUCAU

Researcher

MYRIAM AMOUR

Researcher

SHIVAM MISTRY

Researcher