Saturday, September 3, 2022

Creation Moment 9/4/2022 - PCA Debacle & Cherry-Picking

Where is the wise? where is the scribe? where is the disputer of this world? hath not God made foolish the wisdom of this world? 1 Corinthians 1:20
 
"Q: Do computers remove the bias from evolutionary findings? 
A: Software analysis is no better than the assumptions built into it. And when the input data is flawed, too, none of what it outputs is trustworthy. Prepare for a bombshell announcement that undermines hundreds of thousands of studies. If the author is right, all those studies will have to start over.
 
Study reveals flaws in popular genetic method (Lund University, 30 Aug 2022). Dr. Eran Elhaik, Associate Professor in molecular cell biology at Lund University, has just pulled a big rug out from under nearly 60 years’ worth of genetic studies. If they used a method called Principal Component Analysis (PCA), they’ve all been cast into doubt. The bad news begins in large font:
The most common analytical method within population genetics is deeply flawed, according to a new study from Lund University in Sweden. This may have led to incorrect results and misconceptions about ethnicity and genetic relationships. The method has been used in hundreds of thousands of studies, affecting results within medical genetics and even commercial ancestry tests. The study is published in Scientific Reports.
Just how many studies are tainted by the flaws Dr. Elhaik discovered?
Between 32,000 and 216,000 scientific articles in genetics alone have employed PCA for exploring and visualizing similarities and differences between individuals and populations and based their conclusions on these results.
The source paper is open-access for all to read and gasp at the implications. Elhaik, “Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated,” Nature Scientific Reports, volume 12, Article number: 14683 (2022).
Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving data covariance. The outcome can be visualized on colorful scatterplots, ideally with only a minimal loss of information. PCA applications, implemented in well-cited packages like EIGENSOFT and PLINK, are extensively used as the foremost analyses in population genetics and related fields (e.g., animal and plant or medical genetics). PCA outcomes are used to shape study design, identify, and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, dispersion, and relatedness. The replicability crisis in science has prompted us to evaluate whether PCA results are reliable, robust, and replicable. We analyzed twelve common test cases using an intuitive color-based model alongside human population data.
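For readers who want to see the mechanics, here is a minimal, generic sketch of what PCA does (illustrative Python on synthetic data; this is not code from the paper, nor from EIGENSOFT or PLINK): center a wide data matrix, extract the top two components, and report how much variance the two-dimensional scatterplot actually keeps.

```python
# Minimal PCA sketch on made-up data: 60 samples from two groups that
# differ, on average, across 200 variables.
import numpy as np

rng = np.random.default_rng(0)
group1 = rng.normal(loc=0.0, scale=1.0, size=(30, 200))
group2 = rng.normal(loc=1.0, scale=1.0, size=(30, 200))
X = np.vstack([group1, group2])

Xc = X - X.mean(axis=0)                  # center every column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T                   # (PC1, PC2) coordinates for the scatterplot
explained = s ** 2 / (s ** 2).sum()      # proportion of variance per component

print("group means along PC1:", coords[:30, 0].mean().round(2), coords[30:, 0].mean().round(2))
print("variance kept by PC1 and PC2:", explained[:2].sum().round(3))
```

The last line is the "proportion of explained variance" the paper keeps returning to: it tells the reader how much of the dataset the pretty two-dimensional picture actually represents.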
PCA has been the “gold standard” in widely-used Genome-Wide Association Studies (GWAS). When the technique was automated by software, many researchers simply trusted it. Pour in the data and out came the results, nice and tidy in colorful graphs ready for publication. Now, though, Elhaik is showing that researchers could get any results they wanted by manipulating the data. The results are not real. They are artifacts of the investigator’s assumptions!
We demonstrate that PCA results can be artifacts of the data and can be easily manipulated to generate desired outcomes. PCA adjustment also yielded unfavorable outcomes in association studies. PCA results may not be reliable, robust, or replicable as the field assumes. Our findings raise concerns about the validity of results reported in the population genetics literature and related fields that place a disproportionate reliance upon PCA outcomes and the insights derived from them. We conclude that PCA may have a biasing role in genetic investigations and that 32,000-216,000 genetic studies should be reevaluated. An alternative mixed-admixture population genetic model is discussed.
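As a toy illustration of that claim (our own hypothetical sketch, in the spirit of Elhaik's color-based model but not his code), the snippet below projects the same ten "study" individuals twice, next to two differently composed reference panels. Because the centering and the axes are recomputed from whichever cohort is supplied, the same individuals land at different coordinates, and the explained-variance figures shift with the sampling design.

```python
# Hypothetical sketch: the same study samples, projected alongside a
# balanced reference panel and then alongside a panel with population C
# oversampled tenfold. The axes belong to the cohort, not to the person.
import numpy as np

rng = np.random.default_rng(1)
n_markers = 300
# Made-up allele frequencies for three reference populations A, B, C.
freqs = {p: rng.uniform(0.05, 0.95, n_markers) for p in "ABC"}

def genotypes(pop, n):
    """Draw n diploid genotypes (0, 1 or 2 allele copies) for a population."""
    return rng.binomial(2, freqs[pop], size=(n, n_markers)).astype(float)

def top_two_pcs(X):
    Xc = X - X.mean(axis=0)              # centering depends on who is in the cohort
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T, s[:2] ** 2 / (s ** 2).sum()

study = genotypes("A", 10)               # the individuals we actually want to place
ref_a, ref_b, ref_c = genotypes("A", 50), genotypes("B", 50), genotypes("C", 500)

for label, c_panel in [("balanced panel   ", ref_c[:50]), ("C oversampled 10x", ref_c)]:
    coords, explained = top_two_pcs(np.vstack([ref_a, ref_b, c_panel, study]))
    print(label, "| first study sample (PC1, PC2):", coords[-10].round(2),
          "| variance kept by PC1+PC2:", explained.sum().round(3))
```

Nothing about the study individuals changed between the two runs; only the surrounding sample mix did, yet the plot they would be read from is different each time.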
Population genetics studies figure prominently in evolutionary claims. Does this bombshell affect evolutionary studies that have been peer-reviewed, published, and relied upon by other evolutionists? Yes, and the errors permeate all kinds of studies.
  • PCA serves as the primary tool to identify the origins of ancient samples in paleogenomics, to identify biomarkers for forensic reconstruction in evolutionary biology, and geolocalize samples.
  • PCA outcomes are used to shape study design, identify, and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, dispersion, and relatedness.
  • It is used to examine the population structure of a cohort or individuals to determine ancestry, analyze the demographic history and admixture, decide on the genetic similarity of samples and exclude outliers, decide how to model the populations in downstream analyses, describe the ancient and modern genetic relationships between the samples, infer kinship, identify ancestral clines in the data, e.g., Refs.16,17,18,19, detect genomic signatures of natural selection, e.g., Ref.20 and identify convergent evolution.
Q: How could so many scientists be fooled? 
A: Perhaps because it was so easy to let the computer do the work. Doing analysis the hard way became intolerably tedious as datasets grew… So easy. Run the program. Just believe.
PCA’s widespread use could not have been achieved without several key traits that distinguish it from other tools—all tied to the replicability crisis. PCA can be applied to any numerical dataset, small or large, and it always yields results. It is parameter-free and nearly assumption-free. It does not involve measures of significance, effect size evaluations, or error estimates. It is, by and large, a “black box” harboring complex calculations that cannot be traced.
---The software is a charlatan that pretends to know the answer to any question. Place your bet and pull the crank, and voilà! Science!
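A tiny sketch of the "it always yields results" point (again hypothetical, not from the paper): run PCA on pure random noise and it still hands back a perfectly plottable two-dimensional scatter, complete with explained-variance percentages, and never a warning that nothing real is there.

```python
# PCA happily "answers" even when the input is pure, structureless noise.
import numpy as np

rng = np.random.default_rng(2)
noise = rng.normal(size=(200, 1000))     # 200 "samples", 1000 structureless "markers"

centered = noise - noise.mean(axis=0)
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ Vt[:2].T             # a perfectly plottable scatter, from nothing
print("'explained variance' of PC1 and PC2:", (s[:2] ** 2 / (s ** 2).sum()).round(4))
```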
Instead, our examples show how PCA can be used to generate conflicting and absurd scenarios, all mathematically correct but, obviously, biologically incorrect, and cherry-pick the most favorable solution. This is an example of how vital a priori knowledge is to PCA. It is thereby misleading to present one or a handful of PC plots without acknowledging the existence of many other solutions, let alone while not disclosing the proportion of explained variance.
Elhaik essentially condemned many researchers as quacks. 
---Whether consciously or not, they have misled the public by outsourcing their bias to a pseudo-objective tool. What they actually demonstrated is the truth of Richard Feynman’s remark, “Science is the belief in the ignorance of experts.”
Trusting the PCA method became even more tempting with each new upgrade.
There are no proper usage guidelines for PCA, and “innovations” toward less restrictive usage are adopted quickly. Recently, even the practice of displaying the proportion of variation explained by each PC faded as those proportions dwarfed. Since PCA is affected by the choice of markers, samples, populations, the precise implementation, and various flags implemented in the PCA packages—each has an unpredictable effect on the results—replication cannot be expected.
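To see why replication is fragile, consider one more hypothetical sketch: the same individuals analysed twice with two disjoint marker panels, one analyst's choice versus another's. The script prints the correlation between the two resulting PC1s; how closely they agree depends entirely on the data and on the choices made, which is exactly the point.

```python
# Two analysts, same people, different marker panels: does PC1 replicate?
import numpy as np

rng = np.random.default_rng(3)
n_ind, n_markers = 150, 1000
# Two made-up populations with modest allele-frequency differences.
base = rng.uniform(0.2, 0.8, n_markers)
shifted = np.clip(base + rng.normal(0, 0.1, n_markers), 0.01, 0.99)
pops = np.repeat([0, 1], n_ind // 2)
freq = np.where(pops[:, None] == 0, base, shifted)
geno = rng.binomial(2, freq).astype(float)

def pc1(X):
    """First principal component scores of a centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]

panel_a, panel_b = geno[:, :500], geno[:, 500:]   # two disjoint marker choices
r = np.corrcoef(pc1(panel_a), pc1(panel_b))[0, 1]
print("|correlation| between PC1 from panel A and PC1 from panel B:", round(abs(r), 3))
```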
Replicability is supposed to be one of the hallmarks of science. Elhaik, concerned about the “replication crisis,” decided to check on this widely used analysis tool. What he found was rampant circular reasoning and a priori assumptions.
To illustrate the way PCA can be used to support multiple opposing arguments in the same debate, we constructed fictitious scenarios with parallels to many investigations in human ancestry that are shown in boxes. We reasoned that if PCA results are irreproducible, contradictory, or absurd, and if they can be manipulated, directed, or controlled by the experimenter, then PCA must not be used for genetic investigations, and an incalculable number of findings based on its results should be reevaluated. We found that this is indeed the case.
 Ouch! Evolutionary Pop-Gen." CEH