
Data Does Not Equal Fact

January 24th, 2018 | Posted by pftq in Essays
"Data!=Fact: Inductive vs Deductive Reasoning"

Part of my frustration with all the focus on big data, statistics, quant, and numbers in general is that these are all forms of inductive reasoning.  Amidst all the hype around being more data driven, we seem to have forgotten that inductive reasoning only provides an estimate of the world.  It does not and cannot prove anything.  Evidence alone does not provide truth.  Data does not equal fact.

There are two main forms of reasoning: inductive and deductive.  Inductive reasoning is the process of generalizing from a few observations to a general conclusion.  Deductive reasoning is the process of understanding what must be true given a premise or broader assumption and applying it to something more specific.  The former emphasizes correct observation, while the latter emphasizes actual logical flow and reasoning - the how and why as opposed to just what it is you are seeing.

Inductive Reasoning
https://en.wikipedia.org/wiki/Inductive_reasoning
Inductive reasoning is backwards-looking and by definition says nothing of the future.  You start from seeing one thing and then try to generalize to something you haven't seen before, but you can never be fully sure.  The easiest way to identify it is that it almost always depends on evidence, concrete numbers, quantitative data, statistical sampling - stuff you can measure.  It is useful for describing many things together, giving a big picture from high up of what's happening on the ground, but people seem to forget that no matter how large the sample size is, it is never absolute.  Everything always boils down to a probability as opposed to cause and effect.  Even in the most introductory statistics class, the teacher tells you correlation is not causation, and yet we now have phrases like "social validation" (literally validation by what most people think), "empirically proven" (by quants and data scientists with PhDs no less), or "statistically impossible" (can't happen now just because it hasn't happened before).  The problem is when we start trying to rely solely on numbers to tell us what's true.  The example used in the Wikipedia article above very much reflects the kind of thinking I've been running into in Silicon Valley and Wall Street - "if all life we see is like this, any new life we find will probably be the same."  In Silicon Valley, the thought is that if the data suggests a conclusion with high enough statistical significance, then it must be true unconditionally.  They won't even look at you if you don't have data to back your conclusion, if your ideas are not "data driven."  Similarly, on Wall Street, the thought is that if the biggest names or the majority of people are doing things one way, it must be the right way.  Things are true just because they work and have always worked.
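The correlation-is-not-causation trap is easy to demonstrate concretely.  A minimal sketch (my own illustration, not tied to any real dataset): two series that never influence each other but share a hidden common driver will still show a near-perfect correlation - exactly the kind of number that purely inductive reasoning would take as proof of a relationship.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)
# A hidden confounder drives both series; neither causes the other.
z = [random.gauss(0, 1) for _ in range(10_000)]
x = [zi + random.gauss(0, 0.3) for zi in z]   # e.g. ice cream sales
y = [zi + random.gauss(0, 0.3) for zi in z]   # e.g. drowning incidents

r = pearson(x, y)
print(f"correlation: {r:.2f}")  # strong, despite zero causal link
```

The data alone can't tell you whether x causes y, y causes x, or neither; only reasoning about the mechanism (the confounder z) can.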
All this is in contrast to deductive reasoning, where the manner of thinking is to actually prove things true or arrive at absolute certainty in conclusions (as opposed to a 90% chance of something).

Deductive Reasoning
https://en.wikipedia.org/wiki/Deductive_reasoning
Deductive reasoning is more abstract (read: harder) and less about the numbers, if at all.  It drills down and keeps asking why at every level, on the ground as opposed to from a distance.  It is forward-looking, in that you pro-actively think out how something *must* work rather than just observe that something *does* work.  An example is Gell-Mann and Zweig [#] figuring out that the quark *must* exist years before we actually observed and discovered the particle.  Galileo, Tesla, Einstein, and many other scientists are famous for having come to their ideas through sheer mental simulations and thought experiments rather than physically observing or testing.  No amount of data would have gotten you General Relativity.  Discrete math, algorithms, combinatorics, and other theoretical subjects all rely on deductive reasoning.  Look at any real math proof, and it may not even use numbers, let alone evidence or observation, but will tell you in absolute terms what is or isn't true (see Real Math [#]).  My own algorithms and code are often this way, with just pure logic and no numbers or calculation (see Lossless Algorithms [#]).  This is the stuff you get an A on for a mere 50% score at UC Berkeley because so many people either suck at this in general or are gradually losing this skill.  Some don't even believe it exists because they can't see "evidence" of it or they believe that "nothing is absolute."  I've had people literally try to argue that even math only works because everyone agrees on it, that 1+1 only equals 2 because everyone believes it does [#], and that it's just as susceptible to being "right until it's wrong."  All this ties in closely with critical thinking.  The two are like opposite sides of the same coin.  Whereas deductive reasoning is employed to see if what's being said makes sense, critical thinking is employed to understand why something is being said - aka context and story.  
One could say critical thinking is essentially the application of deductive reasoning to communication, and it is falling out of favor to more inductive styles of reasoning just as quickly, for similar reasons of not being quantifiable or concrete (think standardized testing).  This is the crux of the issue with students merely memorizing the answers from the textbook as opposed to understanding why the answers are true and how to arrive at them without structure or without having seen them before.

Some will undoubtedly ask: how do you know something *must* work or *why* something works?  How do you know what's logical?  I could go through a whole list of logical fallacies [#] to avoid, but the gist of it is that any reason you come up with should not circle back to itself (aka an infinite loop).  Pretty much any logical fallacy can be identified by asking why and noticing when the reason leads you in a circle (see chart below).  This is what happens with inductive reasoning, because at the end of the day you're only describing what you see; it's observation piled upon observation without explanation or looking beneath the surface.  With deductive reasoning, you keep going deeper and deeper down the cause-and-effect chain.  That chain may not necessarily ever end, but being able to move from point to point without going in circles, without contradicting yourself, is what logic is about.  It doesn't automatically mean something is true, but if you're wrong, you know it's from the observations/evidence, which is then the right place for inductive reasoning to come in.  It's the difference between figuring things out by luck and guessing vs actually understanding cause and effect.
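The "no circles" rule can even be made mechanical: write each claim and the reasons offered for it as a directed graph, and circular reasoning is literally a cycle, detectable by ordinary depth-first search.  A minimal sketch (the function name and the example claims are my own, purely for illustration):

```python
def has_circular_reasoning(supports):
    """supports maps each claim to the list of reasons offered for it.
    Returns True if following the 'why' chain ever loops back on itself."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / in progress / done
    color = {claim: WHITE for claim in supports}

    def visit(claim):
        color[claim] = GRAY
        for reason in supports.get(claim, []):
            if color.get(reason, WHITE) == GRAY:
                return True               # the reason chain looped back
            if color.get(reason, WHITE) == WHITE and reason in supports:
                if visit(reason):
                    return True
        color[claim] = BLACK
        return False

    return any(visit(c) for c in supports if color[c] == WHITE)

# "It works because everyone uses it; everyone uses it because it works."
circular = {"it works": ["everyone uses it"], "everyone uses it": ["it works"]}
linear = {"A": ["B"], "B": ["C"], "C": []}
print(has_circular_reasoning(circular))  # True
print(has_circular_reasoning(linear))    # False
```

A chain that never revisits a node can go as deep as you like without contradiction; the moment it revisits one, the argument was never grounded in anything.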


Inductive vs Deductive Reasoning
Inductive reasoning describes what *is*.  Deductive reasoning explores what *if* - the how and why.  Inductive reasoning is useful for measuring the accuracy of one's observations, but it is deductive reasoning that draws the conclusions.  Put in simple terms, you use inductive reasoning to gather your information, but you use deductive reasoning to understand and interpret it.  The data gives you eyes, but you've got to use your brain to make sense of it.  Unfortunately, more and more it seems we have forgotten the premises of these concepts and blindly lump them together as equal approaches to logic, almost as if it's just a matter of preference for what the user is comfortable with, when one is actually completely useless without the other.

This bleeds into everything from data science to artificial intelligence (much of which is really just automated data science and not really intelligent or learning at all).  There seems to be this general assumption that all algorithms, including artificial intelligence, have to revolve around data and numbers.  Yet take, for example, the following algorithm I created here [#] for creating support and resistance lines in trading.  It in fact uses no statistical analysis at all, but people nonetheless just assume it does - that anything automated, technical, or mechanical must be data-driven or quantitative (ironically, this assumption is itself a result of inductive reasoning).  It is like card counting to know exactly what is in the deck, while everyone else is just looking for outcomes that repeat.  And then everyone just assumes you must be better at crunching numbers or spotting patterns, when that is not what you're doing at all.

Other pet peeves of mine from inductive reasoning essentially amount to misuse of statistics.  Especially in tech and finance, there is this bizarre assumption that an anomaly will never happen, and if it does, it somehow does not count.  They'll point to the 99 other times things succeeded, but they don't seem to fully grasp that they only get to live one life, that the outcome is discrete.  You often see decisions based on "expected value," where they'll take the decision with the highest weighted average of outcomes, even if it includes the potential outcome of death - but they don't care because the probability is so small that it's essentially zero (or "statistically impossible").  It's similar to the misunderstanding of randomness [#] I've described before, where people assume that just because they don't know something, it's okay to treat it as random.  The biggest problem is the lack of a backup plan, of any acknowledgement whatsoever that the improbable could happen.  The entirety of blockchain, for example, is based on no two people ever drawing the same random number for a wallet, even though there's nothing that says they can't and nothing that even checks if they do.  The worst excuse is that the odds are so low it would take a billion years to happen, but last I checked, that's not how probability works.  It could happen tomorrow, and that would be the end of it.  Again, they ignore that you only get to live with one discrete outcome, and once something happens, it's done, it's over.  You don't get to live in the weighted average of the possibilities.  That's not a real number.  It's literally made up.
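The expected-value trap shows up quickly in simulation.  A minimal sketch (the numbers are invented for illustration): a bet that multiplies your money 1.5x with 99% probability and wipes you out with 1% probability has a terrific expected value per play, yet if you keep taking it - as a pure expected-value decision rule says you should - ruin becomes the most likely lifetime outcome, and there is no averaging your way back from it.

```python
import random

random.seed(1)

def lifetime(bets=200, p_ruin=0.01, gain=1.5):
    """Repeatedly take a high-'expected-value' bet; one bad draw ends it all."""
    wealth = 1.0
    for _ in range(bets):
        if random.random() < p_ruin:
            return 0.0               # the "statistically impossible" outcome
        wealth *= gain
    return wealth

trials = [lifetime() for _ in range(10_000)]
ruined = sum(1 for w in trials if w == 0.0) / len(trials)

ev_per_bet = 0.99 * 1.5 + 0.01 * 0.0   # 1.485 - looks great on paper
print(f"expected value per bet: {ev_per_bet}")
print(f"fraction of lifetimes ending in ruin: {ruined:.2f}")  # ruin is the norm
```

The weighted average says keep playing; the discrete outcome says most players end at zero.  Surviving 200 plays requires dodging the 1% event 200 times in a row, which happens only about 13% of the time.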

Similarly, there seems to be a growing trend where people simply can't recognize what's right in front of them unless they get a thousand other opinions or datapoints first to confirm it.  Their own house could be on fire, but they'd sooner check Twitter than look with their own eyes.  Worse is when they straight-up deny what's in front of them if everyone else tells them it's impossible.  It's like taking too literally the phrase "two heads are better than one" when what you're really doing is just shutting off your own brain.  The purpose of inductive reasoning and statistics overall is to make it easier to think about large amounts of data, to make estimates about a collection of items and summarize many outcomes into just a few numbers.  In other words, it's for zooming out to the big picture and thinking about what a hundred things will do overall or where things are headed in general as a trend.  It's the wrong use case when you try to drill down and apply it to just that one thing in front of you.  It's the same mistake as thinking that just because p=>q, it means you can flip it and do q=>p (fallacy of affirming the consequent [#]).
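The asymmetry behind that fallacy can even be checked mechanically by enumerating every truth assignment - a brute-force sketch just to show that p=>q and q=>p are not the same statement:

```python
from itertools import product

def implies(a, b):
    """Material implication: 'a implies b' is false only when a is true and b is false."""
    return (not a) or b

# Search all four truth assignments for a case where p=>q holds but q=>p fails.
counterexamples = [(p, q) for p, q in product([True, False], repeat=2)
                   if implies(p, q) and not implies(q, p)]
print(counterexamples)  # [(False, True)]: p=>q holds, q=>p does not
```

One counterexample is all it takes: the flipped implication is simply a different claim, which is why confirming the consequent tells you nothing about the antecedent.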

A great quote that illustrates all this comes from the movie "I Origins [#]":
"If I drop a phone 1000 times and one of those times it doesn't fall, that is the one datapoint that matters." (Paraphrased)

If you only used inductive reasoning, you'd need to be hit by a car 20 times before you knew it hurts.  If you used deductive reasoning, you'd know not to be hit in the first place.  It's the difference between those who only learn through trial and error vs those who get it right the first time around - the ability to improvise, strategize, predict, essentially be creative.  And no, it's not luck or selection bias.  We live this every day and have to keep thinking about how to make our outcomes happen.  You don't get to redraw your life from a hat.

Responses

  1. Aaron said,
    Jan-25-2018, 01:54pm

    Hm. Do you know of anything in current AI that's deductive? Nothing in machine learning, I believe.

    Have been deeply studying reinforcement learning lately. I wonder what learning a deductive policy would look like.

  2. pftq said,
    Jan-27-2018, 08:49am

    I believe my approach here would lead to one: https://www.pftq.com/blabberbox/?page=Creating_Sentient_Artificial_Intelligence

    But it would be a significant undertaking to build and implement.  I haven't found the time nor resources to attempt it.  And it's been difficult to find anyone to help who actually understands, can figure out the details/intricacies, and isn't dismissive of it from the outset (because it's not empirical).

  3. Aaron said,
    Jan-29-2018, 08:07am

    Interesting ideas... lot of useful distinctions. Seems like it's moving in the right direction, at a minimum.

    Thanks for pointing that article out. Am pondering on it, and how to apply it.

  4. bx549 said,
    Mar-30-2018, 05:50pm

    Nice thoughts and summary of trends in data science and the requirement that all decisions be data-driven. I have one quibble about your statements that outcomes are discrete and that we don't get to live in a world where the possible outcomes are weighted averages. What you say is true, but the whole of utility theory says that it's OK to make decisions based on expected value. The outcome is like a lottery; you get one of the "prizes". Your preference for a "prize" is measured by the risk you are willing to take to receive it. I want to use my crypto-currency wallet to store my coins and sometimes buy things. A collision is possible, but as a practical matter I'm willing to take that risk. Similarly, an asteroid could hit my car while I'm driving to work, but I make the decision to go anyway. Well, I agree with you that deductive thinking is (unfortunately) losing favor.

  5. pftq said,
    Jun-19-2018, 08:45am

    It's less about whether you still make the decision and more that you understand you are guessing and should have a backup plan.  My criticism is mainly of people who put everything on the line because the "expected value" is so high, then have no fallback plan when the "statistically impossible" happens.
