Quant Psych
The most common mistake in biostatistics
Want to take a class with me? Visit simplistics.net and sign up! See you there!
*Technical side note: zero-inflated simply means you have a distribution with an abnormally large number of zeroes. To model zero-inflated data, it's common to use models that combine logistic regression (to predict 0 vs. 1+) with a count model such as Poisson regression (to predict the size of the 1+ values). So it's technically incorrect to say that a distribution that has this characteristic (combining yes/no with degree) is "zero-inflated." In other words, zero-inflated describes the distribution, not how the distribution is modeled. Make sense?
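
As a rough illustration of the side note, here is a minimal sketch of one common way to write down such a distribution, a zero-inflated Poisson. The function names and the parameter values are made up for this example and do not come from the video.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing count k under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k: int, pi_zero: float, lam: float) -> float:
    """Zero-inflated Poisson: with probability pi_zero the outcome is a
    structural zero; otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base


# Hypothetical parameter values, chosen only for illustration.
PI_ZERO, LAM = 0.3, 2.0
zip_probs = [zip_pmf(k, PI_ZERO, LAM) for k in range(60)]
```

The extra mass at zero is what "zero-inflated" describes; the yes/no part and the count part are simply how this particular sketch chooses to model it.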
Views: 2,581

Videos

Testing normality is pointless. Do this instead
7K views, 1 month ago
Do you want more structured and personalized information? Come take a class with me! Visit simplistics.net and sign up for self-guided or live classes. Video about diagnostics: ua-cam.com/video/jd7x-ww7da4/v-deo.html Video about robustness: ua-cam.com/video/bHmyMlZ0ODg/v-deo.html And here's the paper (and dataset) I referenced in the video: journals.plos.org/plosone/article?id=10.1371/journal.p...
Visual Partitions
1.2K views, 2 months ago
Do you want more structured and personalized information? Come take a class with me! Visit simplistics.net and sign up for self-guided or live classes. This series of videos is based on my visual partitions paper: osf.io/preprints/psyarxiv/avu2n And here's a paper I wrote about my eight step approach to data analysis: psyarxiv.com/r8g7c/ You can find part 1 of this video here: ua-cam.com/video/...
Building Statistical Models From Visuals (Part II)
1.4K views, 2 months ago
Do you want more structured and personalized information? Come take a class with me! Visit simplistics.net and sign up for self-guided or live classes. This series of videos is based on my visual partitions paper: osf.io/preprints/psyarxiv/avu2n And here's a paper I wrote about my eight step approach to data analysis: psyarxiv.com/r8g7c/ You can find part 1 of this video here: ua-cam.com/video/...
How to Visualize Data (Part 1)
1.9K views, 2 months ago
Do you want more structured and personalized information? Come take a class with me! Visit simplistics.net and sign up for self-guided or live classes. Additional Resources: ua-cam.com/video/OlbAFVbmieQ/v-deo.htmlsi=pRYRPt_3o_mlakIB ua-cam.com/video/AK1iZyY6lMo/v-deo.htmlsi=4xVfls_wAIrlIj70 ua-cam.com/video/SqN-qlQOM5A/v-deo.htmlsi=uEaGle5zd0OiWLNM ua-cam.com/video/wSPN2qwgXWU/v-deo.htmlsi=p0Rn...
It's lonely being a statistician :(
2K views, 2 months ago
Sign up for simplistics curriculum here: simplistics.net Some other links that might be interesting.... Link about EDA versus CDA: ua-cam.com/video/0SDVps4lg2s/v-deo.html My Multivariate playlist: ua-cam.com/play/PL8F480DgtpW9W-PEX0f2gHl8SnQ7PtKBv.html And here's a paper I wrote about my eight step approach to data analysis: psyarxiv.com/r8g7c/ Undergraduate curriculum playlist (GLM-based appro...
Dealing with nonlinear data: Polynomial regression and log transformations
2K views, 3 months ago
Come take a class with me! Visit simplistics.net Here's the video on transformations: ua-cam.com/video/d8QIQwr762s/v-deo.html Here's the video on diagnostics plots: ua-cam.com/video/jd7x-ww7da4/v-deo.html Here's the video on poisson regression: ua-cam.com/video/WuWGXJvTda4/v-deo.html Here's the videos on generalized linear models: ua-cam.com/video/SqN-qlQOM5A/v-deo.html Here's the link to the b...
Is pre-registration pointless? My talk to the Royal Society
915 views, 3 months ago
Link to my replication crisis video: ua-cam.com/video/76Gu2reKp_A/v-deo.html Here's the paper this was based on: osf.io/preprints/psyarxiv/5vfq6 Come take a class with me! Sign up at simplistics.net My Multivariate playlist: ua-cam.com/play/PL8F480DgtpW9W-PEX0f2gHl8SnQ7PtKBv.html And here's a paper I wrote about my eight step approach to data analysis: psyarxiv.com/r8g7c/ Undergraduate curricul...
Take a class with me at simplistics.net!
546 views, 3 months ago
I'd love to meet you! I am offering live classes, as well as self-guided classes. Sign up at simplistics.net Link about EDA versus CDA: ua-cam.com/video/0SDVps4lg2s/v-deo.html My Multivariate playlist: ua-cam.com/play/PL8F480DgtpW9W-PEX0f2gHl8SnQ7PtKBv.html And here's a paper I wrote about my eight step approach to data analysis: psyarxiv.com/r8g7c/ Undergraduate curriculum playlist (GLM-based ...
Want to take a stats class with me?
1.9K views, 1 year ago
Here's where to sign up: instats.org/seminar/introduction-to-simplistics-a-graphical1634 Link about EDA versus CDA: ua-cam.com/video/0SDVps4lg2s/v-deo.html My Multivariate playlist: ua-cam.com/play/PL8F480DgtpW9W-PEX0f2gHl8SnQ7PtKBv.html And here's a paper I wrote about my eight step approach to data analysis: psyarxiv.com/r8g7c/ Graduate curriculum playlist (also LM-based approach): ua-cam.com...
Professor writes a love song to statistics
1.5K views, 1 year ago
See my other music video here: ua-cam.com/video/E0JzVIjj5wI/v-deo.html Link about EDA versus CDA: ua-cam.com/video/0SDVps4lg2s/v-deo.html My Multivariate playlist: ua-cam.com/play/PL8F480DgtpW9W-PEX0f2gHl8SnQ7PtKBv.html And here's a paper I wrote about my eight step approach to data analysis: psyarxiv.com/r8g7c/ Undergraduate curriculum playlist (GLM-based approach): ua-cam.com/users/playlist?l...
Vlog: Do you want repeat content? A new song!
312 views, 1 year ago
Here's a link to my p-value song: ua-cam.com/video/E0JzVIjj5wI/v-deo.html Link about EDA versus CDA: ua-cam.com/video/0SDVps4lg2s/v-deo.html My Multivariate playlist: ua-cam.com/play/PL8F480DgtpW9W-PEX0f2gHl8SnQ7PtKBv.html And here's a paper I wrote about my eight step approach to data analysis: psyarxiv.com/r8g7c/ Undergraduate curriculum playlist (GLM-based approach): ua-cam.com/users/playlis...
Are modern robust methods useless? (vlog)
1.5K views, 1 year ago
Link about EDA versus CDA: ua-cam.com/video/0SDVps4lg2s/v-deo.html My Multivariate playlist: ua-cam.com/play/PL8F480DgtpW9W-PEX0f2gHl8SnQ7PtKBv.html And here's a paper I wrote about my eight step approach to data analysis: psyarxiv.com/r8g7c/ Undergraduate curriculum playlist (GLM-based approach): ua-cam.com/users/playlist?list... Graduate curriculum playlist (also GLM-based approach): ua-cam.c...
Mixed Model Analysis: Real Example
7K views, 1 year ago
In this video, I'm analyzing the data from this paper: journals.plos.org/plosone/article?id=10.1371/journal.pone.0256001 I briefly mention the idea of visual partitions. It wasn't until after I recorded the video, I realized I'd never explained those on my channel. Here's a paper that explains the idea: psyarxiv.com/avu2n/ Link about EDA versus CDA: ua-cam.com/video/0SDVps4lg2s/v-deo.html My Mu...
Are nonparametric statistics useless?
2.7K views, 1 year ago
This is the video that started this incident: ua-cam.com/video/bHmyMlZ0ODg/v-deo.html Here's a link to the cross-validated question: stats.stackexchange.com/questions/554829/are-there-any-surveys-of-the-opinions-of-statisticians-on-the-usefulness-of-clas My favorite alternatives to nonparametric statistics are generalized linear models: ua-cam.com/video/SqN-qlQOM5A/v-deo.html I mentioned that I...
THIS is the foundation of statistics!
4.4K views, 1 year ago
Using Flexplot for Mixed Models
9K views, 1 year ago
Fitting mixed models in R (with lme4)
24K views, 1 year ago
How to identify your cluster variable in mixed models
3.3K views, 1 year ago
Renaming variables with dplyr in R
1.3K views, 1 year ago
Pivoting multiple variables: A simpler (more complex?) way
1.9K views, 1 year ago
Wide to long format part 2 - Pivoting with multivariate data
1.9K views, 1 year ago
Going from wide to long format in R - Pivoting to tidy data with the Tidyverse
2.7K views, 1 year ago
Advanced dplyr practice in R
1.6K views, 1 year ago
Practice with dplyr in R
1.1K views, 2 years ago
Do we assume multicollinearity? No!
1.2K views, 2 years ago
Creating sum scales in R - Using mutate and across in dplyr
1.4K views, 2 years ago
Is this a controversial opinion about statistics?
2.5K views, 2 years ago
Recoding and reverse coding variables in dplyr
3.3K views, 2 years ago
How to use the pipe operator in dplyr
1.4K views, 2 years ago

COMMENTS

  • @calenwu
    @calenwu 1 day ago

    mf explains this in 6 minutes while my computational statistics professor can't explain this in 2 lectures

  • @criticallyunderfunded3707
    @criticallyunderfunded3707 2 days ago

    Hi, I recently stumbled upon your great channel and have been binging your videos. To preface this, I am not anyone special. I am currently doing my Master's in Psych/Neuroscience, and stats and stats communication is my side job. With that being said, I vehemently disagree with the perspective that everything should be taught as a linear model. Much of what we do in Psychology is linear modelling, but I have found that it is simpler to teach things like the t-test, the F-test or the ANOVA independently and then bring them all together as little facets of the GLM. It usually frustrates me to no end when people come to me for help and their professors have decided to just teach everything under the guise of a linear model. For them it just seems completely overwhelming. In principle, I get your argument. It would be better if people had a full understanding of the GLM or even the GLIM, but purely didactically speaking this is just too much. Let me give an example. I can teach someone the basics of ANOVA in around 15 minutes. They do not need any special knowledge for my explanation. If they understand, I can teach them about the F-test, which gives me a vehicle to repeat some things about the chi-squared and the F distribution. If I want to later integrate it into the GLM, they will have heard the logic of ANOVA thrice (for one factor, for multiple factors and for ANCOVA) and the transition into the GLM is easier. If I teach the ANOVA purely through the GLM lens, I need to do so much more preparation. My pupils would have to have a somewhat firm grasp of dummy coding, the basics of regression analysis and the different kinds of linear model parameters. Then on top of this I have to explain the logic of ANOVA, but not the native logic of ANOVA. Instead I have to explain ANOVA in GLM, which on its own is (imo) a bit more difficult to explain than pure ANOVA.
I could, of course, just explain ANOVA and then explain an ANOVA in GLM, but this is just the same thing that I initially outlined, just crammed into one session with more effort and less time spent on what ANOVA really does. Maybe as an aside, I think ANOVA is a brilliant procedure and gives a good introduction into quantifying questions that are not straightforward. However, in my experience, if ANOVA is being taught from the GLM perspective, students get the impression that ANOVA is part of the GLM and falls to the same assumptions, which is just not true. ANOVA is exceedingly robust to almost all violations, which cannot be said of the GLM. I

  • @anangelsdiaries
    @anangelsdiaries 4 days ago

    Great video, subscribed!

  • @RichmondDarko-qo2me
    @RichmondDarko-qo2me 7 days ago

    Thank you very much for such informative videos. I spent several years in class and didn't understand all these concepts, but watching this video has made things easier for my comprehension. I have a few questions I would like to ask: When performing a statistical test, we use a parametric test if the data or variable in question is normally distributed, and a non-parametric alternative if the data or variable is not normally distributed. My question is: when does the central limit theorem come into play here? Also, a colleague of mine told me to always use parametric tests even if the data is not normally distributed. His explanation was that parametric tests are more powerful than non-parametric tests. So, should I straightforwardly use the non-parametric alternative when I observe that my data is not normally distributed, or should I take the CLT into consideration and use the parametric test?

  • @Cluless02
    @Cluless02 9 days ago

    What advances have been made since Erich Fromm??

  • @idodlek
    @idodlek 9 days ago

    Hello Mr. Fife 😀 Does, for example, running a general linear model as a t-test versus a Mann-Whitney U test and comparing their results count as sensitivity analysis? Or would only transformations, bootstrapping and trimming count as sensitivity analysis?

  • @goktugmk
    @goktugmk 15 days ago

    You're amazing and you explain very clearly. Please keep making videos.😊

  • @stephenomenal1245
    @stephenomenal1245 15 days ago

    This guy is so f*cking awesome! Very informative as well as great energy!

  • @fruithillfarm6113
    @fruithillfarm6113 16 days ago

    Diagnostic criteria require an optimal cutoff. Those cutoffs are not arbitrary or determined by one dataset (the focus of researchers). Clinicians often conceptualize the data continuously (e.g., pre-diabetic, higher risk for cardiovascular disease, pre-clinical risk for stress-mediated chronic disease development), but patients want to know if they have a condition or not (category). Clinical scientists don't categorize everything because we only know how to use ANOVAs, but what a condescending standpoint. Eliminating categorical cutoffs eliminates diagnoses. I'm good with that, but really, as a patient, are you?

    • @galenseilis5971
      @galenseilis5971 12 days ago

      Eventually mutually exclusive choices about whether or how to treat have to be made which induces some amount of discreteness.

  • @DistortedV12
    @DistortedV12 17 days ago

    If all you know is ANOVA, what would you do instead?

    • @galenseilis5971
      @galenseilis5971 12 days ago

      What alternatives are appropriate to ANOVA depends on the analysis problem; there isn't a one-size-fits-all approach. Saying that something isn't ANOVA is like saying something isn't a banana; it doesn't narrow things down very much. Start with the problem you want to solve and search for or develop the best method you can for it.

  • @dragcot9677
    @dragcot9677 17 days ago

    as an ecologist in progress I can say, in ecology EVERYONE is using GLM all the time even when they could be using other simpler methods, so here I am trying to actually understand them ahjhahaha

  • @sprachenwelt
    @sprachenwelt 18 days ago

    Or you could just drop it all and go fishing!

  • @qwerty11111122
    @qwerty11111122 19 days ago

    Rowan University! I was in the first year of freshman to go all 4 years majoring in bioinformatics!! Edit: negative binomial mentioned 15:15

  • @nl7247
    @nl7247 19 days ago

    Please also discuss the problems when categorical data are analysed as continuous data. Thank you for your videos.❤

    • @QuantPsych
      @QuantPsych 19 days ago

      What problem? It's very common to do that. For example, male/female becomes 1/0 and we can use regression to do a t-test. Unless you mean something else?

    • @nl7247
      @nl7247 18 days ago

      @QuantPsych I mean using continuous numbers to analyze categories. E.g., we don't really consider that there could be a 0.73 in the gender range when we use 1 or 0 (or 2) to represent only two genders (not going to get into the recent gender classification discussion here). Or something that should only be an integer, where making it continuous does not make sense in the real world, although we often say or hear that people have an average of 0.83 cars... Thank you for your thoughts and reply.

    • @galenseilis5971
      @galenseilis5971 9 days ago

      @nl7247 One of the ways that models can be less realistic is to ignore the set of possible outcomes. If I have a count variable, e.g. Poisson, the expected value is not in general an observable outcome. That's okay if you are truly interested in the expected value. If you are not interested in the expected value then you should use something else, like a distribution over the observables.
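
To make the 1/0 coding discussed in this thread concrete, here is a minimal sketch with made-up scores. It only shows that the regression slope on a 0/1 predictor is the difference between the two group means, which is the quantity a two-sample t-test evaluates; the helper name `ols_fit` and the data are invented for this example.

```python
from statistics import mean


def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up scores for two groups, coded 0 and 1.
group0 = [4.0, 5.0, 6.0, 5.5]
group1 = [7.0, 8.0, 6.5, 9.0, 7.5]
x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

intercept, slope = ols_fit(x, y)
mean_difference = mean(group1) - mean(group0)
```

Here the intercept is the mean of the group coded 0 and the slope is the gap between the groups, so nobody is claiming that a gender of 0.73 exists; the 0/1 values are just labels that let the model estimate two means.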

  • @tulipped
    @tulipped 19 days ago

    Myanmese (or Burman, depending on who you ask).

    • @QuantPsych
      @QuantPsych 19 days ago

      Excellent! I was hoping I'd get someone who knows :)

  • @Break_down1
    @Break_down1 19 days ago

    1:04..or maybe we measure people who share the same gender. Why can’t I see a clear reason that “gender” is not a common candidate for nesting variable (ie people usually just control for it), but classroom always is?

    • @QuantPsych
      @QuantPsych 19 days ago

      With gender we generally exhaust the categories we're interested in (e.g., male, female, nonbinary). With classrooms we do not, because we can't possibly sample all classrooms out there.

  • @hamidjess
    @hamidjess 19 days ago

    This is a Nobel Prize in Languages right here.

  • @shfizzle
    @shfizzle 19 days ago

    i am not in any way involved with doing statistics. i just love hearing a man be real about things.

  • @brazilfootball
    @brazilfootball 19 days ago

    Thank you for this video!! Can you go into the differences between linear regression vs. the “decision tree” of tests in more detail? Is it a matter of pros and cons of methods or just old vs new techniques? One obvious thing that comes to my mind is one can’t account for repeated sampling with a t-test or ANOVA, right?

    • @QuantPsych
      @QuantPsych 19 days ago

      I think this video will address that: ua-cam.com/video/KwVl_K_TLxo/v-deo.html

    • @brazilfootball
      @brazilfootball 18 days ago

      @QuantPsych Doesn't get much better than that! Thank you! 😅

  • @qwerty11111122
    @qwerty11111122 19 days ago

    10:00 consider the bumblebee

    • @QuantPsych
      @QuantPsych 19 days ago

      TIL about bumblebee languages :) Fascinating stuff!

  • @paulyoung3897
    @paulyoung3897 20 days ago

    This was great

  • @pianofortissima4410
    @pianofortissima4410 20 days ago

    Why does he shout the whole time? 😮

  • @swinginkeke
    @swinginkeke 20 days ago

    Totally agree in theory, but docs love ORs and the Titanic turns slowly. How can I better communicate interpretability of betas if I keep the outcome continuous? “For each year older the kiddo is, we see delay to initial imaging increase by 1.6 days.” The blank stares haunt my dreams.

    • @QuantPsych
      @QuantPsych 19 days ago

      True. Probably better to show them a plot.

    • @galenseilis5971
      @galenseilis5971 9 days ago

      Are you conflating the random variables with the (conditional) expected values of the variables? They are not the same in aspects that are important for planning.

  • @1997aaditya
    @1997aaditya 20 days ago

    Why don't you use poly(var_name, n) instead, for orthogonal polynomials?

    • @QuantPsych
      @QuantPsych 19 days ago

      Because I can never remember how to do that.
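
For readers wondering what orthogonal polynomials buy you, here is a small sketch of the idea using Gram-Schmidt on the raw powers of a predictor. It is an illustration only, not a reimplementation of R's `poly()`; the function names and the data are invented for this example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_poly(x, degree):
    """Return `degree` columns that are orthonormal to each other and to the
    constant column, built by Gram-Schmidt on the powers x, x^2, ..."""
    n = len(x)
    basis = [[1.0 / math.sqrt(n)] * n]        # normalized constant column
    for d in range(1, degree + 1):
        col = [xi ** d for xi in x]
        for b in basis:                        # strip what earlier columns explain
            proj = dot(col, b)
            col = [c - proj * bi for c, bi in zip(col, b)]
        norm = math.sqrt(dot(col, col))
        basis.append([c / norm for c in col])
    return basis[1:]                           # drop the constant column


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]             # made-up predictor values
linear, quadratic = orthogonal_poly(x, 2)
```

Raw `x` and `x^2` are strongly correlated, which can make the individual coefficients hard to read; the orthogonal columns carry the same curve but are uncorrelated with each other.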

  • @trini-rt6xn
    @trini-rt6xn 20 days ago

    I'm not a Statistician or a Biostatistician, and I'm not even good at Math, but your explanation was so crystal clear even I can understand it. Sweet! And I've had Senior Level Management folk - VPs, SVPs - from major Big Pharma companies ask to keep hacking away at data that is plain as daylight, like the Continuous Variable Distribution you showed in this video, and I keep asking myself: "am I so stupid? Am I missing something obvious?" After all, the data is being summarized and showing whatever it's showing, but somehow the big folks want it to show something else. And I'm always like "what else do you want it to show? It is what it is!" Of course, I swallow my pride and hide my impatience because maybe, just maybe, I'm really stupid. But after months of slicing and dicing data into invisible chunks, it always comes back to where I started. Scary! Thanks again for making advanced topics palatable for myself and others like me. It gives us hope.

  • @royals2013
    @royals2013 21 days ago

    “But previous literature did” hm ok yeah let’s shy away from that excuse

    • @QuantPsych
      @QuantPsych 20 days ago

      Seriously!

    • @galenseilis5971
      @galenseilis5971 20 days ago

      What excuse? Fife cited a paper. What's wrong with that? If you read the paper and find problems with it that's one thing, but citing a source for a claim isn't an excuse as I understand it. Please elaborate.

    • @royals2013
      @royals2013 20 days ago

      @galenseilis5971 PI only wants to do something because previous literature did something. Statistics evolves, better practices emerge. Bad statistics are replicated far too often from people assuming the original methods are appropriate.

    • @galenseilis5971
      @galenseilis5971 20 days ago

      @royals2013 I understand the context of your comment better. Thank you. Yeah, that's a really good point. A lot of literature has false claims in it, and that is something to have some vigilance about. Assuming uncritically that previous literature is *de facto* correct is unwise. It sounds like Fife has read the paper, albeit a substantial amount of time ago. If we want to further evaluate the paper that is up to us. In this context I think Fife is just making a claim with a citation. You're right that we should not take the conclusions of the paper at face value, but I don't think it is excuse-making to cite previous work as evidence for a claim.

    • @royals2013
      @royals2013 20 days ago

      @galenseilis5971 ah wasn’t meaning it as a quote from quantpsych btw lol just a quote that I hear a lot from PIs. Totally agree w everything said in the vid

  • @antoniobarros3415
    @antoniobarros3415 21 days ago

    As always, the vlog is excellent. It brings to mind a quote from Frank Harrell on categorisation: "Employ it when the intention is to mislead the reader" ;-)

    • @galenseilis5971
      @galenseilis5971 20 days ago

      I've enjoyed perusing Harrell's biostatistics book.

  • @McDreamyn_mdphd
    @McDreamyn_mdphd 21 days ago

    I've found that many people tend to create categorical variables to use as predictors in logistic regression models, so that the value on the logit scale can be easily interpreted as an odds ratio. But what they don't realize is that the variable can be recoded to keep its continuous distribution, but transformed so that a value of 0 corresponds to the 25th percentile and a value of 1 corresponds to the 75th percentile. Now in theory you are still interpreting the values as if they were a binary variable, but at least you do not lose statistical power by capping the natural variability of an informative covariate.

    • @swinginkeke
      @swinginkeke 20 days ago

      Can you walk me through a practical example of this? I’m a biostatistician at a hospital and our docs ALWAYS want odds, even at the expense of losing data/power/etc. I like this idea, but haven’t come across it before.

    • @galenseilis5971
      @galenseilis5971 20 days ago

      Hmm, I cannot say that I find this use case compelling. The canonical logistic regression is already clear enough to interpret as-is without further tinkering. Not that I think other categorizing strategies are appealing here either.

    • @antoniobarros3415
      @antoniobarros3415 20 days ago

      @swinginkeke Probably they should take responsibility for the decision. They should take a look at DCA (decision curve analysis).

    • @McDreamyn_mdphd
      @McDreamyn_mdphd 20 days ago

      @swinginkeke as (X_i − X_25th percentile) / (X_75th percentile − X_25th percentile), where 0 = 25th percentile and 1 = 75th percentile for person i on variable X.

    • @McDreamyn_mdphd
      @McDreamyn_mdphd 20 days ago

      @galenseilis5971 Well, I agree, but for publication purposes in medical journals, there is often less interest in understanding a unit-by-unit increase in the log odds of, say, some diagnostic test, and instead there is a desire to transform the interpretation into something that is clinically meaningful. If I have a patient and I want to understand the dose-effect of a statin on an inflammatory marker (troponin), the transformation I outlined above is a very straightforward approach to translating the odds ratio into a very easy and understandable metric, especially for clinicians who may not necessarily be adept at reading medical literature. Over my career, I have learned that success is less about what I know and more about what I can do to demystify the numbers and make them clinically relevant to my peers in the publication process.
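
The rescaling described in this thread can be sketched as follows. This is a minimal illustration with made-up values; `iqr_rescale` is a name invented here, and the quartile convention (`method="inclusive"`) is an assumption, since different software defines percentiles slightly differently.

```python
from statistics import quantiles


def iqr_rescale(values):
    """Map the 25th percentile to 0 and the 75th percentile to 1,
    keeping the variable continuous."""
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    return [(v - q25) / (q75 - q25) for v in values]


# Made-up biomarker values for illustration.
marker = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
scaled = iqr_rescale(marker)
```

If the rescaled variable is then used as a predictor in a logistic regression, the exponentiated coefficient compares someone at the 75th percentile with someone at the 25th percentile, so it reads like an odds ratio for "high vs. low" without throwing away the values in between.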

  • @yiannisspanos694
    @yiannisspanos694 21 days ago

    I used the book by Imbens and Rubin (2016) to measure treatment effects in my MSc thesis. It's a sub-classification, based on the Propensity score (PS), a continuous variable. The sample is split on the median such that the average PS of the treated is equal to the average PS of the controls in each stratum. The results are somewhat sensitive to how the sample is stratified, but the stratification is done using a very specific algorithm. I would be interested to hear your thoughts on that book. Note that the PS in my sample was analytically derived, not estimated.

  • @planetary-rendez-vous
    @planetary-rendez-vous 21 days ago

    I categorized my gene expression into low, medium and high because we have no idea how to analyze something without resorting to p-value comparisons (is there a difference of the mean, plug in whatever model you have depending on normality).

    • @galenseilis5971
      @galenseilis5971 20 days ago

      What mathematical and computational methods you should use will depend on the question you're trying to answer (assuming that your data can answer that question in principle).

    • @planetary-rendez-vous
      @planetary-rendez-vous 20 days ago

      @galenseilis5971 In our experiment we wanted to see if there is a "correlation" between a specific gene signature and the rest of the data, that is, if we could sort out our tumor samples or explore any kind of pattern based on a specific gene signature which the tumors express. Anyway, I'm not sure if that is a good idea, but we did.

    • @galenseilis5971
      @galenseilis5971 20 days ago

      @planetary-rendez-vous Well, such exploration of data could reveal interesting patterns. Naturally, if you find an interesting pattern you might next want to find out if it generalizes to future samples. Finding a pattern in a single sample is still only an early step towards an independently-verifiable phenomenon. And beyond a stable statistical pattern there are causal models. These come in different forms, but I recommend Judea Pearl's work on causality. I think his diagrammatic approach is a good entry point.

  • @PATRICKCONNOLLY-ub2vb
    @PATRICKCONNOLLY-ub2vb 21 days ago

    Medical doctors are indoctrinated to think of the world in terms of decision points and cutoffs. This is why the doctor demands discretization of a continuous response. He wants to have a decision point, where test results above that point indicate treatment. Continuous distributions are much more challenging to deal with. If you have a guy whose test score is in the middle of the pack, what do you do, give him half the treatment? What if the treatment is a surgical procedure? This is why doctors demand cutoffs. They are not morons, they just have a different set of priorities and constraints.

    • @galenseilis5971
      @galenseilis5971 9 days ago

      "They are not morons, they just have a different set of priorities and constraints." I think that's the crux of it right there. This doesn't excuse poor statistical practices when they occur, but they're just humans trying to solve problems like the rest of us. Mapping continuous random variables to discrete random variables usually makes the most sense 'after' the joint probability distribution over the variables has been well-specified. The discrete outcomes can be the options to be decided upon, hopefully along with probabilities computed via the change of variables for the mapping, so that decisions under risk can be made. Even when we have discrete random variables to begin with, probability distributions (exempting Dirac delta distributions) do not usually commit to a single answer anyway. If I look at demand for hospital beds and I want to know how many beds will be enough for the next planning period, I can only allocate one counting number of beds, not the whole ensemble. Making decisions commits us to one of the mutually exclusive options over the others, with some risk of it being incorrect.

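The hospital-bed remark above can be sketched in a few lines. This is a toy illustration that assumes demand follows a Poisson distribution with a known mean; the mean of 20 and the 95% coverage target are hypothetical numbers chosen for the example, not values from the thread.

```python
import math


def poisson_cdf(n: int, lam: float) -> float:
    """P(demand <= n) when demand ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n + 1))


def beds_needed(lam: float, coverage: float) -> int:
    """Smallest bed count n with P(demand <= n) >= coverage."""
    n = 0
    while poisson_cdf(n, lam) < coverage:
        n += 1
    return n


EXPECTED_DEMAND = 20.0   # hypothetical mean demand per planning period
COVERAGE = 0.95          # hypothetical tolerance: enough beds 95% of the time
beds = beds_needed(EXPECTED_DEMAND, COVERAGE)
```

The whole distribution is specified first, and the single discrete decision (one bed count) is read off it afterwards, together with the stated risk of falling short.
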
  • @galenseilis5971
    @galenseilis5971 21 days ago

    It is interesting to consider how to model a random variable X that quantifies an extent when there is also an indicator variable Y that tells us whether there was any extent to X at all. If we have labels then we can model X * Y. Without labels I think a mixture could be reasonable. If you have labels for some but not all of the data then that sounds like a missing data problem. There you should consider whether the missingness is MCAR, MAR, or MNAR. If it is one of the former two, then model-driven imputation may be possible. All the better if you impute a probability distribution over the missing values rather than filling in just one value.

  • @galenseilis5971
    @galenseilis5971 21 days ago

    Yeah, an "optimal cutoff" requires a well-defined optimization problem. It requires an objective function to be either minimized or maximized. Vaguely pointing at a continuous empirical distribution does not constitute such clarity.
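
One way to make this point concrete is to write the objective function down explicitly. The sketch below uses a simple misclassification cost; the scores, the true statuses and the two cost settings are all made up for illustration, and `total_cost` and `best_cutoff` are hypothetical helper names.

```python
def total_cost(cutoff, scores, has_condition, cost_fn, cost_fp):
    """Misclassification cost when everyone with score >= cutoff is flagged."""
    cost = 0.0
    for score, sick in zip(scores, has_condition):
        flagged = score >= cutoff
        if sick and not flagged:
            cost += cost_fn      # missed case (false negative)
        elif flagged and not sick:
            cost += cost_fp      # false alarm (false positive)
    return cost


def best_cutoff(scores, has_condition, cost_fn, cost_fp):
    """Observed score that minimizes the stated cost (first one on ties)."""
    candidates = sorted(set(scores))
    return min(
        candidates,
        key=lambda c: total_cost(c, scores, has_condition, cost_fn, cost_fp),
    )


# Made-up test scores and true statuses.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
has_condition = [False, False, True, False, False, True, True, True]

cutoff_equal = best_cutoff(scores, has_condition, cost_fn=1.0, cost_fp=1.0)
cutoff_miss_costly = best_cutoff(scores, has_condition, cost_fn=10.0, cost_fp=1.0)
```

With the same data, the two cost settings pick different cutoffs, which is the sense in which "optimal" is undefined until the objective is stated.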

  • @galenseilis5971
    @galenseilis5971 21 days ago

    I could see someone trying to partition the data if they saw a bimodal distribution and no apparent labels to explain that bimodality, but I would still prefer a mixture model. A mixture model allows the assignment of probabilities to the apparent subpopulations.

    • @planetary-rendez-vous
      @planetary-rendez-vous 21 days ago

      Nah I split my continuous variable into 3 categories, low, medium, high, because we have no idea how to analyze it. 🤡

    • @galenseilis5971
      @galenseilis5971 20 days ago

      @planetary-rendez-vous lol

    • @galenseilis5971
      @galenseilis5971 20 days ago

      @planetary-rendez-vous In all seriousness, my advice is to ask for help when you don't know how to analyze your data.

    • @galenseilis5971
      @galenseilis5971 20 days ago

      @planetary-rendez-vous Ah, I just noticed you did below. You might get answers in a comments section like this. You can also reach out to statisticians (like Dustin) or others at industry and academic institutions. In my experience they're happy to help when they have time.

    • @planetary-rendez-vous
      @planetary-rendez-vous 20 days ago

      @galenseilis5971 Yes of course, however I had 2 lab experiences and neither had statisticians. One lab had an epidemiologist, and tbf that is not the same as a statistician who understands the theory behind the methods. So the epidemiologist doesn't have any reservations about using categorizations; we truly didn't know better, and it is very likely that nobody knows better in current environments except specialized ones with dedicated researchers on good statistical practices.
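
As a sketch of what the mixture-model suggestion at the top of this thread offers over a hard split: given a two-component normal mixture, Bayes' rule gives each observation a probability of belonging to each subpopulation. The parameter values below are hypothetical and assumed to be already estimated; fitting them (for example with EM) is not shown.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def membership_probability(x, weight_a, mu_a, sigma_a, mu_b, sigma_b):
    """Posterior probability that x came from component A of a
    two-component normal mixture (Bayes' rule)."""
    a = weight_a * normal_pdf(x, mu_a, sigma_a)
    b = (1.0 - weight_a) * normal_pdf(x, mu_b, sigma_b)
    return a / (a + b)


# Hypothetical, already-estimated mixture parameters.
PARAMS = dict(weight_a=0.5, mu_a=0.0, sigma_a=1.0, mu_b=4.0, sigma_b=1.0)
p_low = membership_probability(0.5, **PARAMS)
p_mid = membership_probability(2.0, **PARAMS)
p_high = membership_probability(3.5, **PARAMS)
```

A value near either mode gets a membership probability close to 0 or 1, while a value halfway between the modes stays ambiguous; a low/medium/high split would hide that uncertainty.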

  • @galenseilis5971
    @galenseilis5971 21 days ago

    I think what actually stops me from ever using median splits is that the decisions I help people make with statistics don't involve the median. It just doesn't have any relevance on the problems I work on.

    • @antoniobarros3415
      @antoniobarros3415 20 days ago

      I came across several PhDs that were based on that particular subject matter.

    • @galenseilis5971
      @galenseilis5971 20 days ago

      @@antoniobarros3415 Their PhD dissertations depended on using the median? It certainly could happen.

    • @antoniobarros3415
      @antoniobarros3415 20 days ago

      @@galenseilis5971 sure do. terrible. I cannot however id the PhD!

  • @anasbit2
    @anasbit2 21 days ago

    I am really interested in reporting the continuous findings along with the categorical ones when we have to make a decision, as you said. I would really appreciate it if I could find a paper that demonstrates this approach, or uses it.

  • @NicholasBerryman-zs8ip
    @NicholasBerryman-zs8ip 21 days ago

    I always liked Fuzzy Logic as a framework for interpreting linguistic categorical concepts continuously and vice versa. 'Fuzzification' and 'Defuzzification' are permanent fixtures in my toolbox for when I need to explain the ideas that you talk about here.

    • @QuantPsych
      @QuantPsych 21 days ago

      I have no idea what that is....I'm starting to rethink my attempt to talk about linguistics :)

    • @galenseilis5971
      @galenseilis5971 21 days ago

      Personally, I do not like fuzzy logic (or fuzzy set theory). I have yet to encounter a problem where I thought it was conceptually better than a probabilistic model. It isn't that "cold", "warm", or "hot" are fuzzy. Rather, there are different probabilities of someone using any of those labels, given some factors including the physical temperature. Often mixed effects and mixture models explain a lot of variation in such responses.

    • @NicholasBerryman-zs8ip
      @NicholasBerryman-zs8ip 14 days ago

      @@galenseilis5971 While I agree that mixed effects models often explain 'fuzzy' ideas better than fuzzy logic, I do think you're throwing some baby out with the bathwater here. I have often found it a good explanatory approach (not modelling approach) to get people out of thinking in the ways this video criticizes. When people ask me to treat an obviously continuous variable as a categorical one, I've often found it helpful to present some edge cases and say e.g. 'Could we not say that this point is "more positive" than this other point? Perhaps there's some implicit information in your wording of "positive" that the model wouldn't get if we treated a fuzzy thing like language as something perfectly exact', and that often gets the idea across, whereas I find that bringing in probabilistic ideas often confuses people not well versed in them. And that's without getting into fields like control theory, where probabilistic thinking doesn't make sense: it doesn't make sense to say a train has a 50% chance of breaking where we mean it is breaking at 50% power.

    • @galenseilis5971
      @galenseilis5971 12 days ago

      @@NicholasBerryman-zs8ip I have not seen the light yet, but I'll admit I have not looked into to what extent it helps knowledge translation to talk in terms of fuzzy logic. It could be that even if it is debatable on intellectual grounds, some people may just find fuzzy logic more intuitive for some problems. I find that fuzziness in this ontological sense is nonsensical and doesn't help me understand the systems I study, and hence my own dislike of it as an interpretation of the math being used. Probability theory has applications in control theory, but we don't need to jump into that level of breadth to discuss your example. What I understand from your phrasing is that there is a bounded variable "power" which has a well-defined 50% point between the lowest value and the greatest value, and that there is a variable for whether the system is broken which is a binary state of a system. This binary variable can be formulated as an indicator function for which some of its pre-image corresponds to "broken" and the complement set corresponds to "not broken", which gives us a way of classically labeling the state space (i.e. the partitioning is "sharp" in fuzzy set theory terms). While probability is unnecessary in this example, so is fuzzy logic. There are also cases where the system is not always broken at 50% power, but rather is sometimes broken or sometimes not broken (as labelled in the data). A probability approach would say there is a probability p of the system being broken at 50% power. A fuzzy logic approach would say the system is z broken and simultaneously 1-z not broken. In my point of view the probability interpretation of this generalization of the binary broken/not broken is preferable over the fuzzy logic interpretation.

    • @galenseilis5971
      @galenseilis5971 9 days ago

      @@NicholasBerryman-zs8ip There may well be upsides to the knowledge translation side of fuzzy logic. It is not something I have looked into.

  • @070279381
    @070279381 21 days ago

    Can you leave the link in your description? I cannot find your website.

  • @gimanibe
    @gimanibe 21 days ago

    First of all, you’re kind of crazy, in a good way 😂. Secondly, I will use “languagatized” with my students! Boo to discretizing continuous variables!

    • @galenseilis5971
      @galenseilis5971 21 days ago

      Fife's kind of crazy is a fun type of crazy.

  • @zimmejoc
    @zimmejoc 21 days ago

    The inflection idea to make language continuous is something we already do. Daft and Lengel talk about media richness. Papers are less rich than talking because tone and inflection don’t come through in a paper. That’s a huge simplification of their premise, but that’s the gist.

  • @zimmejoc
    @zimmejoc 21 days ago

    I just don’t get the median split. Why take a continuous variable with all its information, strip that away, and turn it into a discrete one with two values? It’s a loss of data. EDIT: I commented the moment you said median split. Nice to see your elaboration supports my dislike of the practice. 😁
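The information loss can be made concrete with a small simulation. This is only a sketch on made-up data: the slope of 0.5, the sample size, and the helper name `pearson` are assumptions for illustration. It compares the correlation with the outcome before and after a median split of the predictor.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
n = 5000

# Made-up data: a continuous predictor and an outcome linearly related to it.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Median split: recode the predictor as 0 (at or below the median) or 1 (above it).
median_x = sorted(x)[n // 2]
x_split = [1.0 if xi > median_x else 0.0 for xi in x]

r_continuous = pearson(x, y)
r_split = pearson(x_split, y)

print("correlation with continuous x:", round(r_continuous, 3))
print("correlation with median-split x:", round(r_split, 3))
print("ratio:", round(r_split / r_continuous, 3))
```

For a normally distributed predictor, the expected ratio is about 0.8, so the dichotomized version recovers noticeably less of the relationship from exactly the same observations.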

  • @billyboy1997
    @billyboy1997 26 days ago

    I found a new hidden gem channel! Nice video.

  • @naampaccchina
    @naampaccchina 26 days ago

    PLEASE do videos on analysis in SAS, e.g. for longitudinal data!

  • @naampaccchina
    @naampaccchina 26 days ago

    you are brilliant!

  • @stephenomenal1245
    @stephenomenal1245 27 days ago

    I love this guy lol

  • @samuelroytburd1260
    @samuelroytburd1260 27 days ago

    Appreciate it, thanks!

  • @igorbione4796
    @igorbione4796 27 days ago

    Oh my, this video would have saved me a lot of work if I had checked it earlier! Thanks!

  • @pipertripp
    @pipertripp 28 days ago

    What about ridge regression and LASSO for feature reduction?

  • @babak0203
    @babak0203 29 days ago

    So, if the loess line tends to be linear, we choose a linear model, otherwise a non-linear one. Is that right?

  • @nuary120896
    @nuary120896 1 month ago

    Super useful, especially in ecology, because I rarely get normal data from my field experiments. And when I do, it is usually because something went wrong 😆