Tuesday, December 2, 2014

The quantum revolution

Scientists have developed a better way to run a quantum algorithm. The new method is much simpler than previous algorithms and could be a major advance in the development of a quantum computer. Read More

Monday, November 10, 2014

How to Pass Your Standardized Examination in Biostatistics: Fear and Reward

Pleasure and fear are primary human motivators. You will need to learn how to use these properly in order to pass your statistics examination.

First, let's take a close look at your fears.

Fear isn't a weakness; used properly, it is a gift. Fear can paralyze you, or it can motivate you to take action. Paralysis usually comes when we do not face our fears head-on and instead let them simmer in the recesses of our thoughts. So take a good inventory of your fear. In the case of a standardized examination, the fear is obvious: it is the fear of failure.

The fear of failure is very real when it comes to standardized examinations. In order to pass classes, graduate, and become licensed, students need to pass and do well on standardized examinations covering epidemiology and biostatistics. It is an essential part of medical training because the basic concepts do not change over time, and nearly all general standardized examinations contain at least a few questions on epidemiology and biostatistics.

As a practicing physician in the United States, I was required to pass multiple standardized examinations in order to pass my medical school classes, graduate from medical school, get a medical license, and become Board Certified. Now, with the new Maintenance of Certification requirements, I am required to continue to take and pass standardized examinations. Doing well means I can continue to work. Not passing an examination means in many cases a loss of Board Certification and subsequent loss of hospital privileges. And what do all of these examinations have in common? First, they are all structured in a similar manner, and secondly, they all contain questions on epidemiology and biostatistics.

The first step in doing well on these standardized examinations is to really understand at a deep and personal level just how important it is to do well. Yes, it is enjoyable to learn and master epidemiology and statistics, but you must do more. You must be able to apply your learning, and pass your standardized examination.

For medical students, achieving a high score on the USMLE Step 1 examination is critical in determining where they will do their residency. Those who score well are much more likely to get into their number one residency choice. Those who score poorly, and those who fail, face a much greater challenge in getting into their specialty of choice, their residency of choice, and even into a residency program at all. Understanding at a fundamental, personal level the great importance of this examination helps motivate you to study and prepare properly. Would you like to take Friday night off to go dancing? Maybe that's okay, but maybe you need to study instead. First, think about how important the USMLE examination is; then decide if it is in your best interests to go out or to stay in and prepare for the USMLE.

So, we know one thing quite clearly. Failing your standardized examination in statistics will be painful.

The second great motivator is pleasure. Numbers and statistics may be difficult to grasp at first, but really understanding the concepts brings great pleasure. It's the pleasure of learning, the pleasure of knowing, the pleasure of realizing that your hard work will ultimately help you care for people better and more effectively.

Understanding basic concepts in epidemiology and biostatistics means you won't be fooled by sales reps who want you to use their product. You will know better and see past their sales pitch. Your mastery of statistics will help you view one of the greatest things in this universe --- the truth.

A great pleasure of learning statistics in a manner that will help you pass your standardized examination comes from passing your test. Passing with flying colors means more recognition, more options, and greater control over your future. A high score means more job opportunities, more residency opportunities, and more choices.

Be smart. Understand the basic motivations of fear and pleasure. Use the fear of failure to motivate you to study more, and study better. Use the pleasure of learning to motivate you to study more, and study better. Most of us respond better to either fear or pleasure. Use both if you can, but at least be sure to tap into your primary motivation source. Use this to light that fire in your belly in order to learn statistics, pass your test, and help your patients.

Proper preparation for the USMLE exam requires that you get started on day #1 of medical school.

Wednesday, October 22, 2014

Primer in statistics part 2

This is the second part of a two-part series.

First, a guide to selecting the proper statistical test based on the research question will be laid out in text and with a table, so that researchers can choose the univariable statistical test by answering five simple questions.

Second, the importance of using repeated measures analysis will be illustrated. This is a key component of data analysis because, in many dental studies, observations are repeated within a single patient (several teeth are measured in the same patient).

Third, concepts of confounding and the use of regression analysis are explained by going over a famous observational cohort study.

Lastly, the use of proper agreement analysis versus correlation for studies of agreement will be discussed, to avoid a common pitfall in dental research.
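That last pitfall can be shown numerically: two measurement methods can correlate almost perfectly while disagreeing systematically. The sketch below uses synthetic data (the +5 offset and all values are invented, not from the cited paper); a Bland-Altman style agreement check exposes the bias that the correlation coefficient hides.

```python
import math
import random

# Synthetic example: method B systematically reads 5 units higher than A.
random.seed(0)
true_values = [random.gauss(50, 10) for _ in range(200)]
method_a = [t + random.gauss(0, 1) for t in true_values]
method_b = [t + 5 + random.gauss(0, 1) for t in true_values]  # +5 bias

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(method_a, method_b)   # near 1: "excellent" correlation

# Bland-Altman style summary of the differences reveals the disagreement.
diffs = [b - a for a, b in zip(method_a, method_b)]
bias = sum(diffs) / len(diffs)      # close to the true offset of 5
sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (len(diffs) - 1))
limits = (bias - 1.96 * sd, bias + 1.96 * sd)
```

The correlation is nearly perfect even though the two methods never agree, which is exactly why agreement studies need agreement statistics.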

Primer of statistics in dental research: P... [J Prosthodont Res. 2014]

Primer in statistics part 1

This series of two articles aims to introduce dental researchers to nine essential topics in statistics for conducting evidence-based dentistry (EBD), using intuitive examples. Part I of the series covers the first five topics:

(1) statistical graph

(2) how to deal with outliers

(3) p-value and confidence interval

(4) testing equivalence

(5) multiplicity adjustment.

Part II will follow, covering the remaining topics:

(6) selecting the proper statistical tests

(7) repeated measures analysis

(8) epidemiological consideration for causal association

(9) analysis of agreement.

J Prosthodont Res. 2014 Jan;58(1):11-6
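Topic (5), multiplicity adjustment, lends itself to a short sketch. Below is a hand-rolled Holm step-down adjustment applied to illustrative p-values (the p-values are made up; in practice a statistics package would do this):

```python
# Holm step-down adjustment for a family of p-values.
def holm_adjust(pvals):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, adj)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.20]
adj = holm_adjust(pvals)  # adjusted p-values, same order as input
```

Holm is uniformly less conservative than plain Bonferroni (which would multiply every p-value by the full family size) while still controlling the family-wise error rate.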

Friday, October 17, 2014

Vikings and Superheroes: building a statistical network

The Icelandic sagas of the Norse people are thousand-year-old chronicles of brave deeds and timeless romances, but how true to Viking life were they? Researchers used a statistical network of associations between characters to find out. Read More

Saturday, March 29, 2014

What drives activity on Pinterest?

Researchers have released a new study that uses statistical data to help understand the motivations behind Pinterest activity, the roles gender plays among users and the factors that distinguish Pinterest from other popular social networking sites. Read More

Sporting events: Clear your memory to pick a winner

Predicting the winner of a sporting event with accuracy close to that of a statistical computer program could be possible with proper training, according to researchers. Read More

Friday, March 21, 2014

Computer analysis of massive clinical databases

A computer program capable of tracking more than 100 clinical variables for almost 400 people has shown it can identify various subtypes of asthma, which perhaps could lead to targeted, more effective treatments. A computational biologist led the analysis of patient data for the study. Read More

Wednesday, March 19, 2014

New statistical models could lead to better predictions of ocean patterns

The world's oceans cover more than 72 percent of the earth's surface, impact a major part of the carbon cycle, and contribute to variability in global climate and weather patterns. Now, researchers at the University of Missouri applied complex statistical models to increase the accuracy of ocean forecasting that influences the ways in which forecasters predict long-range events such as El Niño and the lower levels of the ocean food chain. Read More

Monday, March 17, 2014

Researchers develop new generation visual browser of epigenome

ChroGPS is a software application that facilitates the analysis and understanding of epigenetic data and extracts intelligible information from it; it can be downloaded free of charge from Bioconductor, a reference repository for biocomputational software. Scientists describe the uses of the program in a new article. Read More

Optical rogue waves: The storm in a test tube

Random processes in nature often follow a so-called normal distribution, which enables reliable estimation of how often extreme events will appear. Meteorological systems are an exception to this rule, with extreme events appearing at a much higher rate than long-term observation of far smaller fluctuations would predict. One such example is the appearance of unexpectedly strong storms; another is the rare reports of waves of extreme height in the ocean, also known as rogue waves or monster waves. Read More

Machines learn to detect breast cancer

Software that can recognize patterns in data is commonly used by scientists and economists. Now, researchers in the US have applied similar algorithms to help them more accurately diagnose breast cancer. Read More

New strep throat risk score brings data together to improve care

A new risk measure called a "home score" could save a patient with symptoms of strep throat a trip to the doctor, according to a new paper. The score combines patients' symptoms and demographic information with data on local strep throat activity to estimate their strep risk, empowering them to seek care appropriately. Read More

Reduce unnecessary lab tests, decrease costs by modifying software

When patients undergo diagnostic lab tests as part of the inpatient admission process, they may wonder why or how physicians choose particular tests. Increasingly, medical professionals are using electronic medical systems that provide lists of lab tests from which to choose. Now, researchers have studied how to modify these lists to ensure health professionals order relevant tests and omit unnecessary lab tests, which could result in better care and reduced costs. Read More

Thursday, March 13, 2014

Finding the hidden zombie in your network: Statistical approach to unraveling computer botnets

How do you detect a "botnet," a network of computers infected with malware -- so-called zombies -- that allow a third party to take control of those machines? The answer may lie in a statistical tool first published in 1966 and brought into the digital age, say researchers. Read More

How people use Facebook to maintain friendships

New social networking research investigates how individuals use Facebook to maintain their friendships. Read More

Doctors often uncertain in ordering, interpreting lab tests

A survey of primary care physicians suggests they often face uncertainty in ordering and interpreting clinical laboratory tests. Physicians have developed their own strategies for ordering and interpreting lab tests, such as asking a physician colleague or specialist, consulting a text or electronic reference, or calling the laboratory. But physicians reported they would welcome better decision-support software embedded in electronic medical records and direct access to lab personnel through lab hotlines. Read More

Wednesday, March 12, 2014

Frequent cell phone use linked to anxiety, lower grades, reduced happiness in students

Results of the analysis showed that cell phone use by college students was negatively related to GPA and positively related to anxiety. Following this, GPA was positively related to happiness while anxiety was negatively related to happiness. Thus, for the population studied, high frequency cell phone users tended to have lower GPA, higher anxiety, and lower satisfaction with life (happiness) relative to their peers who used the cell phone less often. Read More

Monday, March 10, 2014

Better way to make sense of 'Big Data?'

Big data is everywhere, and we are constantly told that it holds the answers to almost any problem we want to solve. But simply having lots of data is not the same as understanding it. New mathematical tools are needed to extract meaning from enormous data sets. Researchers now challenge the most recent advances in this field, using a classic mathematical concept to tackle the outstanding problems in big data analysis. Read More

What's behind a #1 ranking?

Behind every "Top 100" list is a generous sprinkling of personal bias and subjective decisions. Lacking the tools to calculate how factors like median home prices and crime rates actually affect the "best places to live," the public must take experts' analysis at face value. Read More

Sunday, March 9, 2014

To teach scientific reproducibility, start young

In the wake of retraction scandals and studies showing reproducibility rates as low as 10 percent for peer-reviewed articles, the scientific community has focused attention on ways to improve transparency and replication. A team of math and statistics professors has proposed a way to address one root of that problem: teach and emphasize reproducibility to aspiring scientists, using software that makes the concept feel logical rather than cumbersome. Read More

Digital ears in the rainforest: Estimating dynamics of animal populations by using sound recordings and computing

A Finnish-Brazilian project is constructing a system that could estimate the dynamics of animal populations by using sound recordings, statistics and scientific computing. The canopy in a Brazilian rainforest is bustling with life, but nothing is visible from the ground level. The digital recorders attached to the trees, however, are picking up the noises of birds. Read More

Social media, self-esteem and suicide: Nations with more corruption demonstrate more social media, less suicide

In nations where corruption is rife, it seems that citizens these days find an escape from the everyday problems that trickle down to their lives by using online social media more than those elsewhere. Research also suggests that these two factors -- more corruption, more social networking -- also correlate with lower suicide rates. Read More

Friday, March 7, 2014

Collecting digital user data without invading privacy

The statistical evaluation of digital user data is of vital importance for analyzing trends. But it can also undermine privacy. Computer scientists have now developed a novel cryptographic method that makes it possible to collect data and protect the privacy of the user at the same time. Read More

Monday, March 3, 2014

The performance of robust test statistics with categorical data.

Br J Math Stat Psychol. 2013 May;66(2):201-23
Authors: Savalei V, Rhemtulla M

This paper reports on a simulation study that evaluated the performance of five structural equation model test statistics appropriate for categorical data. Both Type I error rate and power were investigated. Different model sizes, sample sizes, numbers of categories, and threshold distributions were considered. Statistics associated with both the diagonally weighted least squares (cat-DWLS) estimator and with the unweighted least squares (cat-ULS) estimator were studied. Recent research suggests that cat-ULS parameter estimates and robust standard errors slightly outperform cat-DWLS estimates and robust standard errors (Forero, Maydeu-Olivares, & Gallardo-Pujol, 2009). The findings of the present research suggest that the mean- and variance-adjusted test statistic associated with the cat-ULS estimator performs best overall. A new version of this statistic now exists that does not require a degrees-of-freedom adjustment (Asparouhov & Muthén, 2010), and this statistic is recommended. Overall, the cat-ULS estimator is recommended over cat-DWLS, particularly in small to medium sample sizes.

Read More

Wednesday, February 26, 2014

A review of standards and statistics used to describe blood glucose monitor performance.

J Diabetes Sci Technol. 2010 Jan;4(1):75-83
Authors: Krouwer JS, Cembrowski GS

Glucose performance is reviewed in the context of total error, which includes error from all sources, not just analytical. Many standards require less than 100% of results to be within specific tolerance limits. Analytical error represents the difference between tested glucose and reference method glucose. Medical errors include analytical errors whose magnitude is great enough to likely result in patient harm. The 95% requirements of International Organization for Standardization 15197 and others make little sense, as up to 5% of results can be medically unacceptable. The current American Diabetes Association standard lacks a specification for user error. Error grids can meaningfully specify allowable glucose error. Infrequently, glucose meters do not provide a glucose result; such an occurrence can be devastating when associated with a life-threatening event. Nonreporting failures are ignored by standards. Estimates of analytical error can be classified into the four following categories: imprecision, random patient interferences, protocol-independent bias, and protocol-dependent bias. Methods to estimate total error are parametric, nonparametric, modeling, or direct. The Westgard method underestimates total error by failing to account for random patient interferences. Lawton's method is a more complete model. Bland-Altman, mountain plots, and error grids are direct methods and are easier to use as they do not require modeling. Three types of protocols can be used to estimate glucose errors: method comparison, special studies and risk management, and monitoring performance of meters in the field. Current standards for glucose meter performance are inadequate. The level of performance required in regulatory standards should be based on clinical needs but can only deal with currently achievable performance. Clinical standards state what is needed, whether it can be achieved or not. 
Rational regulatory decisions about glucose monitors should be based on robust statistical analyses of performance.
Read More
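As a rough sketch of the "percentage of results within tolerance limits" idea the review critiques, the following simulates meter readings against a reference and counts the fraction inside a two-part tolerance band. The limits below (±15 mg/dL under 100 mg/dL, ±15% at or above) are illustrative assumptions loosely modeled on ISO-style criteria, not a quotation of any standard.

```python
import random

# Simulated reference values and meter readings with 5% analytical CV.
random.seed(1)
reference = [random.uniform(40, 400) for _ in range(1000)]    # mg/dL
meter = [r * (1 + random.gauss(0, 0.05)) for r in reference]  # 5% CV error

def within_tolerance(ref, measured):
    # Absolute band at low glucose, relative band at higher glucose.
    tol = 15.0 if ref < 100 else 0.15 * ref
    return abs(measured - ref) <= tol

fraction_within = sum(
    within_tolerance(r, m) for r, m in zip(reference, meter)
) / len(reference)
```

Note the review's point applies here: even a high `fraction_within` says nothing about how harmful the few readings outside the band might be, which is why error grids weight errors by clinical consequence.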

Sunday, February 23, 2014

A brief introduction to computer-intensive methods, with a view towards applications in spatial statistics and stereology.

J Microsc. 2011 Apr;242(1):1-9
Authors: Mattfeldt T

Computer-intensive methods may be defined as data analytical procedures involving a huge number of highly repetitive computations. We mention resampling methods with replacement (bootstrap methods), resampling methods without replacement (randomization tests) and simulation methods. The resampling methods are based on simple and robust principles and are largely free from distributional assumptions. Bootstrap methods may be used to compute confidence intervals for a scalar model parameter and for summary statistics from replicated planar point patterns, and for significance tests. For some simple models of planar point processes, point patterns can be simulated by elementary Monte Carlo methods. The simulation of models with more complex interaction properties usually requires more advanced computing methods. In this context, we mention simulation of Gibbs processes with Markov chain Monte Carlo methods using the Metropolis-Hastings algorithm. An alternative to simulations on the basis of a parametric model consists of stochastic reconstruction methods. The basic ideas behind the methods are briefly reviewed and illustrated by simple worked examples in order to encourage novices in the field to use computer-intensive methods.

Read More
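The percentile bootstrap described above can be sketched in a few lines: resample the data with replacement many times, compute the statistic on each resample, and read a confidence interval off the percentiles. The sample values below are invented for illustration.

```python
import random
import statistics

# Percentile bootstrap confidence interval for the mean.
random.seed(42)
data = [4.1, 5.3, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1, 3.9, 5.8]

n_boot = 5000
boot_means = []
for _ in range(n_boot):
    resample = [random.choice(data) for _ in data]  # with replacement
    boot_means.append(statistics.mean(resample))

boot_means.sort()
ci_low = boot_means[int(0.025 * n_boot)]   # 2.5th percentile
ci_high = boot_means[int(0.975 * n_boot)]  # 97.5th percentile
```

The same loop works for any summary statistic (median, ratio, correlation), which is the appeal of the method: no distributional assumptions beyond the sample itself.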

Statistics and bioinformatics in nutritional sciences: analysis of complex data in the era of systems biology.

J Nutr Biochem. 2010 Jul;21(7):561-72
Authors: Fu WJ, Stromberg AJ, Viele K, Carroll RJ, Wu G

Over the past 2 decades, there have been revolutionary developments in life science technologies characterized by high throughput, high efficiency, and rapid computation. Nutritionists now have the advanced methodologies for the analysis of DNA, RNA, protein, low-molecular-weight metabolites, as well as access to bioinformatics databases. Statistics, which can be defined as the process of making scientific inferences from data that contain variability, has historically played an integral role in advancing nutritional sciences. Currently, in the era of systems biology, statistics has become an increasingly important tool to quantitatively analyze information about biological macromolecules. This article describes general terms used in statistical analysis of large, complex experimental data. These terms include experimental design, power analysis, sample size calculation, and experimental errors (Type I and II errors) for nutritional studies at population, tissue, cellular, and molecular levels. In addition, we highlighted various sources of experimental variations in studies involving microarray gene expression, real-time polymerase chain reaction, proteomics, and other bioinformatics technologies. Moreover, we provided guidelines for nutritionists and other biomedical scientists to plan and conduct studies and to analyze the complex data. Appropriate statistical analyses are expected to make an important contribution to solving major nutrition-associated problems in humans and animals (including obesity, diabetes, cardiovascular disease, cancer, ageing, and intrauterine growth retardation).

Read More

Saturday, February 22, 2014

Statistics in experimental cerebrovascular research: comparison of more than two groups with a continuous outcome variable.

J Cereb Blood Flow Metab. 2010 Sep;30(9):1558-63
Authors: Schlattmann P, Dirnagl U

A common setting in experimental cerebrovascular research is the comparison of more than two experimental groups. Often, continuous measures such as infarct volume, cerebral blood flow, or vessel diameter are the primary variables of interest. This article presents the principles of the statistical analysis of comparing more than two groups using analysis of variance (ANOVA). We will also explain post hoc comparisons, which are required to show which groups significantly differ once ANOVA has rejected the null hypothesis. Although statistical packages perform ANOVA and post hoc contrasts at a keystroke, in this study, we use examples from experimental stroke research to reveal the simple math behind the calculations and the basic principles. This will enable the reader to understand and correctly interpret the readout of statistical packages and to help prevent common errors in the comparison of multiple means.

Read More
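The "simple math behind the calculations" can be shown in miniature. The following computes a one-way ANOVA F statistic by hand for three illustrative groups (the data and group names are invented, not taken from the article): partition the total variability into between-group and within-group sums of squares, then take the ratio of their mean squares.

```python
import statistics

# One-way ANOVA by hand for three groups (illustrative data).
groups = {
    "control": [22.1, 24.5, 23.3, 25.0, 22.8],
    "drug_a":  [18.2, 19.5, 17.9, 18.8, 19.1],
    "drug_b":  [21.0, 20.4, 21.8, 20.1, 21.5],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = statistics.mean(all_values)

# Between-group sum of squares: group means vs the grand mean.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in groups.values())
# Within-group sum of squares: observations vs their own group mean.
ss_within = sum((v - statistics.mean(g)) ** 2
                for g in groups.values() for v in g)

df_between = len(groups) - 1                 # k - 1
df_within = len(all_values) - len(groups)    # N - k
f_stat = (ss_between / df_between) / (ss_within / df_within)
```

A large F indicates that at least one group mean differs; post hoc pairwise comparisons (with a multiplicity correction) are then needed to say which.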

Biostatistics primer: what a clinician ought to know: subgroup analyses.

J Thorac Oncol. 2010 May;5(5):741-6
Authors: Barraclough H, Govindan R

Large randomized phase III prospective studies continue to redefine the standard of therapy in medical practice. Often when studies do not meet the primary endpoint, it is common to explore possible benefits in specific subgroups of patients. In addition, these analyses may also be done, even in the case of a positive trial to find subsets of patients where the therapy is especially effective or ineffective. These unplanned subgroup analyses are justified to maximize the information that can be obtained from a study and to generate new hypotheses. Unfortunately, however, they are too often over-interpreted or misused in the hope of resurrecting a failed study. It is important to distinguish these overinterpreted, misused, and unplanned subgroup analyses from those prespecified and well-designed subgroup analyses. This overview provides a practical guide to the interpretation of subgroup analyses.

Read More

Statistics in medicine.

Plast Reconstr Surg. 2011 Jan;127(1):437-44
Authors: Januszyk M, Gurtner GC

The scope of biomedical research has expanded rapidly during the past several decades, and statistical analysis has become increasingly necessary to understand the meaning of large and diverse quantities of raw data. As such, a familiarity with this lexicon is essential for critical appraisal of medical literature. This article attempts to provide a practical overview of medical statistics, with an emphasis on the selection, application, and interpretation of specific tests. This includes a brief review of statistical theory and its nomenclature, particularly with regard to the classification of variables. A discussion of descriptive methods for data presentation is then provided, followed by an overview of statistical inference and significance analysis, and detailed treatment of specific statistical tests and guidelines for their interpretation.

Read More

Online sources of health statistics in Saudi Arabia.

Saudi Med J. 2011 Jan;32(1):9-14
Authors: Al-Zalabani AH

Researchers looking for health statistics on the Kingdom of Saudi Arabia (KSA) may face difficulty. This is partly due to the lack of awareness of potential sources where such statistics can be found. The purpose of this paper is to review various online sources of health statistics on KSA, and to highlight their content, coverage, and presentation of health statistics. Five bibliographic databases where local research can be found are described. National registries available are summarized. Governmental agencies, as well as societies and centers where the bulk of health statistics is produced are also described. Finally, some potential international sources that can be used for the purpose of comparison are presented.

Read More

Wednesday, February 19, 2014

Low-dose steroids for septic shock and severe sepsis: the use of Bayesian statistics to resolve clinical trial controversies.

Intensive Care Med. 2011 Mar;37(3):420-9
Authors: Kalil AC, Sun J

PURPOSE: Low-dose steroids have shown contradictory results in trials and three recent meta-analyses. We aimed to assess the efficacy and safety of low-dose steroids for severe sepsis and septic shock by Bayesian methodology.
METHODS: Randomized trials from three published meta-analyses were reviewed and entered in both classic and Bayesian databases to estimate relative risk reduction (RRR) for 28-day mortality, and relative risk increase (RRI) for shock reversal and side effects.
RESULTS: In septic shock trials only (Marik meta-analysis; N = 965), the probability that low-dose steroids decrease mortality by more than 15% (i.e., RRR > 15%) was 0.41 (0.24 for RRR > 20% and 0.14 for RRR > 25%). For severe sepsis and septic shock trials combined, the results were as follows: (1) for the Annane meta-analysis (N = 1,228), the probabilities were 0.57 (RRR > 15%), 0.32 (RRR > 20%), and 0.13 (RRR > 25%); (2) for the Minneci meta-analysis (N = 1,171), the probability was 0.57 to achieve mortality RRR > 15%, 0.32 (RRR > 20%), and 0.14 (RRR > 25%). The removal of the Sprung trial from each analysis did not change the overall results. The probability of achieving shock reversal ranged from 65 to 92%. The probability of developing steroid-induced side effects was as follows: for gastrointestinal bleeding (N = 924), there was a 0.73 probability of steroids causing an RRI > 1%, 0.70 for RRI > 2%, and 0.67 for RRI > 5%; for superinfections (N = 964), probabilities were 0.81 (RRI > 1%), 0.76 (RRI > 2%), and 0.70 (RRI > 5%); and for hyperglycemia (N = 540), 0.99 (RRI > 1%), 0.97 (RRI > 2%), and 0.94 (RRI > 5%).
CONCLUSIONS: Based on clinically meaningful thresholds (RRR > 15-25%) for mortality reduction in severe sepsis or septic shock, the Bayesian approach to all three meta-analyses consistently showed that low-dose steroids were not associated with survival benefits. The probabilities of developing steroid-induced side effects (superinfections, bleeding, and hyperglycemia) were high for all analyses.

Read More
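The posterior-probability statements above (e.g. the probability that RRR > 15%) can be sketched with a normal posterior on the log relative risk. The posterior mean and standard deviation below are made-up illustrative values, not the paper's estimates.

```python
import math

def prob_rrr_exceeds(threshold, mu_log_rr, sd_log_rr):
    """P(RRR > threshold) under a normal posterior on log relative risk.

    RRR > t  <=>  RR < 1 - t  <=>  log RR < log(1 - t).
    """
    z = (math.log(1 - threshold) - mu_log_rr) / sd_log_rr
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # normal CDF at z

# Hypothetical posterior: mean log RR of -0.10 (RR ~ 0.90), sd 0.12.
p15 = prob_rrr_exceeds(0.15, mu_log_rr=-0.10, sd_log_rr=0.12)
p25 = prob_rrr_exceeds(0.25, mu_log_rr=-0.10, sd_log_rr=0.12)
```

This is the appeal of the Bayesian framing: instead of a yes/no significance verdict, one reads off the probability that the effect clears any clinically meaningful threshold.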

How to grow a mind: statistics, structure, and abstraction.

Science. 2011 Mar 11;331(6022):1279-85
Authors: Tenenbaum JB, Kemp C, Griffiths TL, Goodman ND

In coming to understand the world --- in learning concepts, acquiring language, and grasping causal relations --- our minds make inferences that appear to go far beyond the data available. How do we do it? This review describes recent approaches to reverse-engineering human learning and cognitive development and, in parallel, engineering more humanlike machine learning systems. Computational models that perform probabilistic inference over hierarchies of flexibly structured representations can address some of the deepest questions about the nature and origins of human thought: How does abstract knowledge guide learning and reasoning from sparse data? What forms does our knowledge take, across different domains and tasks? And how is that abstract knowledge itself acquired?

Read More

Statistics and truth in phylogenomics.

Mol Biol Evol. 2012 Feb;29(2):457-72
Authors: Kumar S, Filipski AJ, Battistuzzi FU, Kosakovsky Pond SL, Tamura K

Phylogenomics refers to the inference of historical relationships among species using genome-scale sequence data and to the use of phylogenetic analysis to infer protein function in multigene families. With rapidly decreasing sequencing costs, phylogenomics is becoming synonymous with evolutionary analysis of genome-scale and taxonomically densely sampled data sets. In phylogenetic inference applications, this translates into very large data sets that yield evolutionary and functional inferences with extremely small variances and high statistical confidence (P value). However, reports of highly significant P values are increasing even for contrasting phylogenetic hypotheses depending on the evolutionary model and inference method used, making it difficult to establish true relationships. We argue that the assessment of the robustness of results to biological factors that may systematically mislead (bias) the outcomes of statistical estimation will be a key to avoiding incorrect phylogenomic inferences. In fact, there is a need for increased emphasis on the magnitude of differences (effect sizes) in addition to the P values of the statistical test of the null hypothesis. On the other hand, the amount of sequence data available will likely always remain inadequate for some phylogenomic applications, for example, those involving episodic positive selection at individual codon positions and in specific lineages. Again, a focus on effect size and biological relevance, rather than the P value, may be warranted. Here, we present a theoretical overview and discuss practical aspects of the interplay between effect sizes, bias, and P values as it relates to the statistical inference of evolutionary truth in phylogenomics.

Read More
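The effect-size point can be demonstrated in miniature: with a large enough sample, a trivially small true difference yields an extremely small P value. The sketch below uses simulated data with known unit standard deviation, so the z test simplifies (all numbers are invented for illustration).

```python
import math
import random

# Two huge samples differing by a biologically trivial amount.
random.seed(7)
n = 200_000
a = [random.gauss(0.00, 1.0) for _ in range(n)]
b = [random.gauss(0.03, 1.0) for _ in range(n)]  # tiny true effect

mean_a = sum(a) / n
mean_b = sum(b) / n
pooled_sd = 1.0  # known by construction; keeps the sketch simple

cohens_d = (mean_b - mean_a) / pooled_sd          # effect size: ~0.03
z = (mean_b - mean_a) / (pooled_sd * math.sqrt(2 / n))
# Two-sided p-value from the normal CDF.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```

The P value is astronomically small while the effect size is negligible, which is exactly why genome-scale data sets demand attention to magnitude, not just significance.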

A statistics primer.

J Small Anim Pract. 2011 Sep;52(9):456-8
Authors: Scott M, Flaherty D, Currall J

Statistical input into an experimental study is often not considered until the results have already been obtained. This is unfortunate, as inadequate statistical planning 'up front' may result in conclusions which are invalid. This review will consider some of the statistical considerations that are appropriate when planning a research study.

Read More

Monday, February 17, 2014

Why we should let "evidence-based medicine" rest in peace.

Clin Dermatol. 2013 Nov-Dec;31(6):806-10

Evidence-based medicine is a redundant term to the extent that doctors have always claimed they practiced medicine on the basis of evidence. They have, however, disagreed about what exactly constitutes legitimate evidence and how to synthesize the totality of evidence in a way that supports clinical action. Despite claims to the contrary, little progress has been made in solving this hard problem in any sort of formal way. The reification of randomized clinical trials (RCTs) and the tight linkage of such evidence to the development of clinical guidelines have led to error. In part, this relates to statistical and funding issues, but it also reflects the fact that the clinical events that comprise RCTs are not isomorphic with most clinical practice. Two possible and partial solutions are proposed: (1) to test empirically in new patient populations whether guidelines have the desired effects and (2) to accept that a distributed ecosystem of opinion rather than a hierarchical or consensus model of truth might better underwrite good clinical practice.

Read More

Sunday, February 16, 2014

The reliability of suicide statistics: a systematic review.

BMC Psychiatry. 2012;12:9
Authors: Tøllefsen IM, Hem E, Ekeberg Ø

BACKGROUND: Reliable suicide statistics are a prerequisite for suicide monitoring and prevention. The aim of this study was to assess the reliability of suicide statistics through a systematic review of the international literature.
METHODS: We searched for relevant publications in EMBASE, Ovid Medline, PubMed, PsycINFO and the Cochrane Library up to October 2010. In addition, we screened related studies and reference lists of identified studies. We included studies published in English, German, French, Spanish, Norwegian, Swedish and Danish that assessed the reliability of suicide statistics. We excluded case reports, editorials, letters, comments, abstracts and statistical analyses. All three authors independently screened the abstracts, and then the relevant full-text articles. Disagreements were resolved through consensus.
RESULTS: The primary search yielded 127 potential studies, of which 31 studies met the inclusion criteria and were included in the final review. The included studies were published between 1963 and 2009. Twenty were from Europe, seven from North America, two from Asia and two from Oceania. The manner of death had been re-evaluated in 23 studies (40-3,993 cases), and there were six registry studies (195-17,412 cases) and two combined registry and re-evaluation studies. The study conclusions varied, from findings of fairly reliable to poor suicide statistics. Thirteen studies reported fairly reliable suicide statistics or under-reporting of 0-10%. Of the 31 studies during the 46-year period, 52% found more than 10% under-reporting, and 39% found more than 30% under-reporting or poor suicide statistics. Eleven studies reassessed a nationwide representative sample, although these samples were limited to suicide within subgroups. Only two studies compared data from two countries.
CONCLUSIONS: The main finding was that there is a lack of systematic assessment of the reliability of suicide statistics. Few studies have been done, and few countries have been covered. The findings support the general under-reporting of suicide. In particular, nationwide studies and comparisons between countries are lacking.

Read More

Resolving confusion of tongues in statistics and machine learning: a primer for biologists and bioinformaticians.

Proteomics. 2012 Feb;12(4-5):543-9
Authors: van Iterson M, van Haagen HH, Goeman JJ

Bioinformatics is the field where computational methods from various domains have come together for the analysis of biological data. Each domain has introduced its own specific jargon. However, in closely related domains, e.g. machine learning and statistics, both concordant and discordant terminology occurs; the latter can lead to confusion. This article aims to help solve the confusion of tongues arising from these two closely related domains, which are frequently used in bioinformatics. We provide a short summary of the most commonly applied machine learning and statistical approaches to data analysis in bioinformatics, i.e. classification and statistical hypothesis testing. We explain differences and similarities in common terminology used in various domains, such as precision, recall, sensitivity and true positive rate. This primer can serve as a guide to the terminology used in these fields.

Read More

Applications of statistics to medical science (1) Fundamental concepts.

J Nippon Med Sch. 2011;78(5):274-9
Authors: Watanabe H

The conceptual framework of statistical tests and statistical inference is discussed, and the epidemiological background of statistics is briefly reviewed. This study is one of a series in which we survey the basics of statistics and practical methods used in medical statistics. Arguments related to actual statistical analysis procedures will be made in subsequent papers.

Read More

Philosophy and the practice of Bayesian statistics.

Br J Math Stat Psychol. 2013 Feb;66(1):8-38
Authors: Gelman A, Shalizi CR

A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.
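To make the abstract's talk of prior distributions concrete, here is a tiny conjugate Beta-binomial example in Python (the numbers and code are my own, not the authors'). Two analysts with different priors update on the same data:

```python
# Conjugate Beta-binomial updating: the prior's role made explicit.
def posterior(a, b, k, n):
    """Beta(a, b) prior + k successes in n binomial trials -> Beta posterior."""
    return a + k, b + (n - k)

# Two analysts with different priors see the same data: 7 successes in 10 trials.
priors = {"flat prior Beta(1, 1)": (1, 1), "sceptical prior Beta(2, 8)": (2, 8)}
for name, (a, b) in priors.items():
    a_post, b_post = posterior(a, b, k=7, n=10)
    mean = a_post / (a_post + b_post)
    print(f"{name}: posterior Beta({a_post}, {b_post}), mean = {mean:.3f}")
```

With only ten trials the prior still pulls the two posteriors well apart (posterior means of about 0.67 versus 0.45), which is why Gelman and Shalizi stress checking the fitted model against data rather than treating the update as automatic induction.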

Read More

"Statistics 101"--a primer for the genetics of complex human disease.

Cold Spring Harb Protoc. 2011 Oct;2011(10):1190-9
Authors: Sinsheimer J

This article reviews the basis of probability and statistics used in the genetic analysis of complex human diseases and illustrates their use in several simple examples. Much of the material presented here is so fundamental to statistics that it has become common knowledge in the field and the originators are no longer cited (e.g., Gauss).

Read More

Friday, February 14, 2014

Statistics for the nonstatistician: Part I.

South Med J. 2012 Mar;105(3):126-30

Authors: Wissing DR, Timm D

Clinical research typically gathers sample data to make an inference about a population. Sampling carries the risk of introducing variation into the data, which can be estimated by the standard error of the mean. Data are described using descriptive statistics such as the mean, median, mode, and standard deviation. The strength of the relation between two groups of data can be described using correlation. Hypothesis testing allows the researcher to accept or reject a null hypothesis by calculating the probability that differences between groups are the result of chance. By convention, if the probability is less than .05, the difference between the groups is said to be statistically significant. This probability is determined by statistical tests. Among these, the Student t test and the analysis of variance are the more common parametric tests, and the chi-square test is the most common nonparametric test. This article provides a basic overview of biostatistics to assist the nonstatistician with interpreting statistical analyses in research articles.
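As a hypothetical illustration of the tests the abstract names (the invented data and the use of scipy are my own assumptions, not part of the article):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Two simulated treatment groups (interval data): Student t test.
group_a = rng.normal(loc=120, scale=10, size=30)   # e.g. systolic BP, treated
group_b = rng.normal(loc=112, scale=10, size=30)   # e.g. systolic BP, control
t_stat, p_t = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_t:.4f}")

# Standard error of the mean for group A.
sem_a = stats.sem(group_a)
print(f"SEM(A) = {sem_a:.2f}")

# A 2x2 table of counts (nominal data): chi-square test.
#            improved  not improved
table = [[30, 10],     # treated
         [18, 22]]     # control
chi2, p_chi, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_chi:.4f}")
```

Interval data go to the t test (or ANOVA for three or more groups); counted categories go to chi-square, exactly the split the article draws.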


Read More

Thursday, February 13, 2014

Statistics for the nonstatistician: Part II.

South Med J. 2012 Mar;105(3):131-5

Authors: Hou W, Carden D

Part I of this two-part article provides a foundation of statistical terms and analyses for clinicians who are not statisticians. Types of data, how data are distributed and described, hypothesis testing, statistical significance, sample size determination, and the statistical analysis of interval scale (numeric) data were reviewed. Some data are presented not as interval data, but as named categories, also called nominal or categorical data. Part II reviews statistical tests and terms that are used when analyzing nominal data, data that do not resemble a normal, bell-shaped curve when plotted on the x- and y-axes, linear and logistic regression analysis, and survival analyses. A comprehensive algorithm of appropriate statistical analysis determined by the type, number, and distribution of collected variables also is provided.


Read More

Wednesday, February 12, 2014

Applications of statistics to medical science, II overview of statistical procedures for general use.

J Nippon Med Sch. 2012;79(1):31-6

Authors: Watanabe H

Procedures of statistical analysis are reviewed to provide an overview of applications of statistics for general use. Topics that are dealt with are inference on a population, comparison of two populations with respect to means and probabilities, and multiple comparisons. This study is the second part of a series in which we survey medical statistics. Arguments related to statistical associations and regressions will be made in subsequent papers.


Read More

Probability, statistics, and computational science.

Methods Mol Biol. 2012;855:77-110

Authors: Beerenwinkel N, Siebourg J

In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.
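As a sketch of one of the models the chapter introduces, here is a minimal forward algorithm for a toy two-state HMM in Python with NumPy. The parameters are invented purely for illustration:

```python
import numpy as np

# Toy 2-state HMM over DNA (states: GC-rich vs GC-poor region).
# All parameters are hypothetical.
A = np.array([[0.9, 0.1],             # state transition probabilities
              [0.2, 0.8]])
B = np.array([[0.1, 0.4, 0.4, 0.1],   # emission probs over A,C,G,T in GC-rich
              [0.4, 0.1, 0.1, 0.4]])  # emission probs over A,C,G,T in GC-poor
pi = np.array([0.5, 0.5])             # initial state distribution

def forward_loglik(obs):
    """Log-likelihood of an observation sequence (forward algorithm, rescaled)."""
    alpha = pi * B[:, obs[0]]
    loglik = 0.0
    for o in obs[1:]:
        c = alpha.sum()
        loglik += np.log(c)
        alpha = ((alpha / c) @ A) * B[:, o]   # propagate, then condition on o
    return loglik + np.log(alpha.sum())

# Encode the sequence C,G,G,C,A,T as indices into the alphabet A,C,G,T.
print(forward_loglik([1, 2, 2, 1, 0, 3]))
```

The rescaling at each step is the standard trick for avoiding numerical underflow on long sequences, which matters at genomic scale.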


Read More

Studies using English administrative data (Hospital Episode Statistics) to assess health-care outcomes--systematic review and recommendations for reporting.

Eur J Public Health. 2013 Feb;23(1):86-92

Authors: Sinha S, Peach G, Poloniecki JD, Thompson MM, Holt PJ

BACKGROUND: Studies using English administrative data from the Hospital Episode Statistics (HES) are increasingly used for the assessment of health-care quality. This study aims to catalogue the published body of studies using HES data to assess health-care outcomes, to assess their methodological qualities and to determine if reporting recommendations can be formulated.
METHODS: Systematic searches of the EMBASE, Medline and Cochrane databases were performed using defined search terms. Included studies were those that described the use of HES data extracts to assess health-care outcomes.
RESULTS: A total of 148 studies were included. The majority of published studies were on surgical specialties (60.8%), and the most common analytic theme was of inequalities and variations in treatment or outcome (27%). The volume of published studies has increased with time (r = 0.82, P < 0.0001), as has the length of study period (r = 0.76, P < 0.001) and the number of outcomes assessed per study (r = 0.72, P = 0.0023). Age (80%) and gender (57.4%) were the most commonly used factors in risk adjustment, and regression modelling was used most commonly (65.2%) to adjust for confounders. Generic methodologic data were better reported than those specific to HES data extraction. For the majority of parameters, there were no improvements with time.
CONCLUSIONS: Studies published using HES data to report health-care outcomes have increased in volume, scope and complexity with time. However, persisting deficiencies related to both generic and context-specific reporting have been identified. Recommendations have been made to improve these aspects as it is likely that the role of these studies in assessing health care, benchmarking practice and planning service delivery will continue to increase.


Read More

U-statistics in genetic association studies.

Hum Genet. 2012 Sep;131(9):1395-401

Authors: Li H

Many common human diseases are complex and are expected to be highly heterogeneous, with multiple causative loci and multiple rare and common variants at some of the causative loci contributing to the risk of these diseases. Data from genome-wide association studies (GWAS) and metadata such as known gene functions and pathways provide the possibility of identifying genetic variants, genes and pathways that are associated with complex phenotypes. Single-marker-based tests have been very successful in identifying thousands of genetic variants for hundreds of complex phenotypes. However, these variants only explain very small percentages of the heritabilities. To account for locus and allelic heterogeneity, gene-based and pathway-based tests can be very useful in the next stage of the analysis of GWAS data. U-statistics, which summarize the genomic similarity between pairs of individuals and link genomic similarity to phenotype similarity, have proved to be very useful for testing the associations between a set of single nucleotide polymorphisms and the phenotypes. Compared with single-marker analysis, the advantage afforded by U-statistics-based methods is large when the number of markers involved is large. We review several formulations of U-statistics in genetic association studies and point out the links between these statistics and other similarity-based tests of genetic association. Finally, potential applications of U-statistics in the analysis of next-generation sequencing data and rare-variant association studies are discussed.


Read More

Monday, February 10, 2014

Applications of statistics to medical science, III. Correlation and regression.

J Nippon Med Sch. 2012;79(2):115-20

Authors: Watanabe H

In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.


Read More

The CKD enigma with misleading statistics and myths about CKD, and conflicting ESRD and death rates in the literature: results of a 2008 U.S. population-based cross-sectional CKD outcomes analysis.

Ren Fail. 2013;35(3):338-43. Authors: Onuigbo MA

The just-released (August 2012) U.S. Preventive Services Task Force (USPSTF) report on chronic kidney disease (CKD) screening concluded that we know surprisingly little about whether screening adults with no signs or symptoms of CKD will improve health outcomes, and that clinicians and patients deserve better information on CKD. The implications of the recently introduced CKD staging paradigm versus long-term renal outcomes remain uncertain. Furthermore, the natural history of CKD remains unclear. We completed a comparison of US population-wide CKD to the projected annual incidence of end-stage renal disease (ESRD) for 2008 based on current evidence in the literature. Projections for new ESRD resulted in an estimated 840,000 new ESRD cases in 2008, whereas the actual reported new ESRD incidence in 2008, according to the 2010 USRDS Annual Data Report, was in fact only 112,476, a gross overestimation by about 650%. We conclude that we as nephrologists in particular, and physicians in general, still do not understand the true natural history of CKD. We further discuss the limitations of current National Kidney Foundation Kidney Disease Outcomes Quality Initiative (NKF KDOQI) CKD staging paradigms. Moreover, we raise questions regarding which CKD patients need to be seen by nephrologists, and further highlight the limitations and intricacies of individual patient prognostication among CKD populations followed over time, and the implications of these for future planning of CKD care in general. Finally, the clear heterogeneity of the so-called CKD patient is brought into prominence as we review the very misleading concept of classifying and prognosticating all CKD patients as one homogeneous patient population.

Read More

A primer for clinical researchers in the emergency department: Part V: How to describe data and basic medical statistics.

Emerg Med Australas. 2013 Feb;25(1):13-21. Authors: Donath S, Davidson A, Babl FE

In this series we address key topics for clinicians who conduct research as part of their work in the ED. In this section we will address important statistical concepts for clinical researchers and readers of clinical research publications. We use practical clinical examples of how to describe clinical data for presentation and publication, and explain key statistical concepts and tests clinical researchers will likely use for the majority of ED datasets.

Read More

Statistics: dealing with categorical data.

J Small Anim Pract. 2013 Jan;54(1):3-8 Authors: Scott M, Flaherty D, Currall J

This, the fifth of our series of articles on statistics in veterinary medicine, moves on to modelling categorical data, in particular assessing associations between variables. Some of the questions we shall consider are widely discussed in many clinical research publications, and we will use the ideas of hypothesis tests and confidence intervals to answer those questions.

Read More

Applications of statistics to medical science, IV survival analysis.

J Nippon Med Sch. 2012;79(3):176-81 Authors: Watanabe H

The fundamental principles of survival analysis are reviewed. In particular, the Kaplan-Meier method and a proportional hazard model are discussed. This work is the last part of a series in which medical statistics are surveyed.
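A minimal sketch of the Kaplan-Meier estimator the abstract mentions, in Python with invented follow-up data (a real analysis would use a vetted library):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.
    times: follow-up time per subject; events: 1 = event observed, 0 = censored."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    n_at_risk = len(times)
    survival, curve = 1.0, []
    for t in np.unique(times):
        mask = times == t
        d = events[mask].sum()          # events at time t
        if d > 0:
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= mask.sum()         # events and censored both leave the risk set
    return curve

# Hypothetical follow-up times (months); event = 0 means censored at that time.
times  = [2, 3, 3, 5, 7, 8, 8, 9, 12, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"t = {t:4.0f}  S(t) = {s:.3f}")
```

The key design point, which the method exists to handle, is that censored subjects contribute person-time to the risk set up to their censoring time but never count as events.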

Read More

Statistics: how many?

J Small Anim Pract. 2012 Jul;53(7):372-6 Authors: Scott M, Flaherty D, Currall J

The fourth in our series of articles on statistics for clinicians focuses on how we determine the appropriate number of subjects to include in an experimental study to provide sufficient statistical "power".
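As a hedged illustration of the kind of calculation the article covers, here is the standard normal-approximation sample-size formula for a two-sample comparison of means (my own sketch, not the authors' method), using only the Python standard library:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect standardized effect size d
    in a two-sample comparison of means: n = 2 * (z_{1-a/2} + z_power)^2 / d^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

for d in (0.2, 0.5, 0.8):   # Cohen's small / medium / large effects
    print(f"d = {d}: about {n_per_group(d)} subjects per group")
```

The formula makes the trade-off vivid: halving the detectable effect size quadruples the required sample, which is why "how many?" has to be asked before the data are collected.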

Read More

Sunday, February 9, 2014

Statistics: are we related?

J Small Anim Pract. 2013 Mar;54(3):124-8

Authors: Scott M, Flaherty D, Currall J

This short addition to our series on clinical statistics concerns relationships, and answering questions such as "are blood pressure and weight related?" In a later article, we will answer the more interesting question about how they might be related. This article follows on logically from the previous one dealing with categorical data, the major difference being here that we will consider two continuous variables, which naturally leads to the use of a Pearson correlation or occasionally to a Spearman rank correlation coefficient.
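A quick hypothetical illustration in Python of the two coefficients the article contrasts (the paired weight and blood-pressure values, and the use of scipy, are my own assumptions):

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements: weight (kg) and systolic BP (mmHg).
weight = np.array([62, 70, 74, 80, 85, 91, 98, 104])
sbp    = np.array([115, 118, 121, 126, 124, 133, 138, 141])

r, p_r = stats.pearsonr(weight, sbp)        # linear association
rho, p_rho = stats.spearmanr(weight, sbp)   # rank (monotonic) association
print(f"Pearson r = {r:.3f} (p = {p_r:.4f})")
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.4f})")
```

Pearson assumes an approximately linear relation between continuous variables; Spearman only asks whether one variable tends to rise with the other, which is why it is the fallback when the data are skewed or ordinal.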

PMID: 23458641 [PubMed - indexed for MEDLINE]

Read More

Thursday, February 6, 2014

Statistics: using regression models.

J Small Anim Pract. 2013 Jun;54(6):285-90

Authors: Scott M, Flaherty D, Currall J

In a previous article, we asked the simple question "Are we related?" and used scatterplots and correlation coefficients to provide an answer. In this article, we will take this question and reword it to "How are we related?" and will demonstrate the statistical techniques required to reach a conclusion.
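Continuing the same kind of hypothetical weight and blood-pressure example, a least-squares fit in Python turns "Are we related?" into "How are we related?" with a slope and an intercept (my own sketch, not the authors' code):

```python
import numpy as np

# Hypothetical paired measurements, as in the correlation example.
weight = np.array([62, 70, 74, 80, 85, 91, 98, 104], dtype=float)
sbp    = np.array([115, 118, 121, 126, 124, 133, 138, 141], dtype=float)

# Fit the straight line sbp = intercept + slope * weight by least squares.
slope, intercept = np.polyfit(weight, sbp, deg=1)
predicted = intercept + slope * weight
r_squared = 1 - np.sum((sbp - predicted) ** 2) / np.sum((sbp - sbp.mean()) ** 2)

print(f"sbp = {intercept:.1f} + {slope:.2f} * weight   (R^2 = {r_squared:.3f})")
```

The slope answers the "how" question in clinical units (mmHg per kg), and R-squared reports how much of the blood-pressure variation the line accounts for, which a bare correlation coefficient does not do.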

PMID: 23656306 [PubMed - indexed for MEDLINE]

Read More

Sunday, February 2, 2014

Clinical statistics: five key statistical concepts for clinicians.

J Korean Assoc Oral Maxillofac Surg. 2013 Oct;39(5):203-206

Authors: Choi YG

Statistics is the science of data. As the foundation of scientific knowledge, data refers to evidentiary facts obtained from the nature of reality by human action, observation, or experiment. Clinicians should be aware of the conditions of good data that support the validity of clinical modalities when reading scientific articles, one of the resources for revising or updating their clinical knowledge and skills. The cause-effect link between a clinical modality and an outcome is ascertained as a pattern statistic. The uniformity of nature guarantees the recurrence of data as the basic scientific evidence. Variation statistics are examined for patterns of recurrence, which provides information on the probability of recurrence of the cause-effect phenomenon. The multiple causal factors of a natural phenomenon require a counterproof of absence in the form of a control group. A pattern of relation between a causal factor and an effect becomes recognizable and should therefore be estimated as a relation statistic; the type and meaning of each relation statistic should be well understood. A study regarding a sample from a population with wide variation requires clinicians to be aware of error statistics due to random chance. Incomplete human senses, coarse measurement instruments, and preconceived ideas held as hypotheses tend to bias research, giving rise to the need for a keen, critical, independent mind with regard to reported data.

PMID: 24471046 [PubMed - as supplied by publisher]

Read More