Tuesday, December 2, 2014
The quantum revolution
Monday, November 10, 2014
How to Pass Your Standardized Examination in Biostatistics: Fear and Reward
First, let's take a close look at your fears.
Fear isn't a weakness; used properly, it is a gift. Fear can paralyze you, or it can motivate you to take action. Paralysis usually comes when we do not face our fears head-on and instead let them simmer in the recesses of our thoughts. So take a good inventory of your fear. In the case of a standardized examination, the fear is obvious: it is the fear of failure.
The fear of failure is very real when it comes to standardized examinations. In order to pass classes, graduate, and become licensed, students need to pass and do well on standardized examinations covering epidemiology and biostatistics. It is an essential part of medical training because the basic concepts do not change over time, and nearly all general standardized examinations contain at least a few questions on epidemiology and biostatistics.
As a practicing physician in the United States, I was required to pass multiple standardized examinations in order to pass my medical school classes, graduate from medical school, get a medical license, and become Board Certified. Now, with the new Maintenance of Certification requirements, I am required to continue to take and pass standardized examinations. Doing well means I can continue to work. Not passing an examination means in many cases a loss of Board Certification and subsequent loss of hospital privileges. And what do all of these examinations have in common? First, they are all structured in a similar manner, and secondly, they all contain questions on epidemiology and biostatistics.
The first step in doing well on these standardized examinations is to really understand at a deep and personal level just how important it is to do well. Yes, it is enjoyable to learn and master epidemiology and statistics, but you must do more. You must be able to apply your learning, and pass your standardized examination.
For medical students, achieving a high score on the USMLE Step 1 examination is critical in determining where they will end up doing their residency. Those who score well will be much more likely to get into their number one residency choice. Those who score poorly, and those who fail, face a much greater challenge in getting into their specialty of choice, their residency of choice, and even into a residency program at all. Understanding at a fundamental, personal level the great importance of this examination helps motivate you to study and prepare properly. Would you like to take Friday night off to go dancing? Maybe that's okay, but maybe you need to study instead. First, think about how important the USMLE examination is, then decide if it is in your best interests to go out or if you should stay in and prepare for the USMLE.
So, we know one thing quite clearly. Failing your standardized examination in statistics will be painful.
The second great motivator is pleasure. Numbers and statistics may be difficult to understand at first, but once you really grasp the concepts, a great pleasure results. It's the pleasure of learning, the pleasure of knowing, the pleasure of realizing that your hard work ultimately will let you help people better and more effectively.
Understanding basic concepts in epidemiology and biostatistics means you won't be fooled by sales reps who want you to use their product. You will know better and see past their sales pitch. Your mastery of statistics will help you view one of the greatest things in this universe: the truth.
A great pleasure of learning statistics in a manner that will help you pass your standardized examination comes from passing your test. Passing your test with flying colors means more recognition, more options, and greater control over your future. A high score means more job opportunities, more residency opportunities, more options.
Be smart. Understand the basic motivations of fear and pleasure. Use the fear of failure to motivate you to study more, and study better. Use the pleasure of learning to motivate you to study more, and study better. Most of us respond better to either fear or pleasure. Use both if you can, but at least be sure to tap into your primary motivation source. Use this to light that fire in your belly in order to learn statistics, pass your test, and help your patients.
Proper preparation for the USMLE exam requires that you get started on day #1 of medical school.
Wednesday, October 22, 2014
Primer in statistics part 2
First, a guide to selecting the proper statistical test based on the research question will be laid out in text and with a table, so that researchers can choose the univariable statistical test by answering five simple questions. Second, the importance of using repeated measures analysis will be illustrated. This is a key component of data analysis because in many dental studies the observations within a single patient are repeated (several teeth are measured in one patient). Third, the concepts of confounding and the use of regression analysis are explained by working through a famous observational cohort study. Lastly, the use of proper agreement analysis versus correlation for studies of agreement will be discussed, to avoid a common pitfall in dental research.
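The repeated-measures point deserves a small illustration. Here is a minimal sketch in Python with simulated data (none of the numbers come from an actual dental study): a mixed model with a random intercept per patient, so that several teeth from one mouth are not treated as independent observations.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated dental data: 20 patients, 4 teeth each; treatment assigned per patient.
rng = np.random.default_rng(0)
n_patients, n_teeth = 20, 4
patient = np.repeat(np.arange(n_patients), n_teeth)
group = np.repeat(rng.integers(0, 2, n_patients), n_teeth)
patient_effect = np.repeat(rng.normal(0.0, 0.8, n_patients), n_teeth)
depth = 3.0 + 0.5 * group + patient_effect + rng.normal(0.0, 0.3, n_patients * n_teeth)
df = pd.DataFrame({"patient": patient, "group": group, "depth": depth})

# A random intercept per patient accounts for the correlation among teeth
# measured in the same mouth, instead of treating 80 teeth as independent.
model = smf.mixedlm("depth ~ group", df, groups=df["patient"]).fit()
print(model.params["group"])
```

Fitting the same data with ordinary least squares would understate the standard errors, since teeth within a patient are correlated.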
Primer of statistics in dental research: P... [J Prosthodont Res. 2014] - PubMed - NCBI
Primer in statistics part 1
(1) statistical graph
(2) how to deal with outliers
(3) p-value and confidence interval
(4) testing equivalence
(5) multiplicity adjustment.
Part II will follow to cover the remaining topics including
(6) selecting the proper statistical tests
(7) repeated measures analysis
(8) epidemiological consideration for causal association
(9) analysis of agreement.
J Prosthodont Res. 2014 Jan;58(1):1-16
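Topic (3), the p-value and confidence interval, can be computed in a few lines. A hedged sketch with an invented sample (both the data and the null value 5.0 are made up for illustration):

```python
import numpy as np
from scipy import stats

# Invented sample of 30 measurements; the hypothesized mean 5.0 is also made up.
rng = np.random.default_rng(42)
x = rng.normal(5.2, 1.0, 30)

# One-sample t test against the null value
t_stat, p_value = stats.ttest_1samp(x, popmean=5.0)

# 95% confidence interval for the mean, from the t distribution
sem = stats.sem(x)
ci = stats.t.interval(0.95, df=len(x) - 1, loc=x.mean(), scale=sem)
print(p_value, ci)
```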
Friday, October 17, 2014
Vikings and Superheroes: building a statistical network
Saturday, March 29, 2014
What drives activity on Pinterest?
Sporting events: Clear your memory to pick a winner
Friday, March 21, 2014
Computer analysis of massive clinical databases
Wednesday, March 19, 2014
New statistical models could lead to better predictions of ocean patterns
Monday, March 17, 2014
Researchers develop new generation visual browser of epigenome
Optical rogue waves: The storm in a test tube
Machines learn to detect breast cancer
New strep throat risk score brings data together to improve care
Reduce unnecessary lab tests, decrease costs by modifying software
Thursday, March 13, 2014
Finding the hidden zombie in your network: Statistical approach to unraveling computer botnets
How people use Facebook to maintain friendships
Doctors often uncertain in ordering, interpreting lab tests
Wednesday, March 12, 2014
Frequent cell phone use linked to anxiety, lower grades, reduced happiness in students
Monday, March 10, 2014
Better way to make sense of 'Big Data?'
What's behind a #1 ranking?
Sunday, March 9, 2014
To teach scientific reproducibility, start young
Digital ears in the rainforest: Estimating dynamics of animal populations by using sound recordings and computing
Social media, self-esteem and suicide: Nations with more corruption demonstrate more social media, less suicide
Friday, March 7, 2014
Collecting digital user data without invading privacy
Monday, March 3, 2014
The performance of robust test statistics with categorical data.
This paper reports on a simulation study that evaluated the performance of five structural equation model test statistics appropriate for categorical data. Both Type I error rate and power were investigated. Different model sizes, sample sizes, numbers of categories, and threshold distributions were considered. Statistics associated with both the diagonally weighted least squares (cat-DWLS) estimator and with the unweighted least squares (cat-ULS) estimator were studied. Recent research suggests that cat-ULS parameter estimates and robust standard errors slightly outperform cat-DWLS estimates and robust standard errors (Forero, Maydeu-Olivares, & Gallardo-Pujol, 2009). The findings of the present research suggest that the mean- and variance-adjusted test statistic associated with the cat-ULS estimator performs best overall. A new version of this statistic now exists that does not require a degrees-of-freedom adjustment (Asparouhov & Muthén, 2010), and this statistic is recommended. Overall, the cat-ULS estimator is recommended over cat-DWLS, particularly in small to medium sample sizes.
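The Type I error rate of any test statistic can be estimated by simulation, as in the study above. A toy sketch using the ordinary two-sample t test (not the cat-ULS or cat-DWLS statistics themselves): both groups are drawn from the same distribution, so rejections at alpha = 0.05 should occur in roughly 5% of runs.

```python
import numpy as np
from scipy import stats

# Monte Carlo estimate of the Type I error rate of the two-sample t test
rng = np.random.default_rng(1)
n_sims, alpha = 2000, 0.05
rejections = 0
for _ in range(n_sims):
    a = rng.normal(0, 1, 25)
    b = rng.normal(0, 1, 25)  # same distribution: the null hypothesis is true
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1
print(rejections / n_sims)  # should be close to 0.05
```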
22568535
Read More
Wednesday, February 26, 2014
A review of standards and statistics used to describe blood glucose monitor performance.
Glucose performance is reviewed in the context of total error, which includes error from all sources, not just analytical. Many standards require less than 100% of results to be within specific tolerance limits. Analytical error represents the difference between tested glucose and reference method glucose. Medical errors include analytical errors whose magnitude is great enough to likely result in patient harm. The 95% requirements of International Organization for Standardization 15197 and others make little sense, as up to 5% of results can be medically unacceptable. The current American Diabetes Association standard lacks a specification for user error. Error grids can meaningfully specify allowable glucose error. Infrequently, glucose meters do not provide a glucose result; such an occurrence can be devastating when associated with a life-threatening event. Non-reporting failures are ignored by standards. Estimates of analytical error can be classified into the four following categories: imprecision, random patient interferences, protocol-independent bias, and protocol-dependent bias. Methods to estimate total error are parametric, nonparametric, modeling, or direct. The Westgard method underestimates total error by failing to account for random patient interferences. Lawton's method is a more complete model. Bland-Altman, mountain plots, and error grids are direct methods and are easier to use as they do not require modeling. Three types of protocols can be used to estimate glucose errors: method comparison, special studies and risk management, and monitoring performance of meters in the field. Current standards for glucose meter performance are inadequate. The level of performance required in regulatory standards should be based on clinical needs but can only deal with currently achievable performance. Clinical standards state what is needed, whether it can be achieved or not.
Rational regulatory decisions about glucose monitors should be based on robust statistical analyses of performance.
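Of the direct methods mentioned, Bland-Altman analysis is the simplest to compute: the mean difference between methods (the bias) and the 95% limits of agreement. A sketch with invented meter and reference readings (the bias of 2 mg/dL and scatter of 8 mg/dL are assumptions for illustration, not published figures):

```python
import numpy as np

# Invented paired readings: meter vs. laboratory reference glucose (mg/dL)
rng = np.random.default_rng(7)
reference = rng.uniform(70, 250, 100)
meter = reference + rng.normal(2.0, 8.0, 100)  # assumed small bias plus scatter

# Bland-Altman: bias and 95% limits of agreement of the differences
diff = meter - reference
bias = diff.mean()
loa_low = bias - 1.96 * diff.std(ddof=1)
loa_high = bias + 1.96 * diff.std(ddof=1)
print(bias, loa_low, loa_high)
```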
20167170
Read More
Sunday, February 23, 2014
A brief introduction to computer-intensive methods, with a view towards applications in spatial statistics and stereology.
Computer-intensive methods may be defined as data analytical procedures involving a huge number of highly repetitive computations. We mention resampling methods with replacement (bootstrap methods), resampling methods without replacement (randomization tests) and simulation methods. The resampling methods are based on simple and robust principles and are largely free from distributional assumptions. Bootstrap methods may be used to compute confidence intervals for a scalar model parameter and for summary statistics from replicated planar point patterns, and for significance tests. For some simple models of planar point processes, point patterns can be simulated by elementary Monte Carlo methods. The simulation of models with more complex interaction properties usually requires more advanced computing methods. In this context, we mention simulation of Gibbs processes with Markov chain Monte Carlo methods using the Metropolis-Hastings algorithm. An alternative to simulations on the basis of a parametric model consists of stochastic reconstruction methods. The basic ideas behind the methods are briefly reviewed and illustrated by simple worked examples in order to encourage novices in the field to use computer-intensive methods.
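A percentile bootstrap confidence interval, the simplest of the resampling methods described, can be sketched in a few lines of Python (the data are simulated, and deliberately skewed so that a normal-theory interval would be questionable):

```python
import numpy as np

# Skewed invented data; the bootstrap needs no distributional assumption
rng = np.random.default_rng(3)
sample = rng.exponential(2.0, 50)

# Resample with replacement many times; the 2.5th and 97.5th percentiles of
# the bootstrap distribution of the mean give the percentile bootstrap CI.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(5000)
])
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(ci_low, ci_high)
```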
21118243
Read More
Statistics and bioinformatics in nutritional sciences: analysis of complex data in the era of systems biology.
Over the past 2 decades, there have been revolutionary developments in life science technologies characterized by high throughput, high efficiency, and rapid computation. Nutritionists now have advanced methodologies for the analysis of DNA, RNA, protein, and low-molecular-weight metabolites, as well as access to bioinformatics databases. Statistics, which can be defined as the process of making scientific inferences from data that contain variability, has historically played an integral role in advancing nutritional sciences. Currently, in the era of systems biology, statistics has become an increasingly important tool to quantitatively analyze information about biological macromolecules. This article describes general terms used in statistical analysis of large, complex experimental data. These terms include experimental design, power analysis, sample size calculation, and experimental errors (Type I and II errors) for nutritional studies at population, tissue, cellular, and molecular levels. In addition, we highlight various sources of experimental variation in studies involving microarray gene expression, real-time polymerase chain reaction, proteomics, and other bioinformatics technologies. Moreover, we provide guidelines for nutritionists and other biomedical scientists to plan and conduct studies and to analyze the complex data. Appropriate statistical analyses are expected to make an important contribution to solving major nutrition-associated problems in humans and animals (including obesity, diabetes, cardiovascular disease, cancer, ageing, and intrauterine growth retardation).
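Sample size calculation, one of the terms defined above, takes only a few lines with standard software. A sketch using statsmodels (the effect size, alpha, and power are generic textbook choices, not values from any nutrition study):

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per group to detect a medium effect (Cohen's d = 0.5)
# with 80% power at a two-sided alpha of 0.05, for a two-sample t test.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(n_per_group)  # about 64 subjects per group
```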
20233650
Read More
Saturday, February 22, 2014
Statistics in experimental cerebrovascular research: comparison of more than two groups with a continuous outcome variable.
A common setting in experimental cerebrovascular research is the comparison of more than two experimental groups. Often, continuous measures such as infarct volume, cerebral blood flow, or vessel diameter are the primary variables of interest. This article presents the principles of the statistical analysis of comparing more than two groups using analysis of variance (ANOVA). We will also explain post hoc comparisons, which are required to show which groups significantly differ once ANOVA has rejected the null hypothesis. Although statistical packages perform ANOVA and post hoc contrasts at a keystroke, in this study we use examples from experimental stroke research to reveal the simple math behind the calculations and the basic principles. This will enable the reader to understand and correctly interpret the readout of statistical packages and to help prevent common errors in the comparison of multiple means.
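That "simple math" can be shown by partitioning the sums of squares by hand and checking the result against a statistical package. The three groups below are invented, not real infarct volumes:

```python
import numpy as np
from scipy import stats

# Three invented treatment groups
g1 = np.array([10.0, 12.0, 11.0, 13.0])
g2 = np.array([14.0, 15.0, 13.0, 16.0])
g3 = np.array([9.0, 8.0, 10.0, 9.5])
groups = [g1, g2, g3]

# One-way ANOVA by hand: partition variability between and within groups
grand = np.concatenate(groups).mean()
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_b = len(groups) - 1
df_w = sum(len(g) for g in groups) - len(groups)
f_manual = (ss_between / df_b) / (ss_within / df_w)

# The same F statistic from scipy
f_scipy, p = stats.f_oneway(g1, g2, g3)
print(f_manual, f_scipy, p)
```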
20571520
Read More
Biostatistics primer: what a clinician ought to know: subgroup analyses.
Large randomized phase III prospective studies continue to redefine the standard of therapy in medical practice. Often when studies do not meet the primary endpoint, it is common to explore possible benefits in specific subgroups of patients. In addition, these analyses may also be done, even in the case of a positive trial, to find subsets of patients where the therapy is especially effective or ineffective. These unplanned subgroup analyses are justified to maximize the information that can be obtained from a study and to generate new hypotheses. Unfortunately, however, they are too often overinterpreted or misused in the hope of resurrecting a failed study. It is important to distinguish these overinterpreted, misused, and unplanned subgroup analyses from those prespecified and well-designed subgroup analyses. This overview provides a practical guide to the interpretation of subgroup analyses.
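A short calculation shows why unplanned subgroup analyses are so easily over-interpreted: when each subgroup is tested at alpha = 0.05, the chance of at least one spurious "significant" subgroup grows rapidly even when no true effect exists anywhere.

```python
# Probability of at least one false positive across k independent subgroup
# tests, each at alpha = 0.05, when the null hypothesis is true in all of them.
alpha = 0.05
for k in (1, 5, 10, 20):
    p_any_false_positive = 1 - (1 - alpha) ** k
    print(k, round(p_any_false_positive, 3))
```

With 10 subgroups the chance of a spurious finding is already about 40%, which is why prespecification matters.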
20421767
Read More
Statistics in medicine.
The scope of biomedical research has expanded rapidly during the past several decades, and statistical analysis has become increasingly necessary to understand the meaning of large and diverse quantities of raw data. As such, a familiarity with this lexicon is essential for critical appraisal of medical literature. This article attempts to provide a practical overview of medical statistics, with an emphasis on the selection, application, and interpretation of specific tests. This includes a brief review of statistical theory and its nomenclature, particularly with regard to the classification of variables. A discussion of descriptive methods for data presentation is then provided, followed by an overview of statistical inference and significance analysis, and detailed treatment of specific statistical tests and guidelines for their interpretation.
21200241
Read More
Online sources of health statistics in Saudi Arabia.
Researchers looking for health statistics on the Kingdom of Saudi Arabia (KSA) may face difficulty. This is partly due to the lack of awareness of potential sources where such statistics can be found. The purpose of this paper is to review various online sources of health statistics on KSA, and to highlight their content, coverage, and presentation of health statistics. Five bibliographic databases where local research can be found are described. National registries available are summarized. Governmental agencies, as well as societies and centers where the bulk of health statistics is produced are also described. Finally, some potential international sources that can be used for the purpose of comparison are presented.
21212909
Read More
Wednesday, February 19, 2014
Low-dose steroids for septic shock and severe sepsis: the use of Bayesian statistics to resolve clinical trial controversies.
PURPOSE: Low-dose steroids have shown contradictory results in trials and three recent meta-analyses. We aimed to assess the efficacy and safety of low-dose steroids for severe sepsis and septic shock by Bayesian methodology.
METHODS: Randomized trials from three published meta-analyses were reviewed and entered in both classic and Bayesian databases to estimate relative risk reduction (RRR) for 28-day mortality, and relative risk increase (RRI) for shock reversal and side effects.
RESULTS: In septic shock trials only (Marik meta-analysis; N = 965), the probability that low-dose steroids decrease mortality by more than 15% (i.e., RRR > 15%) was 0.41 (0.24 for RRR > 20% and 0.14 for RRR > 25%). For severe sepsis and septic shock trials combined, the results were as follows: (1) for the Annane meta-analysis (N = 1,228), the probabilities were 0.57 (RRR > 15%), 0.32 (RRR > 20%), and 0.13 (RRR > 25%); (2) for the Minneci meta-analysis (N = 1,171), the probability was 0.57 to achieve mortality RRR > 15%, 0.32 (RRR > 20%), and 0.14 (RRR > 25%). The removal of the Sprung trial from each analysis did not change the overall results. The probability of achieving shock reversal ranged from 65 to 92%. The probability of developing steroid-induced side effects was as follows: for gastrointestinal bleeding (N = 924), there was a 0.73 probability of steroids causing an RRI > 1%, 0.70 for RRI > 2%, and 0.67 for RRI > 5%; for superinfections (N = 964), probabilities were 0.81 (RRI > 1%), 0.76 (RRI > 2%), and 0.70 (RRI > 5%); and for hyperglycemia (N = 540), 0.99 (RRI > 1%), 0.97 (RRI > 2%), and 0.94 (RRI > 5%).
CONCLUSIONS: Based on clinically meaningful thresholds (RRR > 15-25%) for mortality reduction in severe sepsis or septic shock, the Bayesian approach to all three meta-analyses consistently showed that low-dose steroids were not associated with survival benefits. The probabilities of developing steroid-induced side effects (superinfections, bleeding, and hyperglycemia) were high for all analyses.
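The Bayesian machinery behind statements like "the probability that RRR > 15%" is easy to sketch. The toy example below uses flat Beta(1, 1) priors on the mortality risk in each arm and invented trial counts (not the actual meta-analysis data) to estimate the posterior probability that the RRR exceeds each threshold:

```python
import numpy as np

# Invented pooled counts, for illustration only
rng = np.random.default_rng(11)
deaths_tx, n_tx = 150, 480    # steroid arm (hypothetical)
deaths_ctl, n_ctl = 170, 485  # control arm (hypothetical)

# Posterior draws of the mortality risk in each arm under Beta(1, 1) priors
p_tx = rng.beta(1 + deaths_tx, 1 + n_tx - deaths_tx, 50_000)
p_ctl = rng.beta(1 + deaths_ctl, 1 + n_ctl - deaths_ctl, 50_000)
rrr = 1 - p_tx / p_ctl  # relative risk reduction

# Posterior probability that the RRR clears each clinical threshold
for threshold in (0.15, 0.20, 0.25):
    print(threshold, np.mean(rrr > threshold))
```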
21243334
Read More
How to grow a mind: statistics, structure, and abstraction.
In coming to understand the world (in learning concepts, acquiring language, and grasping causal relations), our minds make inferences that appear to go far beyond the data available. How do we do it? This review describes recent approaches to reverse-engineering human learning and cognitive development and, in parallel, engineering more human-like machine learning systems. Computational models that perform probabilistic inference over hierarchies of flexibly structured representations can address some of the deepest questions about the nature and origins of human thought: How does abstract knowledge guide learning and reasoning from sparse data? What forms does our knowledge take, across different domains and tasks? And how is that abstract knowledge itself acquired?
21393536
Read More
Statistics and truth in phylogenomics.
Phylogenomics refers to the inference of historical relationships among species using genomescale sequence data and to the use of phylogenetic analysis to infer protein function in multigene families. With rapidly decreasing sequencing costs, phylogenomics is becoming synonymous with evolutionary analysis of genomescale and taxonomically densely sampled data sets. In phylogenetic inference applications, this translates into very large data sets that yield evolutionary and functional inferences with extremely small variances and high statistical confidence (P value). However, reports of highly significant P values are increasing even for contrasting phylogenetic hypotheses depending on the evolutionary model and inference method used, making it difficult to establish true relationships. We argue that the assessment of the robustness of results to biological factors, that may systematically mislead (bias) the outcomes of statistical estimation, will be a key to avoiding incorrect phylogenomic inferences. In fact, there is a need for increased emphasis on the magnitude of differences (effect sizes) in addition to the P values of the statistical test of the null hypothesis. On the other hand, the amount of sequence data available will likely always remain inadequate for some phylogenomic applications, for example, those involving episodic positive selection at individual codon positions and in specific lineages. Again, a focus on effect size and biological relevance, rather than the P value, may be warranted. Here, we present a theoretical overview and discuss practical aspects of the interplay between effect sizes, bias, and P values as it relates to the statistical inference of evolutionary truth in phylogenomics.
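The effect-size point is easy to demonstrate: with genome-scale sample sizes, a negligible true difference still produces a tiny P value. A simulated sketch (the true difference of 0.02 standard deviations is invented):

```python
import numpy as np
from scipy import stats

# Two huge samples whose true means differ by a trivial 0.02 standard deviations
rng = np.random.default_rng(5)
n = 200_000
a = rng.normal(0.00, 1.0, n)
b = rng.normal(0.02, 1.0, n)

t, p = stats.ttest_ind(a, b)
# Cohen's d: a standardized effect size, unaffected by sample size
cohens_d = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
print(p, cohens_d)  # "significant" p, negligible effect size
```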
21873298
Read More
A statistics primer.
Statistical input into an experimental study is often not considered until the results have already been obtained. This is unfortunate, as inadequate statistical planning 'up front' may result in conclusions which are invalid. This review will consider some of the statistical considerations that are appropriate when planning a research study.
21896018
Read More
Monday, February 17, 2014
Why we should let "evidence-based medicine" rest in peace.

Evidence-based medicine is a redundant term to the extent that doctors have always claimed they practiced medicine on the basis of evidence. They have, however, disagreed about what exactly constitutes legitimate evidence and how to synthesize the totality of evidence in a way that supports clinical action. Despite claims to the contrary, little progress has been made in solving this hard problem in any sort of formal way. The reification of randomized clinical trials (RCTs) and the tight linkage of such evidence to the development of clinical guidelines have led to error. In part, this relates to statistical and funding issues, but it also reflects the fact that the clinical events that comprise RCTs are not isomorphic with most clinical practice. Two possible and partial solutions are proposed: (1) to test empirically in new patient populations whether guidelines have the desired effects and (2) to accept that a distributed ecosystem of opinion rather than a hierarchical or consensus model of truth might better underwrite good clinical practice.
24160290
Read More
Sunday, February 16, 2014
The reliability of suicide statistics: a systematic review.
BACKGROUND: Reliable suicide statistics are a prerequisite for suicide monitoring and prevention. The aim of this study was to assess the reliability of suicide statistics through a systematic review of the international literature.
METHODS: We searched for relevant publications in EMBASE, Ovid Medline, PubMed, PsycINFO and the Cochrane Library up to October 2010. In addition, we screened related studies and reference lists of identified studies. We included studies published in English, German, French, Spanish, Norwegian, Swedish and Danish that assessed the reliability of suicide statistics. We excluded case reports, editorials, letters, comments, abstracts and statistical analyses. All three authors independently screened the abstracts, and then the relevant fulltext articles. Disagreements were resolved through consensus.
RESULTS: The primary search yielded 127 potential studies, of which 31 studies met the inclusion criteria and were included in the final review. The included studies were published between 1963 and 2009. Twenty were from Europe, seven from North America, two from Asia and two from Oceania. The manner of death had been reevaluated in 23 studies (403,993 cases), and there were six registry studies (195-17,412 cases) and two combined registry and reevaluation studies. The study conclusions varied, from findings of fairly reliable to poor suicide statistics. Thirteen studies reported fairly reliable suicide statistics or underreporting of 0-10%. Of the 31 studies during the 46-year period, 52% found more than 10% underreporting, and 39% found more than 30% underreporting or poor suicide statistics. Eleven studies reassessed a nationwide representative sample, although these samples were limited to suicide within subgroups. Only two studies compared data from two countries.
CONCLUSIONS: The main finding was that there is a lack of systematic assessment of the reliability of suicide statistics. Few studies have been done, and few countries have been covered. The findings support the general underreporting of suicide. In particular, nationwide studies and comparisons between countries are lacking.
22333684
Read More
Resolving confusion of tongues in statistics and machine learning: a primer for biologists and bioinformaticians.
Bioinformatics is the field where computational methods from various domains have come together for analysis of biological data. Each domain has introduced its own specific jargon. However, in closely related domains, e.g. machine learning and statistics, concordant and discordant terminology occurs, and the latter can lead to confusion. This article aims to help solve the confusion of tongues arising from these two closely related domains, which are frequently used in bioinformatics. We provide a short summary of the most commonly applied machine learning and statistical approaches to data analysis in bioinformatics, i.e. classification and statistical hypothesis testing. We explain differences and similarities in common terminology used in various domains, such as precision, recall, sensitivity and true positive rate. This primer can serve as a guide to the terminology used in these fields.
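The concordant terminology is easy to pin down in code. A small example with invented confusion-matrix counts, showing that recall, sensitivity, and true positive rate are the same quantity under different names:

```python
# Invented confusion-matrix counts from a binary classifier / diagnostic test
tp, fp, fn, tn = 80, 10, 20, 90

precision = tp / (tp + fp)    # a.k.a. positive predictive value (statistics)
recall = tp / (tp + fn)       # a.k.a. sensitivity, true positive rate
specificity = tn / (tn + fp)  # a.k.a. true negative rate
print(precision, recall, specificity)
```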
22246801
Read More
Applications of statistics to medical science (1) Fundamental concepts.
The conceptual framework of statistical tests and statistical inferences are discussed, and the epidemiological background of statistics is briefly reviewed. This study is one of a series in which we survey the basics of statistics and practical methods used in medical statistics. Arguments related to actual statistical analysis procedures will be made in subsequent papers.
22041873
Read More
Philosophy and the practice of Bayesian statistics.
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypotheticodeductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.
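The model checking the authors emphasize can be as simple as a posterior predictive check: fit a model, simulate replicated data from the posterior, and ask whether the observed data look typical. A minimal sketch for a binomial model with invented counts:

```python
import numpy as np

# Invented data: 27 successes in 40 trials
rng = np.random.default_rng(9)
observed_successes, n_trials = 27, 40

# Posterior for the success probability under a Beta(1, 1) prior
theta_draws = rng.beta(1 + observed_successes,
                       1 + n_trials - observed_successes, 10_000)
# Replicated datasets from the posterior predictive distribution
replicated = rng.binomial(n_trials, theta_draws)

# Posterior predictive tail probability of the observed value
ppp = np.mean(replicated >= observed_successes)
print(ppp)  # values near 0 or 1 would flag model misfit
```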
22364575
Read More
"Statistics 101": a primer for the genetics of complex human disease.
This article reviews the basis of probability and statistics used in the genetic analysis of complex human diseases and illustrates their use in several simple examples. Much of the material presented here is so fundamental to statistics that it has become common knowledge in the field and the originators are no longer cited (e.g., Gauss).
21969626
Read More
Friday, February 14, 2014
Statistics for the non-statistician: Part I.
South Med J. 2012 Mar;105(3):126-30
Authors: Wissing DR, Timm D
Clinical research typically gathers sample data to make an inference about a population. Sample data carries the risk of introducing variation into the data, which can be estimated by the standard error of the mean. Data are described using descriptive statistics such as mean, median, mode, and standard deviation. The strength of the relation between two groups of data can be described using correlation. Hypothesis testing allows the researcher to accept or reject a null hypothesis by calculating the probability that differences between groups are the result of chance. By convention, if the probability is less than .05, the difference between the groups is said to be statistically significant. This probability is determined by statistical tests. Of these groups of tests, the Student t test and the analysis of variance are the more common parametric tests, and the chi-square test is common for nonparametric tests. This article provides a basic overview of biostatistics to assist the nonstatistician with interpreting statistical analyses in research articles.
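The standard error of the mean and the Student t test mentioned above take only a few lines. A sketch with simulated control and treated samples (all numbers invented):

```python
import numpy as np
from scipy import stats

# Invented samples of 40 subjects per group
rng = np.random.default_rng(2)
control = rng.normal(100.0, 15.0, 40)
treated = rng.normal(110.0, 15.0, 40)

# Standard error of the mean: sample SD divided by sqrt(n)
sem_control = control.std(ddof=1) / np.sqrt(len(control))

# Two-sample Student t test of the difference in means
t_stat, p_value = stats.ttest_ind(control, treated)
print(sem_control, t_stat, p_value)
```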
22392207
Read More
Thursday, February 13, 2014
Statistics for the non-statistician: Part II.
South Med J. 2012 Mar;105(3):131-5
Authors: Hou W, Carden D
Part I of this two-part article provides a foundation of statistical terms and analyses for clinicians who are not statisticians. Types of data, how data are distributed and described, hypothesis testing, statistical significance, sample size determination, and the statistical analysis of interval scale (numeric) data were reviewed. Some data are presented not as interval data, but as named categories, also called nominal or categorical data. Part II reviews statistical tests and terms that are used when analyzing nominal data, data that do not resemble a normal, bell-shaped curve when plotted on the x- and y-axes, linear and logistic regression analysis, and survival analyses. A comprehensive algorithm of appropriate statistical analysis determined by the type, number, and distribution of collected variables also is provided.
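For nominal data of the kind Part II covers, the chi-square test is the workhorse. A sketch with an invented 2x2 table (rows: treatment vs. control; columns: improved vs. not improved):

```python
import numpy as np
from scipy import stats

# Invented 2x2 contingency table of nominal (categorical) outcomes
table = np.array([[30, 10],   # treatment: improved, not improved
                  [18, 22]])  # control:   improved, not improved

# Chi-square test of independence (Yates' correction applied by default for 2x2)
chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p, dof)
```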
22392208
Read More
Wednesday, February 12, 2014
Applications of statistics to medical science, II: overview of statistical procedures for general use.
J Nippon Med Sch. 2012;79(1):31-6
Authors: Watanabe H
Procedures of statistical analysis are reviewed to provide an overview of applications of statistics for general use. Topics that are dealt with are inference on a population, comparison of two populations with respect to means and probabilities, and multiple comparisons. This study is the second part of a series in which we survey medical statistics. Arguments related to statistical associations and regressions will be made in subsequent papers.
PMID: 22398788
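The multiple-comparisons problem this overview mentions can be illustrated with a minimal Bonferroni sketch: testing several treatments against one control inflates the type I error, so alpha is divided by the number of comparisons. All measurements and group names below are invented:

```python
from scipy import stats

# Hypothetical lab values: one control group and three treatment groups
control = [5.1, 4.8, 5.5, 5.0, 4.9]
treatments = {
    "drug_a": [5.6, 5.9, 6.1, 5.7, 6.0],
    "drug_b": [5.0, 5.2, 4.9, 5.1, 5.3],
    "drug_c": [6.3, 6.0, 6.5, 6.2, 6.4],
}
alpha = 0.05
adjusted_alpha = alpha / len(treatments)   # Bonferroni: 0.05 / 3

results = {}
for name, values in treatments.items():
    t_stat, p = stats.ttest_ind(control, values)
    results[name] = (p, p < adjusted_alpha)   # (p-value, significant?)
```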
... Read More
Probability, statistics, and computational science.
Methods Mol Biol. 2012;855:77-110
Authors: Beerenwinkel N, Siebourg J
In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.
PMID: 22407706
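As a minimal illustration of the Markov chains the chapter introduces, here is a sketch (with an invented two-state transition matrix) of computing an n-step distribution and the stationary distribution; `numpy` is assumed:

```python
import numpy as np

# A two-state Markov chain (hypothetical states: "healthy" / "sick").
# Rows of P index the current state, columns the next; each row sums to 1.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Distribution after n steps: left-multiply the initial distribution by P^n
pi0 = np.array([1.0, 0.0])                    # start in "healthy"
pi3 = pi0 @ np.linalg.matrix_power(P, 3)      # distribution after 3 steps

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# normalized so the probabilities sum to 1
vals, vecs = np.linalg.eig(P.T)
stationary = np.real(vecs[:, np.argmax(np.real(vals))])
stationary = stationary / stationary.sum()
```

Solving pi = pi P by hand gives pi = (5/6, 1/6), which the eigenvector computation reproduces.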
... Read More
Studies using English administrative data (Hospital Episode Statistics) to assess healthcare outcomes: systematic review and recommendations for reporting.
Eur J Public Health. 2013 Feb;23(1):86-92
Authors: Sinha S, Peach G, Poloniecki JD, Thompson MM, Holt PJ
BACKGROUND: Studies using English administrative data from the Hospital Episode Statistics (HES) are increasingly used for the assessment of healthcare quality. This study aims to catalogue the published body of studies using HES data to assess healthcare outcomes, to assess their methodological qualities and to determine if reporting recommendations can be formulated.
METHODS: Systematic searches of the EMBASE, Medline and Cochrane databases were performed using defined search terms. Included studies were those that described the use of HES data extracts to assess healthcare outcomes.
RESULTS: A total of 148 studies were included. The majority of published studies were on surgical specialties (60.8%), and the most common analytic theme was of inequalities and variations in treatment or outcome (27%). The volume of published studies has increased with time (r = 0.82, P < 0.0001), as has the length of study period (r = 0.76, P < 0.001) and the number of outcomes assessed per study (r = 0.72, P = 0.0023). Age (80%) and gender (57.4%) were the most commonly used factors in risk adjustment, and regression modelling was used most commonly (65.2%) to adjust for confounders. Generic methodologic data were better reported than those specific to HES data extraction. For the majority of parameters, there were no improvements with time.
CONCLUSIONS: Studies published using HES data to report healthcare outcomes have increased in volume, scope and complexity with time. However, persisting deficiencies related to both generic and context-specific reporting have been identified. Recommendations have been made to improve these aspects as it is likely that the role of these studies in assessing health care, benchmarking practice and planning service delivery will continue to increase.
PMID: 22577123
... Read More
U-statistics in genetic association studies.
Hum Genet. 2012 Sep;131(9):1395-401
Authors: Li H
Many common human diseases are complex and are expected to be highly heterogeneous, with multiple causative loci and multiple rare and common variants at some of the causative loci contributing to the risk of these diseases. Data from genome-wide association studies (GWAS) and metadata such as known gene functions and pathways provide the possibility of identifying genetic variants, genes and pathways that are associated with complex phenotypes. Single-marker-based tests have been very successful in identifying thousands of genetic variants for hundreds of complex phenotypes. However, these variants only explain very small percentages of the heritabilities. To account for the locus and allelic heterogeneity, gene-based and pathway-based tests can be very useful in the next stage of the analysis of GWAS data. U-statistics, which summarize the genomic similarity between pairs of individuals and link the genomic similarity to phenotype similarity, have proved to be very useful for testing the associations between a set of single nucleotide polymorphisms and the phenotypes. Compared to single-marker analysis, the advantage afforded by U-statistics-based methods is large when the number of markers involved is large. We review several formulations of U-statistics in genetic association studies and point out the links of these statistics with other similarity-based tests of genetic association. Finally, potential applications of U-statistics in the analysis of next-generation sequencing data and rare-variant association studies are discussed.
PMID: 22610525
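A toy sketch of the pairwise-similarity idea behind these U-statistics: average a kernel over all pairs of subjects. The genotype matrix and the identity-by-state kernel below are invented for illustration and are not the authors' formulation:

```python
import numpy as np
from itertools import combinations

# A U-statistic averages a kernel h over all pairs of subjects. Here the
# kernel is a simple identity-by-state genomic similarity between two
# genotype vectors (minor-allele counts coded 0/1/2 per SNP). Made-up data.
genotypes = np.array([
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [2, 2, 0, 0],
    [0, 1, 2, 2],
])

def ibs_kernel(g1, g2):
    # Identity-by-state similarity, scaled to [0, 1] per SNP
    return np.mean(1 - np.abs(g1 - g2) / 2)

pairs = list(combinations(range(len(genotypes)), 2))
u_stat = np.mean([ibs_kernel(genotypes[i], genotypes[j]) for i, j in pairs])
```

An association test along the lines the abstract sketches would then compare this genomic similarity with the corresponding phenotype similarity across the same pairs.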
... Read More
Monday, February 10, 2014
Applications of statistics to medical science, III. Correlation and regression.
J Nippon Med Sch. 2012;79(2):115-20
Authors: Watanabe H
In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.
PMID: 22687354
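A minimal sketch of the simple linear regression this part of the series reviews, with invented weight and blood-pressure data (`scipy` assumed; not the authors' worked example):

```python
import numpy as np
from scipy import stats

# Simple linear regression: does weight (kg) predict systolic BP (mmHg)?
# Hypothetical data for illustration.
weight = np.array([60, 65, 70, 75, 80, 85, 90])
sbp = np.array([115, 118, 121, 126, 128, 133, 135])

result = stats.linregress(weight, sbp)

# Fitted model: sbp ≈ intercept + slope * weight
predicted_75kg = result.intercept + result.slope * 75
```

`result.rvalue` is the Pearson correlation, linking this directly to the correlation concept reviewed alongside regression.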
... Read More
The CKD enigma with misleading statistics and myths about CKD, and conflicting ESRD and death rates in the literature: results of a 2008 U.S. population-based cross-sectional CKD outcomes analysis.
The just-released (August 2012) U.S. Preventive Services Task Force (USPSTF) report on chronic kidney disease (CKD) screening concluded that we know surprisingly little about whether screening adults with no signs or symptoms of CKD will improve health outcomes, and that clinicians and patients deserve better information on CKD. The implications of the recently introduced CKD staging paradigm versus long-term renal outcomes remain uncertain. Furthermore, the natural history of CKD remains unclear. We completed a comparison of US population-wide CKD to the projected annual incidence of end-stage renal disease (ESRD) for 2008 based on current evidence in the literature. Projections for new ESRD resulted in an estimated 840,000 new ESRD cases in 2008, whereas the actual reported new ESRD incidence in 2008, according to the 2010 USRDS Annual Data Report, was in fact only 112,476, a gross overestimation by about 650%. We conclude that we as nephrologists in particular, and physicians in general, still do not understand the true natural history of CKD. We further discuss the limitations of the current National Kidney Foundation Kidney Disease Outcomes Quality Initiative (NKF KDOQI) CKD staging paradigms. Moreover, we raise questions regarding which CKD patients need to be seen by nephrologists, and further highlight the limitations and intricacies of individual patient prognostication among CKD populations followed over time, and the implications of these for future planning of CKD care in general. Finally, the clear heterogeneity of the so-called CKD patient is brought into prominence as we review the very misleading concept of classifying and prognosticating all CKD patients as one homogeneous patient population.
... Read More
A primer for clinical researchers in the emergency department: Part V: How to describe data and basic medical statistics.
In this series we address key topics for clinicians who conduct research as part of their work in the ED. In this section we will address important statistical concepts for clinical researchers and readers of clinical research publications. We use practical clinical examples of how to describe clinical data for presentation and publication, and explain key statistical concepts and tests clinical researchers will likely use for the majority of ED datasets.
... Read More
Statistics: dealing with categorical data.
This, the fifth of our series of articles on statistics in veterinary medicine, moves on to modelling categorical data, in particular assessing associations between variables. Some of the questions we shall consider are widely discussed in many clinical research publications, and we will use the ideas of hypothesis tests and confidence intervals to answer those questions.
... Read More
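As a worked illustration of pairing a hypothesis test with a confidence interval for categorical data, here is a normal-approximation 95% CI for a difference in proportions; the counts are invented and this is a sketch, not the article's example:

```python
import math

# Approximate 95% confidence interval for the difference between two
# proportions (normal approximation; hypothetical counts).
a_events, a_total = 30, 100    # e.g. complications in exposed group
b_events, b_total = 18, 100    # complications in unexposed group

p1, p2 = a_events / a_total, b_events / b_total
diff = p1 - p2
se = math.sqrt(p1 * (1 - p1) / a_total + p2 * (1 - p2) / b_total)
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se
```

Because this interval excludes zero, the association would be called statistically significant at the 5% level, agreeing with what a chi-square test on the same counts would conclude.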
Applications of statistics to medical science, IV survival analysis.
The fundamental principles of survival analysis are reviewed. In particular, the Kaplan-Meier method and a proportional hazard model are discussed. This work is the last part of a series in which medical statistics are surveyed.
... Read More
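The Kaplan-Meier product-limit estimate mentioned here can be computed by hand in a few lines; the follow-up times and censoring flags below are invented for illustration:

```python
import numpy as np

# Kaplan-Meier product-limit estimator, written out by hand.
# Hypothetical follow-up times in months; event=1 death, event=0 censored.
times  = np.array([2, 3, 3, 5, 7, 8, 10, 12])
events = np.array([1, 1, 0, 1, 0, 1, 1, 0])

survival = 1.0
curve = {}   # time -> estimated survival probability just after that time
for t in np.unique(times[events == 1]):
    d = np.sum((times == t) & (events == 1))   # deaths at time t
    n = np.sum(times >= t)                     # number at risk just before t
    survival *= (1 - d / n)
    curve[t] = survival
```

Censored subjects leave the risk set without contributing a death, which is exactly why the product-limit form is needed instead of a simple proportion.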
Statistics: how many?
The fourth in our series of articles on statistics for clinicians focuses on how we determine the appropriate number of subjects to include in an experimental study to provide sufficient statistical "power".
... Read More
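The "how many?" question can be sketched with the usual normal-approximation sample-size formula for a two-sample comparison, n = 2((z_alpha + z_beta) * sigma / delta)^2 per group; the standard deviation and effect size below are invented placeholders:

```python
import math

# Sample size per group for a two-sided two-sample comparison of means,
# using the normal approximation n = 2 * ((z_a + z_b) * sigma / delta)^2.
z_alpha = 1.96        # two-sided alpha = 0.05
z_beta = 0.84         # power = 0.80
sigma = 10.0          # assumed common standard deviation (hypothetical)
delta = 5.0           # smallest difference worth detecting (hypothetical)

n_per_group = math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)
```

Halving the detectable difference delta quadruples the required sample size, which is why the choice of effect size dominates power calculations.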
Sunday, February 9, 2014
Statistics: are we related?
J Small Anim Pract. 2013 Mar;54(3):124-8
Authors: Scott M, Flaherty D, Currall J
Abstract
This short addition to our series on clinical statistics concerns relationships, and answering questions such as "are blood pressure and weight related?" In a later article, we will answer the more interesting question about how they might be related. This article follows on logically from the previous one dealing with categorical data, the major difference being here that we will consider two continuous variables, which naturally leads to the use of a Pearson correlation or occasionally to a Spearman rank correlation coefficient.
PMID: 23458641 [PubMed - indexed for MEDLINE]
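A small sketch of the Pearson-versus-Spearman choice the article describes, with invented weight and blood-pressure pairs (`scipy` assumed):

```python
from scipy import stats

# Pearson measures linear association between two continuous variables;
# Spearman rank correlation is the alternative when the relationship is
# monotonic but not linear, or when outliers are a concern. Made-up data.
weight = [58, 63, 70, 74, 81, 88, 95]
sbp    = [112, 117, 120, 124, 129, 131, 138]

r_pearson, p_pearson = stats.pearsonr(weight, sbp)
rho_spearman, p_spearman = stats.spearmanr(weight, sbp)
```

Because these invented values increase strictly together, the Spearman coefficient is exactly 1 while the Pearson coefficient is merely close to 1.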
... Read More
Thursday, February 6, 2014
Statistics: using regression models.
J Small Anim Pract. 2013 Jun;54(6):285-90
Authors: Scott M, Flaherty D, Currall J
Abstract
In a previous article, we asked the simple question "Are we related?" and used scatterplots and correlation coefficients to provide an answer. In this article, we will take this question and reword it to "How are we related?" and will demonstrate the statistical techniques required to reach a conclusion.
PMID: 23656306 [PubMed - indexed for MEDLINE]
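Moving from "are we related?" to "how are we related?" can be sketched as a least-squares fit with more than one predictor; the data and variable names below are invented (`numpy` assumed), not the authors' worked example:

```python
import numpy as np

# Multiple linear regression by least squares:
# fit sbp = b0 + b1*weight + b2*age. Hypothetical data.
weight = np.array([60, 65, 70, 75, 80, 85])
age    = np.array([40, 45, 50, 52, 60, 65])
sbp    = np.array([116, 120, 125, 127, 134, 139])

# Design matrix: a column of ones for the intercept plus one per predictor
X = np.column_stack([np.ones_like(weight), weight, age])
coef, residuals, rank, _ = np.linalg.lstsq(X, sbp, rcond=None)
b0, b1, b2 = coef

predicted = X @ coef   # fitted blood pressures
```

The coefficients now answer "how": each slope estimates the change in the outcome per unit change in that predictor, holding the other constant.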
... Read More
Sunday, February 2, 2014
Clinical statistics: five key statistical concepts for clinicians.
J Korean Assoc Oral Maxillofac Surg. 2013 Oct;39(5):203-206
Authors: Choi YG
Abstract
Statistics is the science of data. As the foundation of scientific knowledge, data refers to evidentiary facts obtained from the nature of reality by human action, observation, or experiment. Clinicians should be aware of the conditions of good data to support the validity of clinical modalities when reading scientific articles, one of the resources for revising or updating their clinical knowledge and skills. The cause-effect link between a clinical modality and its outcome is ascertained as a pattern statistic. The uniformity of nature guarantees the recurrence of data as basic scientific evidence. Variation statistics are examined for patterns of recurrence; this provides information on the probability of recurrence of the cause-effect phenomenon. Multiple causal factors of a natural phenomenon need a counterproof of absence in terms of a control group. A pattern of relation between a causal factor and an effect becomes recognizable, and thus should be estimated as a relation statistic. The type and meaning of each relation statistic should be well understood. A study regarding a sample from a population of wide variation requires clinicians to be aware of error statistics due to random chance. Incomplete human senses, coarse measurement instruments, and preconceived ideas held as hypotheses tend to bias research, which gives rise to the necessity of a keen, critical, independent mind with regard to reported data.
PMID: 24471046 [PubMed - as supplied by publisher]
... Read More