I don't quite get this. Public health has produced quite a bit of understanding of the links between, say, poverty and disease, or education and lifespan. These are studies of correlation, but they are pretty useful.
There are several problems with working with large datasets and, worse yet, self-response surveys. Ideally, one can control the stimulus and separate people into groups where one group gets a "treatment" while the other does not. That becomes infeasible when the effect sizes are very small (the effect each independent variable (cause) has in explaining the dependent variable (effect)). Self-response surveys make it worse because socially acceptable answers (lies) are given. So to correct problems like these, you need really large datasets of respondents.
A pure correlational design has the problem that one cannot determine whether the correlation is produced by unseen variables that affect both of the correlated variables, and it has no way to indicate whether a causes b or b causes a.
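The confounding problem is easy to demonstrate with synthetic data. This is just an illustrative sketch (the variable names and numbers are made up): a hidden variable z drives both a and b, neither of which causes the other, yet they correlate anyway.

```python
import numpy as np

# Synthetic illustration: a hidden variable z drives both a and b.
# Neither a causes b nor b causes a, yet they correlate.
rng = np.random.default_rng(42)
n = 1000

z = rng.normal(size=n)            # unseen confounder
a = z + rng.normal(size=n)        # a is driven by z plus noise
b = z + rng.normal(size=n)        # b is driven by z plus noise

r = np.corrcoef(a, b)[0, 1]
print(f"correlation(a, b) = {r:.2f}")  # roughly 0.5 despite no causal link
```

A correlational study that only observes a and b has no way to distinguish this setup from a genuinely causal one.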
Thus, to untangle the causal links, the usual method employed is multiple regression in all its permutations. In a perfect case where a always causes b, the error in your model is 0, and thus your model explains 100% of the cause-and-effect relationship with your variables. That is never true in real life, so statistical "controls" are created, and ideally a researcher is guided by theory as to which variables are expected to be causal--as in cause and effect.
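A minimal sketch of what this looks like in practice, with invented numbers: y is generated from two known variables plus noise, and even with the correct model specified, the noise keeps R^2 below 1.

```python
import numpy as np

# Toy multiple regression: y depends on x1 and a "control" x2, plus noise.
# Even with the right model, the noise term keeps R^2 below 1.
rng = np.random.default_rng(0)
n = 1000

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
noise = rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + noise   # true data-generating process

X = np.column_stack([np.ones(n), x1, x2])     # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares

resid = y - X @ beta
r2 = 1.0 - resid.var() / y.var()
print(f"coefficients = {beta.round(2)}, R^2 = {r2:.2f}")
```

Here the researcher knows the true model; in real work, theory has to supply the variable list, and the unexplained share of variance is usually much larger.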
Not to bore folks, but designing a proper research study is actually quite difficult, and anything dealing with human responses will demonstrate much lower levels of explanatory power for a model. The old phrase, Garbage In, Garbage Out, provides a pretty useful heuristic for detecting bad research. If the data is flawed, then the results will be flawed. If the methods are poorly suited to the data collected, then the results will be flawed. If the theory behind the model is flawed, then the model's results will not reflect reality.
First, there is the data problem--ideally, all of your variables can take on any value, positive or negative. As you move away from that, you have to rely on increasingly problematic adjustments to your data that can generate false results if your assumptions about the underlying data (for example, the type of variable distribution, outliers, independent variables being correlated with each other, or restrictions in data values or ranges) are wrong.
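One of those problems--restricted value ranges--can be sketched with made-up data: if you only ever observe the upper half of x, the measured x-y correlation shrinks even though the underlying relationship is unchanged.

```python
import numpy as np

# Range restriction: observing only part of x's range attenuates
# the measured correlation, even though y genuinely depends on x.
rng = np.random.default_rng(1)
n = 20_000

x = rng.normal(size=n)
y = x + rng.normal(size=n)        # y genuinely depends on x

r_full = np.corrcoef(x, y)[0, 1]

mask = x > 0                      # simulate a range-restricted sample
r_restricted = np.corrcoef(x[mask], y[mask])[0, 1]

print(f"full-range r = {r_full:.2f}, restricted r = {r_restricted:.2f}")
```

This is one reason samples drawn from a narrow slice of the population (only college students, only hospital patients) can understate relationships that exist in the full population.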
For example, some of the causal (independent) variables can be related either to each other or, worse yet, to a third variable that is not in the model. Findings can also be affected if the model is not specified correctly. This is where a key explanatory variable is left out of the model by chance or malice. The model can then falsely attribute causal relationships to the variables in the model when a variable left out of the equation is in fact the true cause. Large outliers and missing data are an issue, as are seasonal effects and data that demonstrate a trend (for example, years of schooling attained trends upward over time, and that trend has to be separated out for some purposes). Variable measurement can be done poorly--as above, years of schooling do not have a one-to-one relationship with knowledge, for example. Far too often, researchers reuse flawed variable measurements out of tradition, because the data is available, or because the researcher is simply datamining whatever data is out there.

With the widespread availability of all sorts of data, a data miner simply throws variables into a model and removes them to see what "sticks." In an older age, things like factor analysis, cluster analysis, etc. were used to tease out latent (unobserved) variables from theory, but far too often researchers today simply try out variables in a model until they get one that generates statistical significance. Thus, designing statistical studies is not easy in the first place, and the problem is that most medical researchers are not statisticians--statistics being a branch of mathematics. With today's statistical software packages, it is very easy to generate junk studies, especially from large samples, by essentially moving random variables in and out of the equation until you get a study that purports that a causes b at some level of statistical significance.
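The misspecification point above--a true cause left out of the equation--can be sketched with synthetic data. Here z is the real cause of y, and x merely tracks z; leaving z out makes x look causal, while including it makes x's apparent effect collapse.

```python
import numpy as np

# Omitted-variable sketch: z is the true cause of y; x merely tracks z.
# Regressing y on x alone falsely attributes z's effect to x.
rng = np.random.default_rng(7)
n = 5000

z = rng.normal(size=n)            # true causal variable, omitted below
x = z + rng.normal(size=n)        # correlated with z, but not causal for y
y = z + rng.normal(size=n)

def ols(predictors, outcome):
    """Ordinary least squares with an intercept; returns coefficients."""
    X = np.column_stack([np.ones(len(outcome))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta

b_wrong = ols([x], y)             # z omitted: x picks up a spurious effect
b_right = ols([x, z], y)          # z included: x's coefficient collapses

print(f"x coefficient without z: {b_wrong[1]:.2f}")   # clearly nonzero
print(f"x coefficient with z:    {b_right[1]:.2f}")   # near zero
```

The catch, of course, is that in real research nobody hands you z--the whole fight is over which controls belong in the model, and theory rather than datamining is supposed to settle it.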
Then you write post hoc (after the fact) explanations of why a relationship exists and of its strength. This is called data mining by the polite and torturing the data until you get the results you want by the crass.
Peer review is not really set up to adequately vet this sort of study: without access to the original data and to the specific procedures followed--including the findings from the discarded models that did not show significance--a reviewer cannot tell whether the results are right. Statistical significance as a model indicator has serious shortcomings when people intentionally subvert it, since random chance, given enough tries with a model and its variants, can generate the desired results. Peer reviewers normally do not have access to this data and these methods, and so rely on the good faith of the submitting party that the data and procedures were correct for the problem addressed. They usually do not know the underlying statistical assumptions, the adjustments to the data, the model runs that were not reported, and so on.
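The "enough tries" point can be demonstrated directly. In this sketch (all numbers invented), one pure-noise outcome is tested against 200 pure-noise predictors; at the conventional 0.05 threshold, a handful will come out "significant" by chance alone--and only the winners need be reported.

```python
import numpy as np
from scipy import stats

# Significance by repeated tries: correlate one noise outcome with
# 200 noise predictors and count how many clear the 0.05 threshold.
rng = np.random.default_rng(3)
n = 100

y = rng.normal(size=n)            # outcome: pure noise
false_positives = 0
for _ in range(200):
    x = rng.normal(size=n)        # candidate predictor: also pure noise
    r, p = stats.pearsonr(x, y)
    if p < 0.05:
        false_positives += 1

print(f"'significant' predictors out of 200: {false_positives}")
```

A reviewer shown only the surviving predictor, with a tidy post hoc story attached, has no way to know that 200 were tried.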
As a result, there is a recognized problem with other researchers being unable to reproduce studies that have been published in a swath of journals across the social sciences and medicine in particular. Some alleged pathbreaking studies have been unmasked as outright fraud, where those conducting the surveys used deceptive techniques and pushed human subjects to get the results that they wanted. Some bad results have stemmed simply from ignorance of statistics, math, and data collection. The natural sciences suffer somewhat less, in part because the absence of human subjects reduces the problem from the start.
A researcher has a choice: work with small datasets and be able to use rigorous controls, or use large datasets and essentially rely on theory and statistical controls to reach results. The first type has problems with sample selection, and often the results cannot be generalized to the whole population (in firearm terms, it is when your scope has such great magnification that you get lost in the scope). Large datasets have the opposite problem: you can miss a tree for the forest, so to speak (your level of detail is not able to find a small interesting feature). Then there are the known problems with each sort of data collected and with the research designs employed. Thus, reaching real results requires both inductive and deductive approaches to research over time, with a large dataset that has rigorous controls. The problem is that this is unaffordable.
So researchers simply try to do the best that they can if they are competent and mention all of the caveats and shortcomings of their chosen data and research methodology. However, this is the first thing scrapped in media reports because very few journalists understand science and research broadly, and even fewer are mathematically competent.
The best that the uninitiated can do is to look for repeated studies in a field, over time, with different participants, and ideally from different cultures. A fair number of psychology studies rest on responses that people taking Psych 1101 in college provide for extra credit. Thus, if a quoted study is using college students, one should know that the results may or may not apply outside of that group.
Shocking findings regarding human behavior are usually garbage--as in human affairs, there is little new under the sun. This is less true when regarding the natural sciences where one has to rely on predictions and authority in a large number of cases.
One could read the Bible, for example, and find a large dollop of human behavior, both good and bad, that will generally acquaint oneself with how people behave as individuals and as a group, without reading a psychology textbook. If you are secular, then reading something like Gibbon's Decline and Fall of the Roman Empire would do as much. Thus, much of what is done in "social sciences" is putting old wine in new bottles.
BTW, the true scientist knows that they have human biases and as a result does everything they can to counteract those in practice. The lazy, malicious, and ignorant ones simply ignore the issue.
On the studies that you mention, believe it or not, you could more or less replicate what was found on poverty, lifespan, and education by simple observation of the world around you. If one cares to look, most of those public health findings about human behavior were explained by novelists, historians, and philosophers long before public health officials got around to them. Disease is the one area where--because of past ignorance of things like viruses, bacteria, and parasites--medical researchers have done wonders, but that came largely from individual diagnoses that accumulated, not from large datasets and datamining then applied to individual circumstances. That, in a nutshell, touches on the ecological fallacy: what is true for the individual is not necessarily true for the group, and vice versa.