Questions tagged [linear-regression]
For issues related to the linear regression modelling approach.
6,575
questions
324
votes
10
answers
480k
views
Add regression line equation and R^2 on graph
I wonder how to add the regression line equation and R^2 to a ggplot. My code is:
library(ggplot2)
df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)
p <- ggplot(data = ...
279
votes
15
answers
299k
views
What is the difference between linear regression and logistic regression? [closed]
When we have to predict the value of a categorical (or discrete) outcome we use logistic regression. I believe we use linear regression to also predict the value of an outcome given the input values.
...
238
votes
7
answers
472k
views
How to do exponential and logarithmic curve fitting in Python? I found only polynomial fitting
I have a set of data and I want to compare which line describes it best (polynomials of different orders, exponential or logarithmic).
I use Python and Numpy and for polynomial fitting there is a ...
193
votes
6
answers
620k
views
Adding a regression line on a ggplot
I'm trying hard to add a regression line on a ggplot. I first tried with abline but I didn't manage to make it work. Then I tried this...
data = data.frame(x.plot=rep(seq(1,5),10),y.plot=rnorm(50))
...
166
votes
6
answers
338k
views
How to force R to use a specified factor level as reference in a regression?
How can I tell R to use a certain level as reference if I use binary explanatory variables in a regression?
It's just using some level by default.
lm(x ~ y + as.factor(b))
with b {0, 1, 2, 3, 4}. ...
155
votes
15
answers
243k
views
Multiple linear regression in Python
I can't seem to find any python libraries that do multiple regression. The only things I find only do simple regression. I need to regress my dependent variable (y) against several independent ...
136
votes
10
answers
137k
views
Linear Regression and group by in R
I want to do a linear regression in R using the lm() function. My data is an annual time series with one field for year (22 years) and another for state (50 states). I want to fit a regression for ...
117
votes
8
answers
396k
views
Linear regression with matplotlib / numpy
I'm trying to generate a linear regression on a scatter plot I have generated, however my data is in list format, and all of the examples I can find of using polyfit require using arange. arange doesn'...
113
votes
8
answers
330k
views
Accuracy Score ValueError: Can't Handle mix of binary and continuous target
I'm using linear_model.LinearRegression from scikit-learn as a predictive model. It works and it's perfect. I have a problem to evaluate the predicted results using the accuracy_score metric.
This is ...
83
votes
8
answers
234k
views
How to overplot a line on a scatter plot in python?
I have two vectors of data and I've put them into pyplot.scatter(). Now I'd like to overplot a linear fit to these data. How would I do this? I've tried using scikit-learn and np.polyfit().
77
votes
4
answers
148k
views
Linear regression analysis with string/categorical features (variables)?
Regression algorithms seem to be working on features represented as numbers.
For example:
This data set doesn't contain categorical features/variables. It's quite clear how to do regression on this ...
76
votes
6
answers
149k
views
How to get a regression summary in scikit-learn like R does?
As an R user, I wanted to also get up to speed on scikit.
Creating linear regression models is fine, but I can't seem to find a reasonable way to get a standard summary of regression output.
...
75
votes
4
answers
27k
views
why gradient descent when we can solve linear regression analytically
What is the benefit of using gradient descent for linear regression? It looks like we can solve the problem (finding the theta0-n that minimize the cost function) analytically, so why do we ...
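To make the comparison in this question concrete, here is a minimal sketch that fits the same one-variable model both ways. The toy data, the function names, and the values of `alpha` and `num_iters` are assumptions chosen for illustration; they are not taken from the question.

```python
# Hypothetical toy data generated from y = 1 + 2x (no noise), for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]


def fit_closed_form(xs, ys):
    """Return (intercept, slope) from the one-variable least-squares formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_gradient_descent(xs, ys, alpha=0.05, num_iters=5000):
    """Return (intercept, slope) by batch gradient descent.

    Minimizes J = (1 / (2 * m)) * sum(error ** 2), i.e. half the mean squared error.
    """
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(num_iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                               # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m    # dJ/dtheta1
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


closed = fit_closed_form(xs, ys)
iterative = fit_gradient_descent(xs, ys)
```

On a problem this small both routes reach the same parameters. The closed form does it in one pass, while gradient descent needs a step size and an iteration count to be chosen; the iterative route tends to become more attractive when the number of features is large enough that solving the linear system directly gets expensive.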
65
votes
6
answers
209k
views
gradient descent using python and numpy
def gradient(X_norm,y,theta,alpha,m,n,num_it):
    temp=np.array(np.zeros_like(theta,float))
    for i in range(0,num_it):
        h=np.dot(X_norm,theta)
        #temp[j]=theta[j]-(alpha/m)*( np.sum( ...
61
votes
1
answer
233k
views
How to calculate the 95% confidence interval for the slope in a linear regression model in R
Here is an exercise from Introductory Statistics with R:
With the rmr data set, plot metabolic rate versus body weight. Fit a linear regression model to the relation. According to the fitted model, ...
58
votes
10
answers
66k
views
Cost Function, Linear Regression, trying to avoid hard coding theta. Octave.
I'm in the second week of Professor Andrew Ng's Machine Learning course through Coursera. We're working on linear regression and right now I'm dealing with coding the cost function.
The code I've ...
57
votes
6
answers
82k
views
Why do I get only one parameter from a statsmodels OLS fit
Here is what I am doing:
$ python
Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
>>> import statsmodels.api as sm
>>> ...
55
votes
1
answer
6k
views
Is there a better alternative than string manipulation to programmatically build formulas?
Everyone else's functions seem to take formula objects and then do dark magic to them somewhere deep inside and I'm jealous.
I'm writing a function that fits multiple models. Parts of the formulas ...
54
votes
2
answers
36k
views
How to make seaborn regplot partially see through (alpha)
When using seaborn barplot, I can specify an alpha to make the bars semi-translucent. However, when I try this with seaborn regplot, I get an error saying this is an unexpected argument.
I read the ...
51
votes
6
answers
109k
views
TensorFlow: "Attempting to use uninitialized value" in variable initialization
I am trying to implement multivariate linear regression in Python using TensorFlow, but have run into some logical and implementation issues. My code throws the following error:
Attempting to use ...
49
votes
3
answers
128k
views
predict.lm() in a loop. warning: prediction from a rank-deficient fit may be misleading
This R code throws a warning
# Fit regression model to each cluster
y <- list()
length(y) <- k
vars <- list()
length(vars) <- k
f <- list()
length(f) <- k
for (i in 1:k) {
vars[...
46
votes
5
answers
127k
views
How to extract the regression coefficient from statsmodels.api?
result = sm.OLS(gold_lookback, silver_lookback ).fit()
After I get the result, how can I get the coefficient and the constant?
In other words, if
y = ax + c
how to get the values a and c?
46
votes
3
answers
70k
views
Linear Regression with a known fixed intercept in R
I want to calculate a linear regression using the lm() function in R. Additionally I want to get the slope of a regression, where I explicitly give the intercept to lm().
I found an example on the ...
44
votes
8
answers
158k
views
Error in Confusion Matrix : the data and reference factors must have the same number of levels
I've trained a Linear Regression model with R caret. I'm now trying to generate a confusion matrix and keep getting the following error:
Error in confusionMatrix.default(pred, testing$Final) :
the ...
42
votes
5
answers
65k
views
Linear Regression :: Normalization (Vs) Standardization
I am using Linear regression to predict data. But, I am getting totally contrasting results when I Normalize (Vs) Standardize variables.
Normalization = (x - xmin) / (xmax - xmin)
Zero ...
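The two rescalings named in this excerpt can be sketched in plain Python as follows. This assumes "standardization" means the usual z-score, (x - mean) / std, with the population standard deviation; the function names and sample values are hypothetical.

```python
import math


def min_max_normalize(values):
    """Rescale values to the range [0, 1]: (x - xmin) / (xmax - xmin)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for constant input")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit variance: (x - mean) / std."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divides by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("standardization is undefined for constant input")
    return [(v - mean) / std for v in values]


sample = [2.0, 4.0, 6.0, 10.0]  # hypothetical feature values
normalized = min_max_normalize(sample)
standardized = standardize(sample)
```

Both are affine transformations of each feature. For ordinary least squares with an intercept, that changes the coefficients but should not change the fitted predictions, so strongly differing results usually point at something else in the pipeline, such as regularization or scaling applied to the wrong axis.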
41
votes
7
answers
62k
views
Linear Regression in Javascript [closed]
I want to do Least Squares Fitting in Javascript in a web browser.
Currently users enter data point information using HTML text inputs and then I grab that data with jQuery and graph it with Flot.
...
40
votes
4
answers
68k
views
How to force zero interception in linear regression?
I have some more or less linear data of the form:
x = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 20.0, 40.0, 60.0, 80.0]
y = [0.50505332505407008, 1.1207373784533172, 2.1981844719020001, ...
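For a line forced through the origin, y = a * x, the least-squares slope has the closed form a = sum(x * y) / sum(x * x). A minimal sketch follows, using hypothetical data rather than the truncated values from the question:

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for the no-intercept model y = a * x."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero; the slope is undefined")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


# Hypothetical, roughly linear data (not the asker's values).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
slope = fit_through_origin(xs, ys)
```

Note that this is not the same as fitting with an intercept and then discarding it: the slope is re-estimated under the constraint that the line passes through (0, 0).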
40
votes
7
answers
40k
views
predict.lm() with an unknown factor level in test data
I am fitting a model to factor data and predicting. If the newdata in predict.lm() contains a single factor level that is unknown to the model, all of predict.lm() fails and returns an error.
Is ...
39
votes
5
answers
158k
views
Linear Regression on Pandas DataFrame using Sklearn ( IndexError: tuple index out of range)
I'm new to Python and trying to perform linear regression using sklearn on a pandas dataframe. This is what I did:
data = pd.read_csv('xxxx.csv')
After that I got a DataFrame of two columns, let's ...
39
votes
1
answer
48k
views
In the LinearRegression method in sklearn, what exactly is the fit_intercept parameter doing? [closed]
In the sklearn.linear_model.LinearRegression method, there is a parameter that is fit_intercept = TRUE or fit_intercept = FALSE. I am wondering if we set it to TRUE, does it add an additional ...
37
votes
9
answers
72k
views
How to find the features names of the coefficients using scikit linear regression?
I use scikit linear regression, and if I change the order of the features, the coefficients are still printed in the same order, so I would like to know the mapping between the features and the coefficients.
#training ...
37
votes
2
answers
30k
views
How (and why) do you use contrasts?
Under what cases do you create contrasts in your analysis? How is it done and what is it used for?
I checked ?contrasts and ?C - both lead to "Chapter 2 of Statistical Models in S", which is not ...
35
votes
9
answers
142k
views
ValueError: Expected 2D array, got 1D array instead:
While practicing Simple Linear Regression Model I got this error,
I think there is something wrong with my data set.
Here is my data set:
Here is independent variable X:
Here is dependent variable ...
35
votes
2
answers
15k
views
Pandas rolling regression: alternatives to looping
I got good use out of pandas' MovingOLS class (source here) within the deprecated stats/ols module. Unfortunately, it was gutted completely with pandas 0.20.
The question of how to run rolling OLS ...
34
votes
2
answers
56k
views
How does predict.lm() compute confidence interval and prediction interval?
I ran a regression:
CopierDataRegression <- lm(V1~V2, data=CopierData1)
and my task was to obtain a
90% confidence interval for the mean response given V2=6 and
90% prediction interval when V2=...
34
votes
6
answers
75k
views
python linear regression predict by date
I want to predict a value at a date in the future with simple linear regression, but I can't due to the date format.
This is the dataframe I have:
data_df =
date value
2016-01-15 1555
...
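One common way around the date-format problem is to convert each date into a number, for example days elapsed since the first observation, and regress on that. The sketch below uses only the standard library; the observations, `fit_line`, and `predict` are hypothetical and do not reproduce the asker's dataframe.

```python
from datetime import date

# Hypothetical observations (not the asker's data): the value grows by 10 per day.
observations = [
    (date(2016, 1, 15), 1500.0),
    (date(2016, 1, 16), 1510.0),
    (date(2016, 1, 17), 1520.0),
    (date(2016, 1, 18), 1530.0),
]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Encode each date as the number of days since the first observation.
origin = observations[0][0]
xs = [(d - origin).days for d, _ in observations]
ys = [v for _, v in observations]
intercept, slope = fit_line(xs, ys)


def predict(day):
    """Predict the value on a given date using the fitted line."""
    return intercept + slope * (day - origin).days


forecast = predict(date(2016, 1, 25))
```

The same encoding has to be applied to any future date before predicting, which is why `predict` reuses `origin` instead of taking a raw number.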
32
votes
8
answers
69k
views
Are there any Linear Regression Function in SQL Server?
Are there any linear regression functions in SQL Server 2005/2008, similar to the linear regression functions in Oracle?
32
votes
3
answers
62k
views
How to add interaction term in Python sklearn
If I have independent variables [x1, x2, x3]
If I fit linear regression in sklearn
it will give me something like this:
y = a*x1 + b*x2 + c*x3 + intercept
Polynomial regression with poly =2
will ...
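An interaction term is just the product of two features added as an extra column before fitting. As a library-independent sketch (the helper name and the sample row are hypothetical), pairwise products without the squared terms can be generated like this:

```python
from itertools import combinations


def add_pairwise_interactions(row):
    """Return the original features followed by every product x_i * x_j with i < j."""
    return list(row) + [a * b for a, b in combinations(row, 2)]


# For [x1, x2, x3] this yields [x1, x2, x3, x1*x2, x1*x3, x2*x3].
expanded = add_pairwise_interactions([2.0, 3.0, 5.0])
```

Fitting a linear model on the expanded rows then gives one coefficient per original feature and one per interaction, without the x1^2, x2^2, x3^2 columns that a full degree-2 polynomial expansion would add.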
32
votes
3
answers
52k
views
Python scikit learn Linear Model Parameter Standard Error
I am working with sklearn and specifically the linear_model module. After fitting a simple linear as in
import pandas as pd
import numpy as np
from sklearn import linear_model
randn = np.random....
31
votes
8
answers
132k
views
Scikit-Learn Linear Regression how to get coefficient's respective features?
I'm trying to perform feature selection by evaluating my regressions coefficient outputs, and select the features with the highest magnitude coefficients. The problem is, I don't know how to get the ...
31
votes
2
answers
84k
views
lme4::lmer reports "fixed-effect model matrix is rank deficient", do I need a fix and how to?
I am trying to run a mixed-effects model that predicts F2_difference with the rest of the columns as predictors, but I get an error message that says
fixed-effect model matrix is rank deficient so ...
31
votes
3
answers
35k
views
OLS Regression: Scikit vs. Statsmodels? [closed]
Short version: I was using the scikit LinearRegression on some data, but I'm used to p-values so put the data into the statsmodels OLS, and although the R^2 is about the same the variable coefficients ...
30
votes
2
answers
61k
views
How to plot statsmodels linear regression (OLS) cleanly
Problem Statement:
I have some nice data in a pandas dataframe. I'd like to run simple linear regression on it:
Using statsmodels, I perform my regression. Now, how do I get my plot? I've tried ...
28
votes
1
answer
25k
views
Linear Regression and Gradient Descent in Scikit learn?
In this Coursera course for machine learning, it says gradient descent should converge.
I'm using Linear regression from scikit learn. It doesn't provide gradient descent info. I have seen many ...
28
votes
3
answers
50k
views
Why is numpy.linalg.pinv() preferred over numpy.linalg.inv() for creating inverse of a matrix in linear regression
If we want to search for the optimal parameters theta for a linear regression model by using the normal equation with:
theta = inv(X^T * X) * X^T * y
one step is to calculate inv(X^T*X). Therefore ...
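Written out, the normal equation from this excerpt and its pseudoinverse counterpart are shown below. Here $X^{+}$ denotes the Moore-Penrose pseudoinverse, a symbol that does not appear in the question; the second line is offered as background on why a pseudoinverse can stand in for the inverse, not as a summary of the answers.

```latex
\begin{align}
\hat{\theta} &= (X^{\top} X)^{-1} X^{\top} y
  && \text{(requires } X^{\top} X \text{ to be invertible)} \\
\hat{\theta} &= (X^{\top} X)^{+} X^{\top} y = X^{+} y
  && \text{(defined for any } X\text{)}
\end{align}
```

When $X^{\top} X$ is invertible the two expressions agree. When it is singular, for example because of collinear features, the inverse does not exist, while the pseudoinverse still returns the minimum-norm least-squares solution.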
27
votes
2
answers
58k
views
geom_smooth in ggplot2 not working/showing up
I am trying to add a linear regression line to my graph, but when it's run, it's not showing up. The code below is simplified. There are usually multiple points on each day. The graph comes out fine ...
27
votes
3
answers
66k
views
How to get the P Value in a Variable from OLSResults in Python?
The OLSResults of
df2 = pd.read_csv("MultipleRegression.csv")
X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']]
Y = df2['Price']
X = add_constant(X)
fit = sm.OLS(Y, X).fit()
print(fit....
25
votes
5
answers
28k
views
Can scipy.stats identify and mask obvious outliers?
With scipy.stats.linregress I am performing a simple linear regression on some sets of highly correlated x,y experimental data, and initially visually inspecting each x,y scatter plot for outliers. ...
25
votes
3
answers
9k
views
Comparing Results from StandardScaler vs Normalizer in Linear Regression
I'm working through some examples of Linear Regression under different scenarios, comparing the results from using Normalizer and StandardScaler, and the results are puzzling.
I'm using the boston ...
25
votes
3
answers
44k
views
Efficient Cointegration Test in Python
I am wondering if there is a better way to test if two variables are cointegrated than the following method:
import numpy as np
import statsmodels.api as sm
import statsmodels.tsa.stattools as ts
y =...