Data manipulation: R - Python conversion guide
Main concepts
File management
| Category | R | Python |
|---|---|---|
| Paths | setwd(path) | os.chdir(path) |
| | getwd() | os.getcwd() |
| | file.path(path_1, ..., path_n) | os.path.join(path_1, ..., path_n) |
| Files | list.files(path, include.dirs=TRUE) | os.listdir(path) |
| | file_test('-f', path) | os.path.isfile(path) |
| | file_test('-d', path) | os.path.isdir(path) |
| | read.csv(path_to_csv_file) | pd.read_csv(path_to_csv_file) |
| | write.csv(df, path_to_csv_file) | df.to_csv(path_to_csv_file) |
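
The Python column for paths and files can be tried out with a short, self-contained sketch. It uses only the standard library and a temporary directory; the names `sub_dir`, `csv_path` and `example.csv` are made up for illustration, and the CSV round trip with pandas is left out to keep the example dependency-free.

```python
import os
import tempfile

# Work inside a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    # os.path.join plays the role of R's file.path
    sub_dir = os.path.join(tmp_dir, "data")
    csv_path = os.path.join(sub_dir, "example.csv")

    os.mkdir(sub_dir)
    with open(csv_path, "w") as f:
        f.write("a,b\n1,2\n")

    # os.listdir returns the names of both files and subdirectories
    top_level = os.listdir(tmp_dir)
    is_file = os.path.isfile(csv_path)  # counterpart of file_test('-f', path)
    is_dir = os.path.isdir(sub_dir)     # counterpart of file_test('-d', path)

    # os.chdir / os.getcwd mirror setwd / getwd
    previous_dir = os.getcwd()
    os.chdir(sub_dir)
    inside = os.listdir(".")
    os.chdir(previous_dir)  # step back out before the directory is removed
```

Note that `os.listdir` returns bare names rather than full paths, so results usually need to be combined with `os.path.join` before being passed to `os.path.isfile` or `os.path.isdir`.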
Chaining
Chaining means applying operations successively, with each operation acting on the result of the previous one. In R, the chaining operator is %>%, while in Python it is the dot (.).
For example, the following succession of operations in R
df %>% operation_1(parameters_1) %>% operation_2(parameters_2) %>% ... %>% operation_n(parameters_n)
is equivalent to the Python code below
(df.operation_1(parameters_1).operation_2(parameters_2) ... .operation_n(parameters_n))
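
As a concrete illustration of the Python form, here is a minimal sketch of a pandas method chain. The data frame and the particular operations (`sort_values`, `head`, `reset_index`) are chosen only for the example; wrapping the chain in parentheses is what allows one operation per line.

```python
import pandas as pd

# Small made-up data frame for illustration
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [3, 1, 4, 2]})

# Each method returns a new DataFrame, so calls can be chained with dots.
result = (
    df
    .sort_values("score", ascending=False)  # highest scores first
    .head(2)                                # keep the top two rows
    .reset_index(drop=True)                 # renumber rows from 0
)
```

Here `result` holds the two highest-scoring rows, and `df` itself is left unchanged.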
Exploring the data
| Category | R | Python |
|---|---|---|
| Look at Data | df %>% select(col_list) | df[col_list] |
| | df %>% select(-col_list) | df.drop(col_list, axis=1) |
| | df %>% head(n) | df.head(n) |
| | df %>% tail(n) | df.tail(n) |
| | df %>% summary() | df.describe() |
| Data Types | df %>% str() | df.dtypes or df.info() |
| | df %>% NROW() and df %>% NCOL() | df.shape |
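
The pandas calls in the table can be sketched on a small data frame as follows. The columns `id`, `city` and `temp` and their values are invented for the example.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["Paris", "Lyon", "Nice", "Lille"],
    "temp": [12.5, 14.0, 18.2, 10.1],
})
col_list = ["id", "temp"]

selected = df[col_list]              # keep only the listed columns
dropped = df.drop(col_list, axis=1)  # keep everything except the listed columns
first_rows = df.head(2)              # first 2 rows
last_rows = df.tail(2)               # last 2 rows
summary = df.describe()              # by default, statistics for numeric columns only
n_rows, n_cols = df.shape            # (number of rows, number of columns)

df.info()  # prints column names, non-null counts and dtypes
```

Unlike R's `summary()`, `df.describe()` skips non-numeric columns unless it is called with `include="all"`.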
Data types
| R | Python | Description |
|---|---|---|
| character | object | String-related data |
| factor | object | String-related data that can be ordered or classified |
| numeric | float64 | Numerical data (with decimals) |
| integer | int64 | Integer numerical data |
| POSIXct | datetime64 | Timestamps |
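
To see these types in practice, the sketch below builds a small data frame and inspects its dtypes. The column names are invented, and the exact dtype labels can vary with the pandas version (string columns typically show as `object`, though newer versions may report a dedicated string dtype), so the checks use the helpers in `pd.api.types` rather than comparing names. The `astype("category")` conversion is an addition not listed in the table: it is an optional pandas dtype that behaves more like an R factor.

```python
import pandas as pd
from pandas.api import types as ptypes

# Made-up data for illustration
df = pd.DataFrame({
    "label": ["x", "y", "x"],
    "value": [1.5, 2.0, 3.25],
    "n": [1, 2, 3],
    "when": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
})

print(df.dtypes)  # one dtype per column

is_text = ptypes.is_string_dtype(df["label"])            # character -> object
is_float = ptypes.is_float_dtype(df["value"])            # numeric -> float64
is_int = ptypes.is_integer_dtype(df["n"])                # integer -> int64
is_time = ptypes.is_datetime64_any_dtype(df["when"])     # POSIXct -> datetime64

# Optional: convert a text column to a categorical one (closer to an R factor)
as_category = df["label"].astype("category")
levels = as_category.cat.categories.tolist()
```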
Data preprocessing
Filtering
| Category | R | Python |
|---|---|---|
| Overview | df %>% filter(col operation val_or_col) | df[df['col'] operation val_or_col] |
| Operation | == and != | == and != |
| | <, >, <=, >= | <, >, <=, >= |
| | &, \| | &, \| |
| | is.na() | pd.isnull() |
| | %in% c(val_1, ..., val_n) | .isin([val_1, ..., val_n]) |
| | %like% 'val' | .str.contains('val') |
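
The filtering operations above can be sketched in pandas as follows. The data frame, its column names and the thresholds are made up for the example. Two details are easy to miss: each condition combined with `&` or `|` needs its own parentheses, and `na=False` is passed to `.str.contains` here so that missing values count as non-matches.

```python
import pandas as pd

# Made-up data for illustration; one name is missing on purpose
df = pd.DataFrame({
    "name": ["apple", "banana", "cherry", "apricot", None],
    "price": [1.0, 0.5, 3.0, 2.0, 1.5],
    "stock": [10, 0, 5, 7, 3],
})

# Comparison operator
cheap = df[df["price"] <= 1.0]

# Combining conditions: wrap each one in parentheses
combined = df[(df["price"] > 1.0) & (df["stock"] > 4)]

# Missing values
missing = df[pd.isnull(df["name"])]

# Membership in a list of values
in_list = df[df["name"].isin(["apple", "cherry"])]

# Substring match; na=False treats missing names as non-matching
contains = df[df["name"].str.contains("ap", na=False)]
```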