Data manipulation: R - Python conversion guide

Main concepts

File management

Table 1: File management commands
Category  R                                      Python
--------  -------------------------------------  ---------------------------------
Paths     setwd(path)                            os.chdir(path)
          getwd()                                os.getcwd()
          file.path(path_1, ..., path_n)         os.path.join(path_1, ..., path_n)
Files     list.files(path, include.dirs = TRUE)  os.listdir(path)
          file_test('-f', path)                  os.path.isfile(path)
          file_test('-d', path)                  os.path.isdir(path)
          read.csv(path_to_csv_file)             pd.read_csv(path_to_csv_file)
          write.csv(df, path_to_csv_file)        df.to_csv(path_to_csv_file)

The Python commands assume import os and import pandas as pd.
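A minimal sketch of the Python column in use, assuming pandas is installed. The file name scores.csv and its contents are made up for illustration, and a temporary directory is used so nothing is left on disk. Since write.csv writes row names and to_csv writes the index by default, the sketch passes index=False to get a plain CSV.

```python
import os
import tempfile

import pandas as pd

# Work inside a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "scores.csv")  # file.path(tmp, "scores.csv")

    df = pd.DataFrame({"student": ["Ana", "Luis"], "score": [7.5, 6.0]})
    # index=False plays the role of write.csv(..., row.names = FALSE)
    df.to_csv(csv_path, index=False)

    listing = os.listdir(tmp)            # list.files(tmp)
    is_file = os.path.isfile(csv_path)   # file_test('-f', csv_path)
    is_dir = os.path.isdir(tmp)          # file_test('-d', tmp)

    df_back = pd.read_csv(csv_path)      # read.csv(csv_path)

print(listing, is_file, is_dir)
print(df_back)
```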

Chaining

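Chaining means applying operations one after another, each to the result of the previous one. In R (with magrittr/dplyr) the chaining operator is %>% (since R 4.1 the native pipe |> can also be used), while in Python (pandas) operations are chained as method calls joined by a dot (.).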

For example, the following sequence of operations in R

df %>%
  operation_1(parameters_1) %>%
  operation_2(parameters_2) %>%
  ... %>%
  operation_n(parameters_n)

is equivalent to the Python code below (the enclosing parentheses let the chain span several lines)

(df
  .operation_1(parameters_1)
  .operation_2(parameters_2)
  ...
  .operation_n(parameters_n))
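To make the pattern concrete, here is a small sketch with made-up data, assuming pandas is available. query, assign and head stand in for operation_1 to operation_n; the R line in the comment is the dplyr chain it roughly mirrors.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "group": ["a", "b", "a", "b"]})

# R equivalent:
# df %>% filter(x > 1) %>% mutate(z = x * 10) %>% head(2)
result = (df
    .query("x > 1")                    # keep rows where x > 1
    .assign(z=lambda d: d["x"] * 10)   # add a derived column
    .head(2))                          # first two remaining rows

print(result)
```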

Exploring the data

Table 2: Commands for exploring the data
Category      R                                Python
------------  -------------------------------  -------------------------
Look at Data  df %>% select(col_list)          df[col_list]
              df %>% select(-col_list)         df.drop(col_list, axis=1)
              df %>% head(n)                   df.head(n)
              df %>% tail(n)                   df.tail(n)
              df %>% summary()                 df.describe()
Data Types    df %>% str()                     df.dtypes or df.info()
              df %>% NROW() and df %>% NCOL()  df.shape
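The Python column of Table 2 applied to a small, made-up data frame (a sketch assuming pandas; the column names are illustrative). Note that describe() summarizes only numeric columns by default, unlike summary() in R, which also reports on character and factor columns.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "height": [1.60, 1.75, 1.82, 1.68, 1.90],
    "age": [23, 35, 41, 29, 52],
})

subset = df[["name", "age"]]             # select(name, age)
without_age = df.drop(["age"], axis=1)   # select(-age)
first_two = df.head(2)                   # head(2)
last_two = df.tail(2)                    # tail(2)
summary = df.describe()                  # summary(); numeric columns only by default
n_rows, n_cols = df.shape                # NROW(df), NCOL(df)

print(df.dtypes)   # column types, compare with str()
df.info()          # types plus non-null counts, printed directly
```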

Data types

Table 3: Brief list of common data types
R          Python      Description
---------  ----------  ---------------------------------------------------------------
character  object      Text (string) data
factor     category    Categorical data with a fixed set of levels, optionally ordered
numeric    float64     Numerical data (with decimals)
integer    int64       Integer numerical data
POSIXct    datetime64  Timestamps
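Strings are not categorical in pandas until converted, so an R factor has to be recreated explicitly. A minimal sketch, assuming pandas and made-up level names, of the pandas counterpart of factor(..., levels = ..., ordered = TRUE):

```python
import pandas as pd

sizes = pd.Series(["small", "large", "medium", "small"])

# Roughly factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
sizes_cat = pd.Series(
    pd.Categorical(sizes, categories=["small", "medium", "large"], ordered=True)
)

print(sizes.dtype)               # plain strings (object, or a string dtype in newer pandas)
print(sizes_cat.dtype)           # category
print(sizes_cat.cat.categories)  # the levels, like levels() in R
print(sizes_cat.max())           # comparisons follow the level order
```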

Data preprocessing

Filtering

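Table 4: Commands for filtering data
Category   R                                        Python
---------  ---------------------------------------  ----------------------------------
Overview   df %>% filter(col operation val_or_col)  df[df['col'] operation val_or_col]
Operation  == and !=                                == and !=
           <, >, <=, >=                             <, >, <=, >=
           & (and), | (or)                          & (and), | (or)
           is.na()                                  pd.isnull() or .isna()
           %in% c(val_1, ..., val_n)                .isin([val_1, ..., val_n])
           %like% 'val' (data.table)                .str.contains('val')

When combining conditions in pandas, wrap each comparison in parentheses, e.g. df[(df['a'] > 1) & (df['b'] == 'x')], because & and | bind more tightly than comparison operators in Python.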
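A sketch of several filters from Table 4 on a small, made-up table, assuming pandas. The R filter each line mirrors is given in a comment. The na=False argument makes str.contains treat missing values as non-matches instead of returning NA.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Santiago", "Valparaiso", "Lima", None],
    "temp": [18, 15, 22, 20],
})

# filter(temp >= 16 & city %in% c("Santiago", "Lima"))
warm = df[(df["temp"] >= 16) & df["city"].isin(["Santiago", "Lima"])]

# filter(is.na(city))
missing_city = df[pd.isnull(df["city"])]

# filter(city %like% "ia"); missing cities count as non-matches
has_ia = df[df["city"].str.contains("ia", na=False)]

print(warm, missing_city, has_ia, sep="\n\n")
```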

Changing columns

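Table 5: Commands for modifying columns
Category            R                                               Python
------------------  ----------------------------------------------  ------------------------------------------------------------------
Add/replace column  df %>% mutate(new_col = operation(other_cols))  df.assign(new_col=lambda x: operation(x))
Unite columns       df %>% unite(new_col, old_col_list, sep = '-')  df['new_col'] = df[old_col_list].astype(str).agg('-'.join, axis=1)

In assign, x is the DataFrame itself, so individual columns are accessed as x['col']. The astype(str) call is needed because str.join only accepts strings. Also note that unite() drops the source columns by default (remove = TRUE), while the pandas expression keeps them.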
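Both rows of Table 5 on made-up data, as a sketch assuming pandas; the column names year, month and price are illustrative only. The last step drops the source columns by hand to mimic unite()'s default behaviour.

```python
import pandas as pd

df = pd.DataFrame({"year": [2025, 2026], "month": [12, 1], "price": [100.0, 250.0]})

# mutate(price_k = price / 1000): inside assign, d is the DataFrame itself
df = df.assign(price_k=lambda d: d["price"] / 1000)

# unite(period, year, month, sep = "-"): cast to str first,
# since str.join only accepts strings
df["period"] = df[["year", "month"]].astype(str).agg("-".join, axis=1)

# unite() removes the source columns by default; do the same explicitly
df = df.drop(columns=["year", "month"])
print(df)
```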

Mathematical operations

Table 6: Some mathematical operations to transform data
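Operation     R        Python
------------  -------  ----------
\(\sqrt{x}\)  sqrt(x)  np.sqrt(x)
\(\sin(x)\)   sin(x)   np.sin(x)
\(\cos(x)\)   cos(x)   np.cos(x)
\(\tan(x)\)   tan(x)   np.tan(x)

The Python versions assume import numpy as np. In both languages these functions are vectorized and act element-wise on a whole column.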
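A short sketch, assuming NumPy and pandas, of applying the functions in Table 6 to a column inside assign, the pandas counterpart of mutate(root = sqrt(x), s = sin(x)). The values of x are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0.0, 1.0, 4.0, 9.0]})

# NumPy ufuncs act element-wise on a pandas column and return a column
df = df.assign(root=lambda d: np.sqrt(d["x"]), s=lambda d: np.sin(d["x"]))
print(df)
```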

Datetime conversion

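Table 7: Commands to convert a string representation of a date/time to datetime
Action                      R                              Python
--------------------------  -----------------------------  -------------------------------
Convert string to datetime  as.POSIXct(col, format = fmt)  pd.to_datetime(col, format=fmt)

In both calls fmt must be passed by name: the second positional argument of as.POSIXct is tz, and that of pd.to_datetime is errors. Here fmt is a string describing the layout of the field, built from % codes. Taking 29 December 2025 as an example:

  • '%Y': four-digit year, 2025.
  • '%y': two-digit year, 25.
  • '%B': full month name, December.
  • '%b': abbreviated month name, Dec.
  • '%m': month number, 12.
  • '%d': day of the month, 29.
  • '%j': day of the year, 363.

Many other codes exist, for instance '%H', '%M' and '%S' for hours (00 to 23), minutes and seconds. R and Python share these C-style conventions; see ?strptime in R and the strftime/strptime format codes in the Python documentation.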
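A sketch of the Python side of Table 7, assuming pandas and an English (C) locale, since parsing month names with %B or %b is locale-dependent in both R and Python. The input strings are made up; the day-of-year check reuses the '%j' code from the list above.

```python
import pandas as pd

raw = pd.Series(["29 December 2025", "02 April 2026"])

# as.POSIXct(raw, format = "%d %B %Y") in R
dates = pd.to_datetime(raw, format="%d %B %Y")

print(dates.dtype)                       # a datetime64 dtype
print(dates.dt.strftime("%j").tolist())  # day of the year, zero-padded text
print(dates.dt.dayofyear.tolist())       # day of the year as integers
```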

Author: Oscar Castillo-Felisola

Created: 2026-04-02 Thu 14:59