R tutorial - Using Factors in R

DataCamp
Published at : 22 Nov 2020

In this introduction to R course, you will learn about the basics of R, as well as the most common data structures it uses to store data.

Join DataCamp today, and start our interactive intro to R programming tutorial for free: https://www.datacamp.com/courses/free-introduction-to-r

If you have some background in statistics, you'll have heard about categorical variables. Unlike numerical variables, categorical variables can only take on a limited number of different values. Put differently, a categorical variable can only belong to a limited number of categories. As R is a statistical programming language, it's no surprise that there is a specific data structure for this: factors. If you store categorical data as factors, you can rest assured that R's statistical modelling functions will treat the data as categorical.

A good example of a categorical variable is a person's blood type: it can be A, B, AB or O. Suppose we have asked 8 people what their blood type is and recorded the information as a vector `blood`.
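The vector itself is not shown in the transcript. A minimal sketch of what it could look like, assuming eight illustrative answers (the values below are made up for this example):

```r
# Illustrative answers from 8 people (assumed values, not from the original)
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")

blood         # prints the strings with double quotes
class(blood)  # a plain character vector at this point
```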

Now, for R it is not yet clear that you're dealing with categorical variables, or factors, here. To convert this vector to a factor, you can use the `factor()` function.
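As a sketch of that conversion, assuming the same illustrative `blood` values as before (they are not taken from the original) and the name `blood_factor` for the result:

```r
# Assumed example data: blood types of 8 people
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# Convert the character vector to a factor
blood_factor <- factor(blood)

# Printing shows the values without quotes, followed by the levels
blood_factor
```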

The printout looks somewhat different than the original one: there are no double quotes anymore and also the factor levels, corresponding to the different categories, are printed. R basically does two things when you call the factor function on a character vector: first of all, it scans through the vector to see the different categories that are in there. In this case, that's "A", "AB", "B" and "O". Notice here that R sorts the levels alphabetically. Next, it converts the character vector, blood in this example, to a vector of integer values. These integers correspond to a set of character values to use when the factor is displayed. Inspecting the structure reveals this:
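One way to inspect that structure is `str()`. The sketch below again assumes illustrative blood type values, so the integer codes shown depend on that assumption:

```r
# Assumed example data: blood types of 8 people
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
blood_factor <- factor(blood)

# Show the levels and the underlying integer codes
str(blood_factor)
# Factor w/ 4 levels "A","AB","B","O": 3 2 4 1 4 4 1 3
```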

We're dealing with a factor with 4 levels. The "A"'s are encoded as 1, because it's the first level, "AB" is encoded as 2, "B" as 3 and "O" as 4. Why this conversion? Well, it can be that your categories are very long character strings. Each time repeating this string per observation can take up a lot of memory. By using this simple encoding, much less space is necessary. Just remember that factors are actually integer vectors, where each integer corresponds to a category, or a level.
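To see the integer encoding for yourself, you could look at the underlying type and codes directly. This is a small sketch on assumed example data, not part of the original walkthrough:

```r
# Assumed example data: blood types of 8 people
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
blood_factor <- factor(blood)

typeof(blood_factor)      # "integer": the storage type behind the factor
as.integer(blood_factor)  # the codes, one per observation
levels(blood_factor)      # the category that each code stands for

# Looking up each code in the levels recovers the original strings
levels(blood_factor)[as.integer(blood_factor)]
```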

As I said before, R automatically infers the factor levels from the vector you pass it and orders them alphabetically. If you want a different order in the levels, you can specify the levels argument in the factor function.
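For instance, a sketch that puts "O" first, assuming the same illustrative data; the particular order chosen here is only an example:

```r
# Assumed example data: blood types of 8 people
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# Specify the level order explicitly instead of the alphabetical default
blood_factor2 <- factor(blood, levels = c("O", "A", "B", "AB"))

blood_factor2  # same values, but the levels are listed as O A B AB
```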

If you compare the structures of `blood_factor` and `blood_factor2`, you'll see that the encoding is different now.
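A sketch of that comparison, with both factors rebuilt here from assumed example data so the snippet runs on its own:

```r
# Assumed example data: blood types of 8 people
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")

blood_factor  <- factor(blood)                                  # default order
blood_factor2 <- factor(blood, levels = c("O", "A", "B", "AB")) # custom order

str(blood_factor)   # codes: 3 2 4 1 4 4 1 3
str(blood_factor2)  # codes: 3 4 1 2 1 1 2 3
```

The displayed values are identical in both cases; only the mapping from integers to levels differs.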

Besides changing the order of the levels, you can also manually specify the level names instead of letting R choose them. Suppose that, for clarity, you want to display the blood types as `BT_A`, `BT_AB`, `BT_B` and `BT_O`. To rename the levels afterwards, you can use the `levels()` function. Similar to using the `names()` function to name vectors, you can assign a vector of new names to `levels(blood_factor)`.
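A minimal sketch of renaming the levels after the factor has been created, assuming illustrative data and the default alphabetical level order:

```r
# Assumed example data: blood types of 8 people
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
blood_factor <- factor(blood)

# New names must follow the existing level order: A, AB, B, O
levels(blood_factor) <- c("BT_A", "BT_AB", "BT_B", "BT_O")

blood_factor
```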

You can also specify the category names, or levels, by specifying the `labels` argument in `factor()`.
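The same renaming could be done at creation time. In this sketch the data is again assumed, and the labels are matched to the default alphabetical levels:

```r
# Assumed example data: blood types of 8 people
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# labels are paired with the (alphabetically sorted) levels: A, AB, B, O
blood_factor <- factor(blood, labels = c("BT_A", "BT_AB", "BT_B", "BT_O"))

blood_factor
```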

I admit it, it's a bit confusing. For both of these approaches, it's important to follow the same order as the order of the factor levels: first A, then AB, then B and then O. But this can be pretty dangerous: you might have mistakenly changed the order.

To solve this, you can use a combination of manually specifying the `levels` and the `labels` argument when creating a factor. With `levels`, you specify the order, just like before, while with the labels, you specify a new name for the categories:
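A sketch of that combination on assumed example data. Because `levels` and `labels` are paired position by position, the mapping is explicit in the call:

```r
# Assumed example data: blood types of 8 people
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# levels fixes the order; labels gives each level its display name
blood_factor <- factor(blood,
                       levels = c("O", "A", "B", "AB"),
                       labels = c("BT_O", "BT_A", "BT_B", "BT_AB"))

blood_factor
```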

In the world of categorical variables, there's also a difference between nominal categorical variables and ordinal categorical variables. A nominal categorical variable has no implied order. For example, you can't really say that the blood type "O" is greater or less than the blood type "A". "O" is not worth more than "A" in any sense I can think of. Trying such a comparison with factors will generate a warning, telling you that less than is not meaningful:
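A small sketch of such a comparison on assumed example data. With an unordered factor, R issues a warning and returns `NA`:

```r
# Assumed example data: blood types of 8 people
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
blood_factor <- factor(blood)

# Comparing two elements of an unordered factor:
# R warns that '<' is not meaningful for factors and the result is NA
result <- blood_factor[1] < blood_factor[2]
result
```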

However, there are examples for which such a natural ordering does exist. Consider for example this `tshirt` vector. It has codes ranging from small to large. Here, you could say that extra large indeed is greater than, say, a small, right?

Of course, R provides a way to impose this kind of order on a factor, thus making it an ordered factor. Inside the factor() function, you simply set the argument ordered to TRUE, and specify the levels in ascending order.
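As a sketch, assuming a `tshirt` vector with the made-up size codes "S", "M" and "L" (the actual vector is not shown in the transcript):

```r
# Assumed example data: t-shirt sizes of 8 people
tshirt <- c("M", "L", "S", "S", "L", "M", "L", "M")

# ordered = TRUE plus levels in ascending order gives an ordered factor
tshirt_factor <- factor(tshirt, ordered = TRUE, levels = c("S", "M", "L"))

tshirt_factor  # the levels line reads: S < M < L
```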

Can you see how these less-than signs appear between the different factor levels? This compactly shows that we're dealing with an ordered factor now. If we now try to perform a comparison, this call for example, ..., evaluates to TRUE, without a warning message, because a medium was specified to be less than a large.
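The exact call is not preserved in the transcript. One comparison of this kind, on an assumed `tshirt` vector whose first two elements are a medium and a large, might look like this:

```r
# Assumed example data: t-shirt sizes of 8 people
tshirt <- c("M", "L", "S", "S", "L", "M", "L", "M")
tshirt_factor <- factor(tshirt, ordered = TRUE, levels = c("S", "M", "L"))

# Compare a medium (element 1) with a large (element 2)
is_smaller <- tshirt_factor[1] < tshirt_factor[2]
is_smaller  # TRUE, with no warning, because "M" comes before "L" in the levels
```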