Get ranges for synonymous and non-synonymous nucleotide positions within a codon separately

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

I have GRanges object (coordinates of all gene exons); coding_pos defines what is the start position of a codon in a particular exon (1 means that first nucleotide in exon is also the first nt in a codon, and so on).

grTargetGene itself looks like this

> grTargetGene



GRanges object with 11 ranges and 7 metadata columns:

   seqnames                 ranges strand |     ensembl_ids   gene_biotype prev_exons_length coding_pos

      <Rle>              <IRanges>  <Rle> |     <character>    <character>         <numeric>  <numeric>

   [1]     chr2 [148602722, 148602776]      + | ENSG00000121989 protein_coding       0           1

   [2]     chr2 [148653870, 148654077]      + | ENSG00000121989 protein_coding       55          2

   [3]     chr2 [148657027, 148657136]      + | ENSG00000121989 protein_coding       263         3

   [4]     chr2 [148657313, 148657467]      + | ENSG00000121989 protein_coding       373         2

   [5]     chr2 [148672760, 148672903]      + | ENSG00000121989 protein_coding       528         1

   [6]     chr2 [148674852, 148674995]      + | ENSG00000121989 protein_coding       672         1

   [7]     chr2 [148676016, 148676161]      + | ENSG00000121989 protein_coding       816         1

   [8]     chr2 [148677799, 148677913]      + | ENSG00000121989 protein_coding       962         3

   [9]     chr2 [148680542, 148680680]      + | ENSG00000121989 protein_coding       1077        1

  [10]     chr2 [148683600, 148683730]      + | ENSG00000121989 protein_coding       1216        2

  [11]     chr2 [148684649, 148684843]      + | ENSG00000121989 protein_coding       1347        1

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths

I am interested in looking at coordinates separately for [1,2] positions in each codon and [3]. In other words, I would like to have 2 different GRanges objects that look approximately like this (here it is only the beginning)

> grTargetGene_Nonsynonym



GRanges object with X ranges and 7 metadata columns:

   seqnames                 ranges strand |     ensembl_ids   gene_biotype 

      <Rle>              <IRanges>  <Rle> |     <character>    <character> 

   [1]     chr2 [148602722, 148602723]      + | ENSG00000121989 protein_coding

   [2]     chr2 [148602725, 148602726]      + | ENSG00000121989 protein_coding

   [3]     chr2 [148602728, 148602729]      + | ENSG00000121989 protein_coding

   [4]     chr2 [148602731, 148602732]      + | ENSG00000121989 protein_coding







> grTargetGene_Synonym



GRanges object with X ranges and 7 metadata columns:

   seqnames                 ranges strand |     ensembl_ids   gene_biotype 

      <Rle>              <IRanges>  <Rle> |     <character>    <character> 

   [1]     chr2 [148602724, 148602724]      + | ENSG00000121989 protein_coding

   [2]     chr2 [148602727, 148602727]      + | ENSG00000121989 protein_coding

   [3]     chr2 [148602730, 148602730]      + | ENSG00000121989 protein_coding

   [4]     chr2 [148602733, 148602733]      + | ENSG00000121989 protein_coding

I was planning to do it through the loop that creates a set of granges for each exon according to coding_pos and strand, but I suspect there is a smarter way or maybe even a function that can do it already, but I couldn't find a simple solution.

Important: I do not need the sequence itself (the easiest way, in that case, would be to extract DNA first and then work with the sequence), but instead of doing this I only need the positions which I will use to overlap with some features.

> library("GenomicRanges")

> dput(grTargetGene)



new("GRanges"

, seqnames = new("Rle"

, values = structure(1L, .Label = "chr2", class = "factor")

, lengths = 6L

, elementMetadata = NULL

, metadata = list()

)

, ranges = new("IRanges"

, start = c(148602722L, 148653870L, 148657027L, 148657313L, 148672760L, 

148674852L)

, width = c(55L, 208L, 110L, 155L, 144L, 144L)

, NAMES = NULL

, elementType = "integer"

, elementMetadata = NULL

, metadata = list()

)

, strand = new("Rle"

, values = structure(1L, .Label = c("+", "-", "*"), class = "factor")

, lengths = 6L

, elementMetadata = NULL

, metadata = list()

)

, elementMetadata = new("DataFrame"

, rownames = NULL

, nrows = 6L

, listData = structure(list(ensembl_ids =

c("ENSG00000121989","ENSG00000121989", 

"ENSG00000121989", "ENSG00000121989", "ENSG00000121989", "ENSG00000121989"

), gene_biotype = c("protein_coding", "protein_coding", "protein_coding", 

"protein_coding", "protein_coding", "protein_coding"), cds_length =   

c(1542,1542, 1542, 1542, 1542, 1542), gene_start_position = c(148602086L, 

148602086L, 148602086L, 148602086L, 148602086L, 148602086L), 

gene_end_position = c(148688393L, 148688393L, 148688393L, 

148688393L, 148688393L, 148688393L), prev_exons_length = c(0, 

55, 263, 373, 528, 672), coding_pos = c(1, 2, 3, 2, 1, 1)), .Names =  

c("ensembl_ids", "gene_biotype", "cds_length", "gene_start_position",

"gene_end_position", 

"prev_exons_length", "coding_pos"))

, elementType = "ANY"

, elementMetadata = NULL

, metadata = list()

)

, seqinfo = new("Seqinfo"

, seqnames = "chr2"

, seqlengths = NA_integer_

, is_circular = NA

, genome = NA_character_

)

, metadata = list()

)

edited Nov 23 '18 at 10:06

zx8754

30.7k766105

asked Nov 22 '18 at 10:23

lizaveta

657

Can you provide some minimal & reproducible sample data? E.g. use dput to include the output of part of grTargetGene.

– Maurits Evers
Nov 22 '18 at 10:43

Full grTargetGene is shown above - not sure what exactly you need.

– lizaveta
Nov 22 '18 at 11:22

It's much easier to play around with your your data if you use dput to share sample data. This is even more true if make use of non-base R objects such as GRanges. So please review how to provide a minimal reproducible example/attempt and then edit your post accordingly.

– Maurits Evers
Nov 22 '18 at 11:59

@MauritsEvers, thank you for the explanation. I changed the question, hope it helps.

– lizaveta
Nov 22 '18 at 13:06

Thanks for providing sample data, I've posted a solution below.

– Maurits Evers
Nov 24 '18 at 8:13

add a comment |

grTargetGene itself looks like this

> grTargetGene



GRanges object with 11 ranges and 7 metadata columns:

   seqnames                 ranges strand |     ensembl_ids   gene_biotype prev_exons_length coding_pos

      <Rle>              <IRanges>  <Rle> |     <character>    <character>         <numeric>  <numeric>

   [1]     chr2 [148602722, 148602776]      + | ENSG00000121989 protein_coding       0           1

   [2]     chr2 [148653870, 148654077]      + | ENSG00000121989 protein_coding       55          2

   [3]     chr2 [148657027, 148657136]      + | ENSG00000121989 protein_coding       263         3

   [4]     chr2 [148657313, 148657467]      + | ENSG00000121989 protein_coding       373         2

   [5]     chr2 [148672760, 148672903]      + | ENSG00000121989 protein_coding       528         1

   [6]     chr2 [148674852, 148674995]      + | ENSG00000121989 protein_coding       672         1

   [7]     chr2 [148676016, 148676161]      + | ENSG00000121989 protein_coding       816         1

   [8]     chr2 [148677799, 148677913]      + | ENSG00000121989 protein_coding       962         3

   [9]     chr2 [148680542, 148680680]      + | ENSG00000121989 protein_coding       1077        1

  [10]     chr2 [148683600, 148683730]      + | ENSG00000121989 protein_coding       1216        2

  [11]     chr2 [148684649, 148684843]      + | ENSG00000121989 protein_coding       1347        1

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> grTargetGene_Nonsynonym



GRanges object with X ranges and 7 metadata columns:

   seqnames                 ranges strand |     ensembl_ids   gene_biotype 

      <Rle>              <IRanges>  <Rle> |     <character>    <character> 

   [1]     chr2 [148602722, 148602723]      + | ENSG00000121989 protein_coding

   [2]     chr2 [148602725, 148602726]      + | ENSG00000121989 protein_coding

   [3]     chr2 [148602728, 148602729]      + | ENSG00000121989 protein_coding

   [4]     chr2 [148602731, 148602732]      + | ENSG00000121989 protein_coding







> grTargetGene_Synonym



GRanges object with X ranges and 7 metadata columns:

   seqnames                 ranges strand |     ensembl_ids   gene_biotype 

      <Rle>              <IRanges>  <Rle> |     <character>    <character> 

   [1]     chr2 [148602724, 148602724]      + | ENSG00000121989 protein_coding

   [2]     chr2 [148602727, 148602727]      + | ENSG00000121989 protein_coding

   [3]     chr2 [148602730, 148602730]      + | ENSG00000121989 protein_coding

   [4]     chr2 [148602733, 148602733]      + | ENSG00000121989 protein_coding

> library("GenomicRanges")

> dput(grTargetGene)



new("GRanges"

, seqnames = new("Rle"

, values = structure(1L, .Label = "chr2", class = "factor")

, lengths = 6L

, elementMetadata = NULL

, metadata = list()

)

, ranges = new("IRanges"

, start = c(148602722L, 148653870L, 148657027L, 148657313L, 148672760L, 

148674852L)

, width = c(55L, 208L, 110L, 155L, 144L, 144L)

, NAMES = NULL

, elementType = "integer"

, elementMetadata = NULL

, metadata = list()

)

, strand = new("Rle"

, values = structure(1L, .Label = c("+", "-", "*"), class = "factor")

, lengths = 6L

, elementMetadata = NULL

, metadata = list()

)

, elementMetadata = new("DataFrame"

, rownames = NULL

, nrows = 6L

, listData = structure(list(ensembl_ids =

c("ENSG00000121989","ENSG00000121989", 

"ENSG00000121989", "ENSG00000121989", "ENSG00000121989", "ENSG00000121989"

), gene_biotype = c("protein_coding", "protein_coding", "protein_coding", 

"protein_coding", "protein_coding", "protein_coding"), cds_length =   

c(1542,1542, 1542, 1542, 1542, 1542), gene_start_position = c(148602086L, 

148602086L, 148602086L, 148602086L, 148602086L, 148602086L), 

gene_end_position = c(148688393L, 148688393L, 148688393L, 

148688393L, 148688393L, 148688393L), prev_exons_length = c(0, 

55, 263, 373, 528, 672), coding_pos = c(1, 2, 3, 2, 1, 1)), .Names =  

c("ensembl_ids", "gene_biotype", "cds_length", "gene_start_position",

"gene_end_position", 

"prev_exons_length", "coding_pos"))

, elementType = "ANY"

, elementMetadata = NULL

, metadata = list()

)

, seqinfo = new("Seqinfo"

, seqnames = "chr2"

, seqlengths = NA_integer_

, is_circular = NA

, genome = NA_character_

)

, metadata = list()

)

edited Nov 23 '18 at 10:06

zx8754

30.7k766105

asked Nov 22 '18 at 10:23

lizaveta

657

Can you provide some minimal & reproducible sample data? E.g. use dput to include the output of part of grTargetGene.

– Maurits Evers
Nov 22 '18 at 10:43

Full grTargetGene is shown above - not sure what exactly you need.

– lizaveta
Nov 22 '18 at 11:22

It's much easier to play around with your your data if you use dput to share sample data. This is even more true if make use of non-base R objects such as GRanges. So please review how to provide a minimal reproducible example/attempt and then edit your post accordingly.

– Maurits Evers
Nov 22 '18 at 11:59

@MauritsEvers, thank you for the explanation. I changed the question, hope it helps.

– lizaveta
Nov 22 '18 at 13:06

Thanks for providing sample data, I've posted a solution below.

– Maurits Evers
Nov 24 '18 at 8:13

add a comment |

grTargetGene itself looks like this

> grTargetGene



GRanges object with 11 ranges and 7 metadata columns:

   seqnames                 ranges strand |     ensembl_ids   gene_biotype prev_exons_length coding_pos

      <Rle>              <IRanges>  <Rle> |     <character>    <character>         <numeric>  <numeric>

   [1]     chr2 [148602722, 148602776]      + | ENSG00000121989 protein_coding       0           1

   [2]     chr2 [148653870, 148654077]      + | ENSG00000121989 protein_coding       55          2

   [3]     chr2 [148657027, 148657136]      + | ENSG00000121989 protein_coding       263         3

   [4]     chr2 [148657313, 148657467]      + | ENSG00000121989 protein_coding       373         2

   [5]     chr2 [148672760, 148672903]      + | ENSG00000121989 protein_coding       528         1

   [6]     chr2 [148674852, 148674995]      + | ENSG00000121989 protein_coding       672         1

   [7]     chr2 [148676016, 148676161]      + | ENSG00000121989 protein_coding       816         1

   [8]     chr2 [148677799, 148677913]      + | ENSG00000121989 protein_coding       962         3

   [9]     chr2 [148680542, 148680680]      + | ENSG00000121989 protein_coding       1077        1

  [10]     chr2 [148683600, 148683730]      + | ENSG00000121989 protein_coding       1216        2

  [11]     chr2 [148684649, 148684843]      + | ENSG00000121989 protein_coding       1347        1

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> grTargetGene_Nonsynonym



GRanges object with X ranges and 7 metadata columns:

   seqnames                 ranges strand |     ensembl_ids   gene_biotype 

      <Rle>              <IRanges>  <Rle> |     <character>    <character> 

   [1]     chr2 [148602722, 148602723]      + | ENSG00000121989 protein_coding

   [2]     chr2 [148602725, 148602726]      + | ENSG00000121989 protein_coding

   [3]     chr2 [148602728, 148602729]      + | ENSG00000121989 protein_coding

   [4]     chr2 [148602731, 148602732]      + | ENSG00000121989 protein_coding







> grTargetGene_Synonym



GRanges object with X ranges and 7 metadata columns:

   seqnames                 ranges strand |     ensembl_ids   gene_biotype 

      <Rle>              <IRanges>  <Rle> |     <character>    <character> 

   [1]     chr2 [148602724, 148602724]      + | ENSG00000121989 protein_coding

   [2]     chr2 [148602727, 148602727]      + | ENSG00000121989 protein_coding

   [3]     chr2 [148602730, 148602730]      + | ENSG00000121989 protein_coding

   [4]     chr2 [148602733, 148602733]      + | ENSG00000121989 protein_coding

> library("GenomicRanges")

> dput(grTargetGene)



new("GRanges"

, seqnames = new("Rle"

, values = structure(1L, .Label = "chr2", class = "factor")

, lengths = 6L

, elementMetadata = NULL

, metadata = list()

)

, ranges = new("IRanges"

, start = c(148602722L, 148653870L, 148657027L, 148657313L, 148672760L, 

148674852L)

, width = c(55L, 208L, 110L, 155L, 144L, 144L)

, NAMES = NULL

, elementType = "integer"

, elementMetadata = NULL

, metadata = list()

)

, strand = new("Rle"

, values = structure(1L, .Label = c("+", "-", "*"), class = "factor")

, lengths = 6L

, elementMetadata = NULL

, metadata = list()

)

, elementMetadata = new("DataFrame"

, rownames = NULL

, nrows = 6L

, listData = structure(list(ensembl_ids =

c("ENSG00000121989","ENSG00000121989", 

"ENSG00000121989", "ENSG00000121989", "ENSG00000121989", "ENSG00000121989"

), gene_biotype = c("protein_coding", "protein_coding", "protein_coding", 

"protein_coding", "protein_coding", "protein_coding"), cds_length =   

c(1542,1542, 1542, 1542, 1542, 1542), gene_start_position = c(148602086L, 

148602086L, 148602086L, 148602086L, 148602086L, 148602086L), 

gene_end_position = c(148688393L, 148688393L, 148688393L, 

148688393L, 148688393L, 148688393L), prev_exons_length = c(0, 

55, 263, 373, 528, 672), coding_pos = c(1, 2, 3, 2, 1, 1)), .Names =  

c("ensembl_ids", "gene_biotype", "cds_length", "gene_start_position",

"gene_end_position", 

"prev_exons_length", "coding_pos"))

, elementType = "ANY"

, elementMetadata = NULL

, metadata = list()

)

, seqinfo = new("Seqinfo"

, seqnames = "chr2"

, seqlengths = NA_integer_

, is_circular = NA

, genome = NA_character_

)

, metadata = list()

)

edited Nov 23 '18 at 10:06

zx8754

30.7k766105

asked Nov 22 '18 at 10:23

lizaveta

657

grTargetGene itself looks like this

> grTargetGene



GRanges object with 11 ranges and 7 metadata columns:

   seqnames                 ranges strand |     ensembl_ids   gene_biotype prev_exons_length coding_pos

      <Rle>              <IRanges>  <Rle> |     <character>    <character>         <numeric>  <numeric>

   [1]     chr2 [148602722, 148602776]      + | ENSG00000121989 protein_coding       0           1

   [2]     chr2 [148653870, 148654077]      + | ENSG00000121989 protein_coding       55          2

   [3]     chr2 [148657027, 148657136]      + | ENSG00000121989 protein_coding       263         3

   [4]     chr2 [148657313, 148657467]      + | ENSG00000121989 protein_coding       373         2

   [5]     chr2 [148672760, 148672903]      + | ENSG00000121989 protein_coding       528         1

   [6]     chr2 [148674852, 148674995]      + | ENSG00000121989 protein_coding       672         1

   [7]     chr2 [148676016, 148676161]      + | ENSG00000121989 protein_coding       816         1

   [8]     chr2 [148677799, 148677913]      + | ENSG00000121989 protein_coding       962         3

   [9]     chr2 [148680542, 148680680]      + | ENSG00000121989 protein_coding       1077        1

  [10]     chr2 [148683600, 148683730]      + | ENSG00000121989 protein_coding       1216        2

  [11]     chr2 [148684649, 148684843]      + | ENSG00000121989 protein_coding       1347        1

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> grTargetGene_Nonsynonym



GRanges object with X ranges and 7 metadata columns:

   seqnames                 ranges strand |     ensembl_ids   gene_biotype 

      <Rle>              <IRanges>  <Rle> |     <character>    <character> 

   [1]     chr2 [148602722, 148602723]      + | ENSG00000121989 protein_coding

   [2]     chr2 [148602725, 148602726]      + | ENSG00000121989 protein_coding

   [3]     chr2 [148602728, 148602729]      + | ENSG00000121989 protein_coding

   [4]     chr2 [148602731, 148602732]      + | ENSG00000121989 protein_coding







> grTargetGene_Synonym



GRanges object with X ranges and 7 metadata columns:

   seqnames                 ranges strand |     ensembl_ids   gene_biotype 

      <Rle>              <IRanges>  <Rle> |     <character>    <character> 

   [1]     chr2 [148602724, 148602724]      + | ENSG00000121989 protein_coding

   [2]     chr2 [148602727, 148602727]      + | ENSG00000121989 protein_coding

   [3]     chr2 [148602730, 148602730]      + | ENSG00000121989 protein_coding

   [4]     chr2 [148602733, 148602733]      + | ENSG00000121989 protein_coding

> library("GenomicRanges")

> dput(grTargetGene)



new("GRanges"

, seqnames = new("Rle"

, values = structure(1L, .Label = "chr2", class = "factor")

, lengths = 6L

, elementMetadata = NULL

, metadata = list()

)

, ranges = new("IRanges"

, start = c(148602722L, 148653870L, 148657027L, 148657313L, 148672760L, 

148674852L)

, width = c(55L, 208L, 110L, 155L, 144L, 144L)

, NAMES = NULL

, elementType = "integer"

, elementMetadata = NULL

, metadata = list()

)

, strand = new("Rle"

, values = structure(1L, .Label = c("+", "-", "*"), class = "factor")

, lengths = 6L

, elementMetadata = NULL

, metadata = list()

)

, elementMetadata = new("DataFrame"

, rownames = NULL

, nrows = 6L

, listData = structure(list(ensembl_ids =

c("ENSG00000121989","ENSG00000121989", 

"ENSG00000121989", "ENSG00000121989", "ENSG00000121989", "ENSG00000121989"

), gene_biotype = c("protein_coding", "protein_coding", "protein_coding", 

"protein_coding", "protein_coding", "protein_coding"), cds_length =   

c(1542,1542, 1542, 1542, 1542, 1542), gene_start_position = c(148602086L, 

148602086L, 148602086L, 148602086L, 148602086L, 148602086L), 

gene_end_position = c(148688393L, 148688393L, 148688393L, 

148688393L, 148688393L, 148688393L), prev_exons_length = c(0, 

55, 263, 373, 528, 672), coding_pos = c(1, 2, 3, 2, 1, 1)), .Names =  

c("ensembl_ids", "gene_biotype", "cds_length", "gene_start_position",

"gene_end_position", 

"prev_exons_length", "coding_pos"))

, elementType = "ANY"

, elementMetadata = NULL

, metadata = list()

)

, seqinfo = new("Seqinfo"

, seqnames = "chr2"

, seqlengths = NA_integer_

, is_circular = NA

, genome = NA_character_

)

, metadata = list()

)

r bioconductor genomicranges

edited Nov 23 '18 at 10:06

zx8754

30.7k766105

asked Nov 22 '18 at 10:23

lizaveta

657

edited Nov 23 '18 at 10:06

zx8754

30.7k766105

asked Nov 22 '18 at 10:23

lizaveta

657

edited Nov 23 '18 at 10:06

zx8754

30.7k766105

edited Nov 23 '18 at 10:06

zx8754

30.7k766105

edited Nov 23 '18 at 10:06

zx8754

30.7k766105

asked Nov 22 '18 at 10:23

lizaveta

657

asked Nov 22 '18 at 10:23

lizaveta

657

asked Nov 22 '18 at 10:23

lizaveta

657

Can you provide some minimal & reproducible sample data? E.g. use dput to include the output of part of grTargetGene.

– Maurits Evers
Nov 22 '18 at 10:43

Full grTargetGene is shown above - not sure what exactly you need.

– lizaveta
Nov 22 '18 at 11:22

It's much easier to play around with your your data if you use dput to share sample data. This is even more true if make use of non-base R objects such as GRanges. So please review how to provide a minimal reproducible example/attempt and then edit your post accordingly.

– Maurits Evers
Nov 22 '18 at 11:59

@MauritsEvers, thank you for the explanation. I changed the question, hope it helps.

– lizaveta
Nov 22 '18 at 13:06

Thanks for providing sample data, I've posted a solution below.

– Maurits Evers
Nov 24 '18 at 8:13

add a comment |

Can you provide some minimal & reproducible sample data? E.g. use dput to include the output of part of grTargetGene.

– Maurits Evers
Nov 22 '18 at 10:43

Full grTargetGene is shown above - not sure what exactly you need.

– lizaveta
Nov 22 '18 at 11:22

It's much easier to play around with your your data if you use dput to share sample data. This is even more true if make use of non-base R objects such as GRanges. So please review how to provide a minimal reproducible example/attempt and then edit your post accordingly.

– Maurits Evers
Nov 22 '18 at 11:59

@MauritsEvers, thank you for the explanation. I changed the question, hope it helps.

– lizaveta
Nov 22 '18 at 13:06

Thanks for providing sample data, I've posted a solution below.

– Maurits Evers
Nov 24 '18 at 8:13

Can you provide some minimal & reproducible sample data? E.g. use dput to include the output of part of grTargetGene.

– Maurits Evers
Nov 22 '18 at 10:43

Full grTargetGene is shown above - not sure what exactly you need.

– lizaveta
Nov 22 '18 at 11:22

It's much easier to play around with your your data if you use dput to share sample data. This is even more true if make use of non-base R objects such as GRanges. So please review how to provide a minimal reproducible example/attempt and then edit your post accordingly.

– Maurits Evers
Nov 22 '18 at 11:59

@MauritsEvers, thank you for the explanation. I changed the question, hope it helps.

– lizaveta
Nov 22 '18 at 13:06

Thanks for providing sample data, I've posted a solution below.

– Maurits Evers
Nov 24 '18 at 8:13

add a comment |

2 Answers
2

active

oldest

votes

How about the following:

grl <- lapply(list(Nonsym = c(1, 2), Sym = c(3, 3)), function(x) {

    ranges(grTargetGene) <- IRanges(

        start = start(grTargetGene) + x[1] - 1,

        end = start(grTargetGene) + x[2] - 1)

    return(grTargetGene) })

grl

#$Nonsym

#GRanges object with 6 ranges and 7 metadata columns:

#      seqnames              ranges strand |     ensembl_ids   gene_biotype

#         <Rle>           <IRanges>  <Rle> |     <character>    <character>

#  [1]     chr2 148602722-148602723      + | ENSG00000121989 protein_coding

#  [2]     chr2 148653870-148653871      + | ENSG00000121989 protein_coding

#  [3]     chr2 148657027-148657028      + | ENSG00000121989 protein_coding

#  [4]     chr2 148657313-148657314      + | ENSG00000121989 protein_coding

#  [5]     chr2 148672760-148672761      + | ENSG00000121989 protein_coding

#  [6]     chr2 148674852-148674853      + | ENSG00000121989 protein_coding

#      cds_length gene_start_position gene_end_position prev_exons_length

#       <numeric>           <integer>         <integer>         <numeric>

#  [1]       1542           148602086         148688393                 0

#  [2]       1542           148602086         148688393                55

#  [3]       1542           148602086         148688393               263

#  [4]       1542           148602086         148688393               373

#  [5]       1542           148602086         148688393               528

#  [6]       1542           148602086         148688393               672

#      coding_pos

#       <numeric>

#  [1]          1

#  [2]          2

#  [3]          3

#  [4]          2

#  [5]          1

#  [6]          1

#  -------

#  seqinfo: 1 sequence from an unspecified genome; no seqlengths

#

#$Sym

#GRanges object with 6 ranges and 7 metadata columns:

#      seqnames    ranges strand |     ensembl_ids   gene_biotype cds_length

#         <Rle> <IRanges>  <Rle> |     <character>    <character>  <numeric>

#  [1]     chr2 148602724      + | ENSG00000121989 protein_coding       1542

#  [2]     chr2 148653872      + | ENSG00000121989 protein_coding       1542

#  [3]     chr2 148657029      + | ENSG00000121989 protein_coding       1542

#  [4]     chr2 148657315      + | ENSG00000121989 protein_coding       1542

#  [5]     chr2 148672762      + | ENSG00000121989 protein_coding       1542

#  [6]     chr2 148674854      + | ENSG00000121989 protein_coding       1542

#      gene_start_position gene_end_position prev_exons_length coding_pos

#                <integer>         <integer>         <numeric>  <numeric>

#  [1]           148602086         148688393                 0          1

#  [2]           148602086         148688393                55          2

#  [3]           148602086         148688393               263          3

#  [4]           148602086         148688393               373          2

#  [5]           148602086         148688393               528          1

#  [6]           148602086         148688393               672          1

#  -------

#  seqinfo: 1 sequence from an unspecified genome; no seqlengths

grl contains a list of two GRanges, one with ranges based on positions 1 and 2, and the other with ranges based on position 3.

answered Nov 23 '18 at 1:56

Maurits Evers

30.5k41637

If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

– lizaveta
Nov 30 '18 at 13:48

@lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

– Maurits Evers
Dec 1 '18 at 7:53

by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

– lizaveta
Dec 7 '18 at 14:46

add a comment |

-1

I created a function that can account for a chain and allows to process exons that length is not divisible by 3 (and might be even less than 3)

CodonPosition_separation = function(grTargetGene) {

    grTargetGene = sort(grTargetGene)

    grTargetGene$prev_exons_length = c(0,width(grTargetGene)[1:length(grTargetGene)-1])

    if (length(grTargetGene) >1) {

        for (l in 2:length(grTargetGene)) {

          grTargetGene$prev_exons_length[l] = grTargetGene$prev_exons_length[l]+grTargetGene$prev_exons_length[l-1]

        }

      }

  grTargetGene$coding_pos =  grTargetGene$prev_exons_length%%3+1

  grTargetGene_N =  GRanges()

  grTargetGene_S =  GRanges()

  for (l in 1:length(grTargetGene)) {

    for (obj in c("start_nonsyn","start_syn", "end_nonsyn", "end_syn","gr_nonsyn","gr_syn")) {if(exists(obj)) {rm(obj)}}

    if (as.character(strand(grTargetGene)[1]) =="+"){

      start_ns = start(grTargetGene[l])+1-grTargetGene$coding_pos[l]

      end_ns = end(grTargetGene[l])

      if (start_ns <=end_ns) {

        start_nonsyn = seq(from = start(grTargetGene[l])+1-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

        end_nonsyn = seq(from = start(grTargetGene[l])+2-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

      }

      start_s =start(grTargetGene[l])+3-grTargetGene$coding_pos[l]

      end_s = end(grTargetGene[l])

      if (start_s <=end_s) {

        start_syn = seq(from = start(grTargetGene[l])+3-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

        end_syn = start_syn

      }



    } else {

      start_ns = end(grTargetGene[l])-1+grTargetGene$coding_pos[l]

      end_ns = start(grTargetGene[l])

      if (start_ns >=end_ns) {

        start_nonsyn = seq(from = end(grTargetGene[l])-1+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

        end_nonsyn = seq(from = end(grTargetGene[l])-2+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

      }

      start_s =end(grTargetGene[l])-3+grTargetGene$coding_pos[l]

      end_s = start(grTargetGene[l])

      if (start_ns >=end_ns) {

        start_syn = seq(from = end(grTargetGene[l])-3+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

        end_syn = start_syn

      }

    }

    if (exists("start_nonsyn")) {

      length_nonsyn = length(start_nonsyn)+ length(end_nonsyn)

      gr_nonsyn = GRanges(

        seqnames = rep(seqnames(grTargetGene[l]), length_nonsyn),

        strand = rep(strand(grTargetGene[l]), length_nonsyn),

        ranges = IRanges(start = c(start_nonsyn, end_nonsyn), end = c(start_nonsyn, end_nonsyn))

      )

      gr_nonsyn = intersect(gr_nonsyn,grTargetGene[l])

      grTargetGene_N = append(grTargetGene_N, gr_nonsyn)

      } 

    if (exists("start_syn")) {

      length_syn = length(start_syn)

      gr_syn = GRanges(

        seqnames = rep(seqnames(grTargetGene[l]), length_syn),

        strand = rep(strand(grTargetGene[l]), length_syn),

        ranges = IRanges(start = start_syn, end = end_syn)

      )

      gr_syn = intersect(gr_syn,grTargetGene[l])

      grTargetGene_S = append(grTargetGene_S, gr_syn)

    }

  }

  return(list("grTargetGene_S"=grTargetGene_S,"grTargetGene_N"=grTargetGene_N))

}

It works nicely:

> CodonPosition_separation(grTargetGene)

$grTargetGene_S

GRanges object with 514 ranges and 0 metadata columns:

        seqnames                 ranges strand

           <Rle>              <IRanges>  <Rle>

    [1]     chr2 [148602724, 148602724]      +

    [2]     chr2 [148602727, 148602727]      +

    [3]     chr2 [148602730, 148602730]      +

    [4]     chr2 [148602733, 148602733]      +

    [5]     chr2 [148602736, 148602736]      +

    ...      ...                    ...    ...

  [510]     chr2 [148684831, 148684831]      +

  [511]     chr2 [148684834, 148684834]      +

  [512]     chr2 [148684837, 148684837]      +

  [513]     chr2 [148684840, 148684840]      +

  [514]     chr2 [148684843, 148684843]      +

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths



$grTargetGene_N

GRanges object with 517 ranges and 0 metadata columns:

        seqnames                 ranges strand

           <Rle>              <IRanges>  <Rle>

    [1]     chr2 [148602722, 148602723]      +

    [2]     chr2 [148602725, 148602726]      +

    [3]     chr2 [148602728, 148602729]      +

    [4]     chr2 [148602731, 148602732]      +

    [5]     chr2 [148602734, 148602735]      +

    ...      ...                    ...    ...

  [513]     chr2 [148684829, 148684830]      +

  [514]     chr2 [148684832, 148684833]      +

  [515]     chr2 [148684835, 148684836]      +

  [516]     chr2 [148684838, 148684839]      +

  [517]     chr2 [148684841, 148684842]      +

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths

answered Nov 30 '18 at 14:42

lizaveta

657

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53428778%2fget-ranges-for-synonymous-and-non-synonymous-nucleotide-positions-within-a-codon%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

How about the following:

grl <- lapply(list(Nonsym = c(1, 2), Sym = c(3, 3)), function(x) {

    ranges(grTargetGene) <- IRanges(

        start = start(grTargetGene) + x[1] - 1,

        end = start(grTargetGene) + x[2] - 1)

    return(grTargetGene) })

grl

#$Nonsym

#GRanges object with 6 ranges and 7 metadata columns:

#      seqnames              ranges strand |     ensembl_ids   gene_biotype

#         <Rle>           <IRanges>  <Rle> |     <character>    <character>

#  [1]     chr2 148602722-148602723      + | ENSG00000121989 protein_coding

#  [2]     chr2 148653870-148653871      + | ENSG00000121989 protein_coding

#  [3]     chr2 148657027-148657028      + | ENSG00000121989 protein_coding

#  [4]     chr2 148657313-148657314      + | ENSG00000121989 protein_coding

#  [5]     chr2 148672760-148672761      + | ENSG00000121989 protein_coding

#  [6]     chr2 148674852-148674853      + | ENSG00000121989 protein_coding

#      cds_length gene_start_position gene_end_position prev_exons_length

#       <numeric>           <integer>         <integer>         <numeric>

#  [1]       1542           148602086         148688393                 0

#  [2]       1542           148602086         148688393                55

#  [3]       1542           148602086         148688393               263

#  [4]       1542           148602086         148688393               373

#  [5]       1542           148602086         148688393               528

#  [6]       1542           148602086         148688393               672

#      coding_pos

#       <numeric>

#  [1]          1

#  [2]          2

#  [3]          3

#  [4]          2

#  [5]          1

#  [6]          1

#  -------

#  seqinfo: 1 sequence from an unspecified genome; no seqlengths

#

#$Sym

#GRanges object with 6 ranges and 7 metadata columns:

#      seqnames    ranges strand |     ensembl_ids   gene_biotype cds_length

#         <Rle> <IRanges>  <Rle> |     <character>    <character>  <numeric>

#  [1]     chr2 148602724      + | ENSG00000121989 protein_coding       1542

#  [2]     chr2 148653872      + | ENSG00000121989 protein_coding       1542

#  [3]     chr2 148657029      + | ENSG00000121989 protein_coding       1542

#  [4]     chr2 148657315      + | ENSG00000121989 protein_coding       1542

#  [5]     chr2 148672762      + | ENSG00000121989 protein_coding       1542

#  [6]     chr2 148674854      + | ENSG00000121989 protein_coding       1542

#      gene_start_position gene_end_position prev_exons_length coding_pos

#                <integer>         <integer>         <numeric>  <numeric>

#  [1]           148602086         148688393                 0          1

#  [2]           148602086         148688393                55          2

#  [3]           148602086         148688393               263          3

#  [4]           148602086         148688393               373          2

#  [5]           148602086         148688393               528          1

#  [6]           148602086         148688393               672          1

#  -------

#  seqinfo: 1 sequence from an unspecified genome; no seqlengths

grl contains a list of two GRanges, one with ranges based on positions 1 and 2, and the other with ranges based on position 3.

answered Nov 23 '18 at 1:56

Maurits Evers

30.5k41637

If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

– lizaveta
Nov 30 '18 at 13:48

@lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

– Maurits Evers
Dec 1 '18 at 7:53

by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

– lizaveta
Dec 7 '18 at 14:46

add a comment |

How about the following:

grl <- lapply(list(Nonsym = c(1, 2), Sym = c(3, 3)), function(x) {

    ranges(grTargetGene) <- IRanges(

        start = start(grTargetGene) + x[1] - 1,

        end = start(grTargetGene) + x[2] - 1)

    return(grTargetGene) })

grl

#$Nonsym

#GRanges object with 6 ranges and 7 metadata columns:

#      seqnames              ranges strand |     ensembl_ids   gene_biotype

#         <Rle>           <IRanges>  <Rle> |     <character>    <character>

#  [1]     chr2 148602722-148602723      + | ENSG00000121989 protein_coding

#  [2]     chr2 148653870-148653871      + | ENSG00000121989 protein_coding

#  [3]     chr2 148657027-148657028      + | ENSG00000121989 protein_coding

#  [4]     chr2 148657313-148657314      + | ENSG00000121989 protein_coding

#  [5]     chr2 148672760-148672761      + | ENSG00000121989 protein_coding

#  [6]     chr2 148674852-148674853      + | ENSG00000121989 protein_coding

#      cds_length gene_start_position gene_end_position prev_exons_length

#       <numeric>           <integer>         <integer>         <numeric>

#  [1]       1542           148602086         148688393                 0

#  [2]       1542           148602086         148688393                55

#  [3]       1542           148602086         148688393               263

#  [4]       1542           148602086         148688393               373

#  [5]       1542           148602086         148688393               528

#  [6]       1542           148602086         148688393               672

#      coding_pos

#       <numeric>

#  [1]          1

#  [2]          2

#  [3]          3

#  [4]          2

#  [5]          1

#  [6]          1

#  -------

#  seqinfo: 1 sequence from an unspecified genome; no seqlengths

#

#$Sym

#GRanges object with 6 ranges and 7 metadata columns:

#      seqnames    ranges strand |     ensembl_ids   gene_biotype cds_length

#         <Rle> <IRanges>  <Rle> |     <character>    <character>  <numeric>

#  [1]     chr2 148602724      + | ENSG00000121989 protein_coding       1542

#  [2]     chr2 148653872      + | ENSG00000121989 protein_coding       1542

#  [3]     chr2 148657029      + | ENSG00000121989 protein_coding       1542

#  [4]     chr2 148657315      + | ENSG00000121989 protein_coding       1542

#  [5]     chr2 148672762      + | ENSG00000121989 protein_coding       1542

#  [6]     chr2 148674854      + | ENSG00000121989 protein_coding       1542

#      gene_start_position gene_end_position prev_exons_length coding_pos

#                <integer>         <integer>         <numeric>  <numeric>

#  [1]           148602086         148688393                 0          1

#  [2]           148602086         148688393                55          2

#  [3]           148602086         148688393               263          3

#  [4]           148602086         148688393               373          2

#  [5]           148602086         148688393               528          1

#  [6]           148602086         148688393               672          1

#  -------

#  seqinfo: 1 sequence from an unspecified genome; no seqlengths

grl contains a list of two GRanges, one with ranges based on positions 1 and 2, and the other with ranges based on position 3.

answered Nov 23 '18 at 1:56

Maurits Evers

30.5k41637

If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

– lizaveta
Nov 30 '18 at 13:48

@lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

– Maurits Evers
Dec 1 '18 at 7:53

by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

– lizaveta
Dec 7 '18 at 14:46

add a comment |

How about the following:

grl <- lapply(list(Nonsym = c(1, 2), Sym = c(3, 3)), function(x) {

    ranges(grTargetGene) <- IRanges(

        start = start(grTargetGene) + x[1] - 1,

        end = start(grTargetGene) + x[2] - 1)

    return(grTargetGene) })

grl

#$Nonsym

#GRanges object with 6 ranges and 7 metadata columns:

#      seqnames              ranges strand |     ensembl_ids   gene_biotype

#         <Rle>           <IRanges>  <Rle> |     <character>    <character>

#  [1]     chr2 148602722-148602723      + | ENSG00000121989 protein_coding

#  [2]     chr2 148653870-148653871      + | ENSG00000121989 protein_coding

#  [3]     chr2 148657027-148657028      + | ENSG00000121989 protein_coding

#  [4]     chr2 148657313-148657314      + | ENSG00000121989 protein_coding

#  [5]     chr2 148672760-148672761      + | ENSG00000121989 protein_coding

#  [6]     chr2 148674852-148674853      + | ENSG00000121989 protein_coding

#      cds_length gene_start_position gene_end_position prev_exons_length

#       <numeric>           <integer>         <integer>         <numeric>

#  [1]       1542           148602086         148688393                 0

#  [2]       1542           148602086         148688393                55

#  [3]       1542           148602086         148688393               263

#  [4]       1542           148602086         148688393               373

#  [5]       1542           148602086         148688393               528

#  [6]       1542           148602086         148688393               672

#      coding_pos

#       <numeric>

#  [1]          1

#  [2]          2

#  [3]          3

#  [4]          2

#  [5]          1

#  [6]          1

#  -------

#  seqinfo: 1 sequence from an unspecified genome; no seqlengths

#

#$Sym

#GRanges object with 6 ranges and 7 metadata columns:

#      seqnames    ranges strand |     ensembl_ids   gene_biotype cds_length

#         <Rle> <IRanges>  <Rle> |     <character>    <character>  <numeric>

#  [1]     chr2 148602724      + | ENSG00000121989 protein_coding       1542

#  [2]     chr2 148653872      + | ENSG00000121989 protein_coding       1542

#  [3]     chr2 148657029      + | ENSG00000121989 protein_coding       1542

#  [4]     chr2 148657315      + | ENSG00000121989 protein_coding       1542

#  [5]     chr2 148672762      + | ENSG00000121989 protein_coding       1542

#  [6]     chr2 148674854      + | ENSG00000121989 protein_coding       1542

#      gene_start_position gene_end_position prev_exons_length coding_pos

#                <integer>         <integer>         <numeric>  <numeric>

#  [1]           148602086         148688393                 0          1

#  [2]           148602086         148688393                55          2

#  [3]           148602086         148688393               263          3

#  [4]           148602086         148688393               373          2

#  [5]           148602086         148688393               528          1

#  [6]           148602086         148688393               672          1

#  -------

#  seqinfo: 1 sequence from an unspecified genome; no seqlengths

grl contains a list of two GRanges, one with ranges based on positions 1 and 2, and the other with ranges based on position 3.

answered Nov 23 '18 at 1:56

Maurits Evers

30.5k41637

How about the following:

grl <- lapply(list(Nonsym = c(1, 2), Sym = c(3, 3)), function(x) {

    ranges(grTargetGene) <- IRanges(

        start = start(grTargetGene) + x[1] - 1,

        end = start(grTargetGene) + x[2] - 1)

    return(grTargetGene) })

grl

#$Nonsym

#GRanges object with 6 ranges and 7 metadata columns:

#      seqnames              ranges strand |     ensembl_ids   gene_biotype

#         <Rle>           <IRanges>  <Rle> |     <character>    <character>

#  [1]     chr2 148602722-148602723      + | ENSG00000121989 protein_coding

#  [2]     chr2 148653870-148653871      + | ENSG00000121989 protein_coding

#  [3]     chr2 148657027-148657028      + | ENSG00000121989 protein_coding

#  [4]     chr2 148657313-148657314      + | ENSG00000121989 protein_coding

#  [5]     chr2 148672760-148672761      + | ENSG00000121989 protein_coding

#  [6]     chr2 148674852-148674853      + | ENSG00000121989 protein_coding

#      cds_length gene_start_position gene_end_position prev_exons_length

#       <numeric>           <integer>         <integer>         <numeric>

#  [1]       1542           148602086         148688393                 0

#  [2]       1542           148602086         148688393                55

#  [3]       1542           148602086         148688393               263

#  [4]       1542           148602086         148688393               373

#  [5]       1542           148602086         148688393               528

#  [6]       1542           148602086         148688393               672

#      coding_pos

#       <numeric>

#  [1]          1

#  [2]          2

#  [3]          3

#  [4]          2

#  [5]          1

#  [6]          1

#  -------

#  seqinfo: 1 sequence from an unspecified genome; no seqlengths

#

#$Sym

#GRanges object with 6 ranges and 7 metadata columns:

#      seqnames    ranges strand |     ensembl_ids   gene_biotype cds_length

#         <Rle> <IRanges>  <Rle> |     <character>    <character>  <numeric>

#  [1]     chr2 148602724      + | ENSG00000121989 protein_coding       1542

#  [2]     chr2 148653872      + | ENSG00000121989 protein_coding       1542

#  [3]     chr2 148657029      + | ENSG00000121989 protein_coding       1542

#  [4]     chr2 148657315      + | ENSG00000121989 protein_coding       1542

#  [5]     chr2 148672762      + | ENSG00000121989 protein_coding       1542

#  [6]     chr2 148674854      + | ENSG00000121989 protein_coding       1542

#      gene_start_position gene_end_position prev_exons_length coding_pos

#                <integer>         <integer>         <numeric>  <numeric>

#  [1]           148602086         148688393                 0          1

#  [2]           148602086         148688393                55          2

#  [3]           148602086         148688393               263          3

#  [4]           148602086         148688393               373          2

#  [5]           148602086         148688393               528          1

#  [6]           148602086         148688393               672          1

#  -------

#  seqinfo: 1 sequence from an unspecified genome; no seqlengths

grl contains a list of two GRanges, one with ranges based on positions 1 and 2, and the other with ranges based on position 3.

answered Nov 23 '18 at 1:56

Maurits Evers

30.5k41637

answered Nov 23 '18 at 1:56

Maurits Evers

30.5k41637

answered Nov 23 '18 at 1:56

Maurits Evers

30.5k41637

answered Nov 23 '18 at 1:56

Maurits Evers

30.5k41637

If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

– lizaveta
Nov 30 '18 at 13:48

@lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

– Maurits Evers
Dec 1 '18 at 7:53

by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

– lizaveta
Dec 7 '18 at 14:46

add a comment |

If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

– lizaveta
Nov 30 '18 at 13:48

@lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

– Maurits Evers
Dec 1 '18 at 7:53

by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

– lizaveta
Dec 7 '18 at 14:46

If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

– lizaveta
Nov 30 '18 at 13:48

@lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

– Maurits Evers
Dec 1 '18 at 7:53

by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

– lizaveta
Dec 7 '18 at 14:46

add a comment |

-1

I created a function that can account for a chain and allows to process exons that length is not divisible by 3 (and might be even less than 3)

CodonPosition_separation = function(grTargetGene) {

    grTargetGene = sort(grTargetGene)

    grTargetGene$prev_exons_length = c(0,width(grTargetGene)[1:length(grTargetGene)-1])

    if (length(grTargetGene) >1) {

        for (l in 2:length(grTargetGene)) {

          grTargetGene$prev_exons_length[l] = grTargetGene$prev_exons_length[l]+grTargetGene$prev_exons_length[l-1]

        }

      }

  grTargetGene$coding_pos =  grTargetGene$prev_exons_length%%3+1

  grTargetGene_N =  GRanges()

  grTargetGene_S =  GRanges()

  for (l in 1:length(grTargetGene)) {

    for (obj in c("start_nonsyn","start_syn", "end_nonsyn", "end_syn","gr_nonsyn","gr_syn")) {if(exists(obj)) {rm(obj)}}

    if (as.character(strand(grTargetGene)[1]) =="+"){

      start_ns = start(grTargetGene[l])+1-grTargetGene$coding_pos[l]

      end_ns = end(grTargetGene[l])

      if (start_ns <=end_ns) {

        start_nonsyn = seq(from = start(grTargetGene[l])+1-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

        end_nonsyn = seq(from = start(grTargetGene[l])+2-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

      }

      start_s =start(grTargetGene[l])+3-grTargetGene$coding_pos[l]

      end_s = end(grTargetGene[l])

      if (start_s <=end_s) {

        start_syn = seq(from = start(grTargetGene[l])+3-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

        end_syn = start_syn

      }



    } else {

      start_ns = end(grTargetGene[l])-1+grTargetGene$coding_pos[l]

      end_ns = start(grTargetGene[l])

      if (start_ns >=end_ns) {

        start_nonsyn = seq(from = end(grTargetGene[l])-1+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

        end_nonsyn = seq(from = end(grTargetGene[l])-2+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

      }

      start_s =end(grTargetGene[l])-3+grTargetGene$coding_pos[l]

      end_s = start(grTargetGene[l])

      if (start_ns >=end_ns) {

        start_syn = seq(from = end(grTargetGene[l])-3+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

        end_syn = start_syn

      }

    }

    if (exists("start_nonsyn")) {

      length_nonsyn = length(start_nonsyn)+ length(end_nonsyn)

      gr_nonsyn = GRanges(

        seqnames = rep(seqnames(grTargetGene[l]), length_nonsyn),

        strand = rep(strand(grTargetGene[l]), length_nonsyn),

        ranges = IRanges(start = c(start_nonsyn, end_nonsyn), end = c(start_nonsyn, end_nonsyn))

      )

      gr_nonsyn = intersect(gr_nonsyn,grTargetGene[l])

      grTargetGene_N = append(grTargetGene_N, gr_nonsyn)

      } 

    if (exists("start_syn")) {

      length_syn = length(start_syn)

      gr_syn = GRanges(

        seqnames = rep(seqnames(grTargetGene[l]), length_syn),

        strand = rep(strand(grTargetGene[l]), length_syn),

        ranges = IRanges(start = start_syn, end = end_syn)

      )

      gr_syn = intersect(gr_syn,grTargetGene[l])

      grTargetGene_S = append(grTargetGene_S, gr_syn)

    }

  }

  return(list("grTargetGene_S"=grTargetGene_S,"grTargetGene_N"=grTargetGene_N))

}

It works nicely:

> CodonPosition_separation(grTargetGene)

$grTargetGene_S

GRanges object with 514 ranges and 0 metadata columns:

        seqnames                 ranges strand

           <Rle>              <IRanges>  <Rle>

    [1]     chr2 [148602724, 148602724]      +

    [2]     chr2 [148602727, 148602727]      +

    [3]     chr2 [148602730, 148602730]      +

    [4]     chr2 [148602733, 148602733]      +

    [5]     chr2 [148602736, 148602736]      +

    ...      ...                    ...    ...

  [510]     chr2 [148684831, 148684831]      +

  [511]     chr2 [148684834, 148684834]      +

  [512]     chr2 [148684837, 148684837]      +

  [513]     chr2 [148684840, 148684840]      +

  [514]     chr2 [148684843, 148684843]      +

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths



$grTargetGene_N

GRanges object with 517 ranges and 0 metadata columns:

        seqnames                 ranges strand

           <Rle>              <IRanges>  <Rle>

    [1]     chr2 [148602722, 148602723]      +

    [2]     chr2 [148602725, 148602726]      +

    [3]     chr2 [148602728, 148602729]      +

    [4]     chr2 [148602731, 148602732]      +

    [5]     chr2 [148602734, 148602735]      +

    ...      ...                    ...    ...

  [513]     chr2 [148684829, 148684830]      +

  [514]     chr2 [148684832, 148684833]      +

  [515]     chr2 [148684835, 148684836]      +

  [516]     chr2 [148684838, 148684839]      +

  [517]     chr2 [148684841, 148684842]      +

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths

answered Nov 30 '18 at 14:42

lizaveta

657

add a comment |

-1

I created a function that can account for a chain and allows to process exons that length is not divisible by 3 (and might be even less than 3)

CodonPosition_separation = function(grTargetGene) {

    grTargetGene = sort(grTargetGene)

    grTargetGene$prev_exons_length = c(0,width(grTargetGene)[1:length(grTargetGene)-1])

    if (length(grTargetGene) >1) {

        for (l in 2:length(grTargetGene)) {

          grTargetGene$prev_exons_length[l] = grTargetGene$prev_exons_length[l]+grTargetGene$prev_exons_length[l-1]

        }

      }

  grTargetGene$coding_pos =  grTargetGene$prev_exons_length%%3+1

  grTargetGene_N =  GRanges()

  grTargetGene_S =  GRanges()

  for (l in 1:length(grTargetGene)) {

    for (obj in c("start_nonsyn","start_syn", "end_nonsyn", "end_syn","gr_nonsyn","gr_syn")) {if(exists(obj)) {rm(obj)}}

    if (as.character(strand(grTargetGene)[1]) =="+"){

      start_ns = start(grTargetGene[l])+1-grTargetGene$coding_pos[l]

      end_ns = end(grTargetGene[l])

      if (start_ns <=end_ns) {

        start_nonsyn = seq(from = start(grTargetGene[l])+1-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

        end_nonsyn = seq(from = start(grTargetGene[l])+2-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

      }

      start_s =start(grTargetGene[l])+3-grTargetGene$coding_pos[l]

      end_s = end(grTargetGene[l])

      if (start_s <=end_s) {

        start_syn = seq(from = start(grTargetGene[l])+3-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

        end_syn = start_syn

      }



    } else {

      start_ns = end(grTargetGene[l])-1+grTargetGene$coding_pos[l]

      end_ns = start(grTargetGene[l])

      if (start_ns >=end_ns) {

        start_nonsyn = seq(from = end(grTargetGene[l])-1+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

        end_nonsyn = seq(from = end(grTargetGene[l])-2+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

      }

      start_s =end(grTargetGene[l])-3+grTargetGene$coding_pos[l]

      end_s = start(grTargetGene[l])

      if (start_ns >=end_ns) {

        start_syn = seq(from = end(grTargetGene[l])-3+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

        end_syn = start_syn

      }

    }

    if (exists("start_nonsyn")) {

      length_nonsyn = length(start_nonsyn)+ length(end_nonsyn)

      gr_nonsyn = GRanges(

        seqnames = rep(seqnames(grTargetGene[l]), length_nonsyn),

        strand = rep(strand(grTargetGene[l]), length_nonsyn),

        ranges = IRanges(start = c(start_nonsyn, end_nonsyn), end = c(start_nonsyn, end_nonsyn))

      )

      gr_nonsyn = intersect(gr_nonsyn,grTargetGene[l])

      grTargetGene_N = append(grTargetGene_N, gr_nonsyn)

      } 

    if (exists("start_syn")) {

      length_syn = length(start_syn)

      gr_syn = GRanges(

        seqnames = rep(seqnames(grTargetGene[l]), length_syn),

        strand = rep(strand(grTargetGene[l]), length_syn),

        ranges = IRanges(start = start_syn, end = end_syn)

      )

      gr_syn = intersect(gr_syn,grTargetGene[l])

      grTargetGene_S = append(grTargetGene_S, gr_syn)

    }

  }

  return(list("grTargetGene_S"=grTargetGene_S,"grTargetGene_N"=grTargetGene_N))

}

It works nicely:

> CodonPosition_separation(grTargetGene)

$grTargetGene_S

GRanges object with 514 ranges and 0 metadata columns:

        seqnames                 ranges strand

           <Rle>              <IRanges>  <Rle>

    [1]     chr2 [148602724, 148602724]      +

    [2]     chr2 [148602727, 148602727]      +

    [3]     chr2 [148602730, 148602730]      +

    [4]     chr2 [148602733, 148602733]      +

    [5]     chr2 [148602736, 148602736]      +

    ...      ...                    ...    ...

  [510]     chr2 [148684831, 148684831]      +

  [511]     chr2 [148684834, 148684834]      +

  [512]     chr2 [148684837, 148684837]      +

  [513]     chr2 [148684840, 148684840]      +

  [514]     chr2 [148684843, 148684843]      +

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths



$grTargetGene_N

GRanges object with 517 ranges and 0 metadata columns:

        seqnames                 ranges strand

           <Rle>              <IRanges>  <Rle>

    [1]     chr2 [148602722, 148602723]      +

    [2]     chr2 [148602725, 148602726]      +

    [3]     chr2 [148602728, 148602729]      +

    [4]     chr2 [148602731, 148602732]      +

    [5]     chr2 [148602734, 148602735]      +

    ...      ...                    ...    ...

  [513]     chr2 [148684829, 148684830]      +

  [514]     chr2 [148684832, 148684833]      +

  [515]     chr2 [148684835, 148684836]      +

  [516]     chr2 [148684838, 148684839]      +

  [517]     chr2 [148684841, 148684842]      +

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths

answered Nov 30 '18 at 14:42

lizaveta

657

add a comment |

-1

I created a function that can account for a chain and allows to process exons that length is not divisible by 3 (and might be even less than 3)

CodonPosition_separation = function(grTargetGene) {

    grTargetGene = sort(grTargetGene)

    grTargetGene$prev_exons_length = c(0,width(grTargetGene)[1:length(grTargetGene)-1])

    if (length(grTargetGene) >1) {

        for (l in 2:length(grTargetGene)) {

          grTargetGene$prev_exons_length[l] = grTargetGene$prev_exons_length[l]+grTargetGene$prev_exons_length[l-1]

        }

      }

  grTargetGene$coding_pos =  grTargetGene$prev_exons_length%%3+1

  grTargetGene_N =  GRanges()

  grTargetGene_S =  GRanges()

  for (l in 1:length(grTargetGene)) {

    for (obj in c("start_nonsyn","start_syn", "end_nonsyn", "end_syn","gr_nonsyn","gr_syn")) {if(exists(obj)) {rm(obj)}}

    if (as.character(strand(grTargetGene)[1]) =="+"){

      start_ns = start(grTargetGene[l])+1-grTargetGene$coding_pos[l]

      end_ns = end(grTargetGene[l])

      if (start_ns <=end_ns) {

        start_nonsyn = seq(from = start(grTargetGene[l])+1-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

        end_nonsyn = seq(from = start(grTargetGene[l])+2-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

      }

      start_s =start(grTargetGene[l])+3-grTargetGene$coding_pos[l]

      end_s = end(grTargetGene[l])

      if (start_s <=end_s) {

        start_syn = seq(from = start(grTargetGene[l])+3-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

        end_syn = start_syn

      }



    } else {

      start_ns = end(grTargetGene[l])-1+grTargetGene$coding_pos[l]

      end_ns = start(grTargetGene[l])

      if (start_ns >=end_ns) {

        start_nonsyn = seq(from = end(grTargetGene[l])-1+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

        end_nonsyn = seq(from = end(grTargetGene[l])-2+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

      }

      start_s =end(grTargetGene[l])-3+grTargetGene$coding_pos[l]

      end_s = start(grTargetGene[l])

      if (start_ns >=end_ns) {

        start_syn = seq(from = end(grTargetGene[l])-3+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

        end_syn = start_syn

      }

    }

    if (exists("start_nonsyn")) {

      length_nonsyn = length(start_nonsyn)+ length(end_nonsyn)

      gr_nonsyn = GRanges(

        seqnames = rep(seqnames(grTargetGene[l]), length_nonsyn),

        strand = rep(strand(grTargetGene[l]), length_nonsyn),

        ranges = IRanges(start = c(start_nonsyn, end_nonsyn), end = c(start_nonsyn, end_nonsyn))

      )

      gr_nonsyn = intersect(gr_nonsyn,grTargetGene[l])

      grTargetGene_N = append(grTargetGene_N, gr_nonsyn)

      } 

    if (exists("start_syn")) {

      length_syn = length(start_syn)

      gr_syn = GRanges(

        seqnames = rep(seqnames(grTargetGene[l]), length_syn),

        strand = rep(strand(grTargetGene[l]), length_syn),

        ranges = IRanges(start = start_syn, end = end_syn)

      )

      gr_syn = intersect(gr_syn,grTargetGene[l])

      grTargetGene_S = append(grTargetGene_S, gr_syn)

    }

  }

  return(list("grTargetGene_S"=grTargetGene_S,"grTargetGene_N"=grTargetGene_N))

}

It works nicely:

> CodonPosition_separation(grTargetGene)

$grTargetGene_S

GRanges object with 514 ranges and 0 metadata columns:

        seqnames                 ranges strand

           <Rle>              <IRanges>  <Rle>

    [1]     chr2 [148602724, 148602724]      +

    [2]     chr2 [148602727, 148602727]      +

    [3]     chr2 [148602730, 148602730]      +

    [4]     chr2 [148602733, 148602733]      +

    [5]     chr2 [148602736, 148602736]      +

    ...      ...                    ...    ...

  [510]     chr2 [148684831, 148684831]      +

  [511]     chr2 [148684834, 148684834]      +

  [512]     chr2 [148684837, 148684837]      +

  [513]     chr2 [148684840, 148684840]      +

  [514]     chr2 [148684843, 148684843]      +

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths



$grTargetGene_N

GRanges object with 517 ranges and 0 metadata columns:

        seqnames                 ranges strand

           <Rle>              <IRanges>  <Rle>

    [1]     chr2 [148602722, 148602723]      +

    [2]     chr2 [148602725, 148602726]      +

    [3]     chr2 [148602728, 148602729]      +

    [4]     chr2 [148602731, 148602732]      +

    [5]     chr2 [148602734, 148602735]      +

    ...      ...                    ...    ...

  [513]     chr2 [148684829, 148684830]      +

  [514]     chr2 [148684832, 148684833]      +

  [515]     chr2 [148684835, 148684836]      +

  [516]     chr2 [148684838, 148684839]      +

  [517]     chr2 [148684841, 148684842]      +

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths

answered Nov 30 '18 at 14:42

lizaveta

657

I created a function that can account for a chain and allows to process exons that length is not divisible by 3 (and might be even less than 3)

CodonPosition_separation = function(grTargetGene) {

    grTargetGene = sort(grTargetGene)

    grTargetGene$prev_exons_length = c(0,width(grTargetGene)[1:length(grTargetGene)-1])

    if (length(grTargetGene) >1) {

        for (l in 2:length(grTargetGene)) {

          grTargetGene$prev_exons_length[l] = grTargetGene$prev_exons_length[l]+grTargetGene$prev_exons_length[l-1]

        }

      }

  grTargetGene$coding_pos =  grTargetGene$prev_exons_length%%3+1

  grTargetGene_N =  GRanges()

  grTargetGene_S =  GRanges()

  for (l in 1:length(grTargetGene)) {

    for (obj in c("start_nonsyn","start_syn", "end_nonsyn", "end_syn","gr_nonsyn","gr_syn")) {if(exists(obj)) {rm(obj)}}

    if (as.character(strand(grTargetGene)[1]) =="+"){

      start_ns = start(grTargetGene[l])+1-grTargetGene$coding_pos[l]

      end_ns = end(grTargetGene[l])

      if (start_ns <=end_ns) {

        start_nonsyn = seq(from = start(grTargetGene[l])+1-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

        end_nonsyn = seq(from = start(grTargetGene[l])+2-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

      }

      start_s =start(grTargetGene[l])+3-grTargetGene$coding_pos[l]

      end_s = end(grTargetGene[l])

      if (start_s <=end_s) {

        start_syn = seq(from = start(grTargetGene[l])+3-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)

        end_syn = start_syn

      }



    } else {

      start_ns = end(grTargetGene[l])-1+grTargetGene$coding_pos[l]

      end_ns = start(grTargetGene[l])

      if (start_ns >=end_ns) {

        start_nonsyn = seq(from = end(grTargetGene[l])-1+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

        end_nonsyn = seq(from = end(grTargetGene[l])-2+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

      }

      start_s =end(grTargetGene[l])-3+grTargetGene$coding_pos[l]

      end_s = start(grTargetGene[l])

      if (start_ns >=end_ns) {

        start_syn = seq(from = end(grTargetGene[l])-3+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)

        end_syn = start_syn

      }

    }

    if (exists("start_nonsyn")) {

      length_nonsyn = length(start_nonsyn)+ length(end_nonsyn)

      gr_nonsyn = GRanges(

        seqnames = rep(seqnames(grTargetGene[l]), length_nonsyn),

        strand = rep(strand(grTargetGene[l]), length_nonsyn),

        ranges = IRanges(start = c(start_nonsyn, end_nonsyn), end = c(start_nonsyn, end_nonsyn))

      )

      gr_nonsyn = intersect(gr_nonsyn,grTargetGene[l])

      grTargetGene_N = append(grTargetGene_N, gr_nonsyn)

      } 

    if (exists("start_syn")) {

      length_syn = length(start_syn)

      gr_syn = GRanges(

        seqnames = rep(seqnames(grTargetGene[l]), length_syn),

        strand = rep(strand(grTargetGene[l]), length_syn),

        ranges = IRanges(start = start_syn, end = end_syn)

      )

      gr_syn = intersect(gr_syn,grTargetGene[l])

      grTargetGene_S = append(grTargetGene_S, gr_syn)

    }

  }

  return(list("grTargetGene_S"=grTargetGene_S,"grTargetGene_N"=grTargetGene_N))

}

It works nicely:

> CodonPosition_separation(grTargetGene)

$grTargetGene_S

GRanges object with 514 ranges and 0 metadata columns:

        seqnames                 ranges strand

           <Rle>              <IRanges>  <Rle>

    [1]     chr2 [148602724, 148602724]      +

    [2]     chr2 [148602727, 148602727]      +

    [3]     chr2 [148602730, 148602730]      +

    [4]     chr2 [148602733, 148602733]      +

    [5]     chr2 [148602736, 148602736]      +

    ...      ...                    ...    ...

  [510]     chr2 [148684831, 148684831]      +

  [511]     chr2 [148684834, 148684834]      +

  [512]     chr2 [148684837, 148684837]      +

  [513]     chr2 [148684840, 148684840]      +

  [514]     chr2 [148684843, 148684843]      +

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths



$grTargetGene_N

GRanges object with 517 ranges and 0 metadata columns:

        seqnames                 ranges strand

           <Rle>              <IRanges>  <Rle>

    [1]     chr2 [148602722, 148602723]      +

    [2]     chr2 [148602725, 148602726]      +

    [3]     chr2 [148602728, 148602729]      +

    [4]     chr2 [148602731, 148602732]      +

    [5]     chr2 [148602734, 148602735]      +

    ...      ...                    ...    ...

  [513]     chr2 [148684829, 148684830]      +

  [514]     chr2 [148684832, 148684833]      +

  [515]     chr2 [148684835, 148684836]      +

  [516]     chr2 [148684838, 148684839]      +

  [517]     chr2 [148684841, 148684842]      +

  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths

answered Nov 30 '18 at 14:42

lizaveta

657

answered Nov 30 '18 at 14:42

lizaveta

657

answered Nov 30 '18 at 14:42

lizaveta

657

answered Nov 30 '18 at 14:42

lizaveta

657

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Cfrgtkky